XML Sitemap Format

An XML sitemap is a file that lists the pages of a website in a defined format and provides information about each one: its URL, the date it was last modified, how often it is expected to change, and its importance relative to the site's other pages. The format is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/page1.html</loc>
      <lastmod>2021-12-31</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.example.com/page2.html</loc>
      <lastmod>2021-12-31</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.5</priority>
   </url>
</urlset>

Each <url> element in the XML sitemap corresponds to a single page on the website, and contains the following elements:

  • <loc>: the URL of the page (required). It must be a fully qualified URL that includes the scheme (e.g., http:// or https://).
  • <lastmod>: the date the page was last modified (optional), in W3C Datetime format; the date-only form YYYY-MM-DD is sufficient.
  • <changefreq>: how frequently the page is expected to change (optional). It must be one of the following values: always, hourly, daily, weekly, monthly, yearly, never.
  • <priority>: a value between 0.0 and 1.0 (optional) that indicates the importance of the page relative to other pages on the same website.

By providing this information, the XML sitemap helps search engines discover the website's pages and understand its structure. This improves the chances of those pages being crawled and indexed, although it does not guarantee that they will appear in search results.
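
Of the four child elements, only <loc> is required, so a sitemap can be considerably shorter than the example above. The following is a minimal sketch of such a file; the two URLs are placeholders chosen for illustration, not pages from a real site.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sitemap: each <url> carries only the required <loc> element -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>https://www.example.com/</loc>
   </url>
   <url>
      <loc>https://www.example.com/contact.html</loc>
   </url>
</urlset>
```

The optional elements can be added later, page by page, wherever reliable values are available.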

Understanding <loc>

The <loc> element in an XML sitemap represents the URL of a page on a website. It is used to identify the location of the page on the Internet and help search engines crawl the site more effectively.

The <loc> element should contain a fully-qualified URL, including the scheme (e.g., http:// or https://). For example:

<loc>http://www.example.com/page1.html</loc>

The URL in the <loc> element must be accurate and current, because search engines use it to crawl the website and index its pages. The Sitemaps protocol also requires the URL to be shorter than 2,048 characters. The <loc> element is required in every <url> element and is typically its first child.
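
Because a sitemap is an XML document, characters that are special in XML must be entity-escaped inside <loc>: an ampersand is written as &amp;, and quotes and angle brackets likewise use &apos;, &quot;, &lt; and &gt;. The fragment below sketches this for a URL with a query string; the search page and its parameters are hypothetical.

```xml
<url>
   <!-- The page's real address separates its query parameters with a bare
        ampersand; inside the sitemap that ampersand is written as an entity -->
   <loc>https://www.example.com/search?category=books&amp;page=2</loc>
</url>
```

An unescaped ampersand would make the file invalid XML, and a parser may then reject the sitemap as a whole.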

Understanding <lastmod>

The <lastmod> element in an XML sitemap represents the date when a page on a website was last modified. It provides information to search engines about the freshness of the content on the page.

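The <lastmod> element should contain a date in W3C Datetime format. The simplest form is YYYY-MM-DD; a time and time zone offset may be appended when more precision is needed. For example: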

<lastmod>2021-12-31</lastmod>

The <lastmod> element is optional, but it tells search engines when a page was last updated. They may use this information to decide when to crawl the page again, so the value should reflect the date the page itself was last modified, not the date the sitemap was generated. If the <lastmod> element is omitted, search engines have to rely on other signals to detect changes.
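
W3C Datetime allows both a date-only value and a full timestamp. The following sketch shows the two forms side by side; the URLs and dates are illustrative only.

```xml
<url>
   <loc>https://www.example.com/archive.html</loc>
   <!-- Date only: YYYY-MM-DD -->
   <lastmod>2021-12-31</lastmod>
</url>
<url>
   <loc>https://www.example.com/news.html</loc>
   <!-- Date, time, and UTC offset: YYYY-MM-DDThh:mm:ss+hh:mm -->
   <lastmod>2021-12-31T18:30:00+00:00</lastmod>
</url>
```

The longer form is mainly useful for pages that change more than once a day, where a date alone cannot distinguish one revision from the next.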

Understanding <changefreq>

The <changefreq> tag is used in an XML sitemap to specify the frequency with which the page is likely to change. It is an optional tag that provides information to search engines to help them crawl a website more efficiently and keep their indexes up to date.

The <changefreq> tag accepts exactly one of the following values:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

For example, a page with frequently changing content, such as a blog index, might be labeled with a <changefreq> of "daily", while a page that rarely changes, such as a company's "About Us" page, might be labeled "yearly". The value is a hint about expected change, not a command: search engines may crawl a page more or less often than the value suggests.

Here's an example of the <changefreq> tag in an XML sitemap:

<url>
  <loc>https://www.example.com/blog</loc>
  <lastmod>2022-12-31</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.8</priority>
</url>
<url>
  <loc>https://www.example.com/about</loc>
  <lastmod>2022-12-31</lastmod>
  <changefreq>yearly</changefreq>
  <priority>0.5</priority>
</url>

In this example, the URL with the path /blog is expected to change daily, while the URL with the path /about is expected to change about once a year. The <lastmod> tag gives each page's last modification date, and the <priority> tag gives its relative importance: 0.8 for the blog and 0.5 for the about page.
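
The two extreme values are easy to misread. In the Sitemaps protocol, "always" describes documents that change each time they are accessed, and "never" describes archived URLs. The sketch below illustrates both; the page paths are hypothetical.

```xml
<url>
   <!-- Hypothetical page whose content is regenerated on every request -->
   <loc>https://www.example.com/live-scores</loc>
   <changefreq>always</changefreq>
</url>
<url>
   <!-- Hypothetical archived page that is not expected to change again -->
   <loc>https://www.example.com/archive/2015/annual-report</loc>
   <changefreq>never</changefreq>
</url>
```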

Understanding <priority>

The <priority> tag is used in an XML sitemap to indicate the importance of a page relative to other pages on the same website. It is also optional; when it is omitted, the Sitemaps protocol assigns the page a default priority of 0.5. Search engines may use the value to help decide which of a site's pages to crawl more frequently and which to crawl less frequently.

The value of the <priority> tag is a decimal number between 0.0 and 1.0, where 1.0 represents the highest priority and 0.0 the lowest. For example:

<url>
  <loc>https://www.example.com/</loc>
  <lastmod>2022-12-31</lastmod>
  <changefreq>daily</changefreq>
  <priority>1.0</priority>
</url>
<url>
  <loc>https://www.example.com/blog</loc>
  <lastmod>2022-12-31</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.8</priority>
</url>
<url>
  <loc>https://www.example.com/about</loc>
  <lastmod>2022-12-31</lastmod>
  <changefreq>yearly</changefreq>
  <priority>0.5</priority>
</url>

In this example, the home page is considered the most important page with a priority of 1.0, followed by the blog page with a priority of 0.8, and finally the about page with a priority of 0.5.

Note that search engines may or may not use the priority values in a sitemap. A priority is only a suggestion: it does not guarantee that a page with a higher priority will rank higher in search results. Because the values are relative to the other pages on the same website, assigning a high priority to every page conveys no useful information. Use priority values to indicate the relative importance of pages within a website, not as a ranking factor.