sitemap.xml

Open gavinr opened this issue 3 years ago • 31 comments

Is your feature request related to a problem? Please describe. Most websites need to provide a sitemap.xml for SEO purposes.

Describe the solution you'd like SvelteKit should provide a way (automatically, or recommended way via documentation) on how to create a sitemap.xml.

Describe alternatives you've considered I see sitemap.xml generation is mentioned in the sapper repo - is this the way to do it?

Also mentioned by @antony here.

How important is this feature to you? I think this will affect a large number of SvelteKit users who are building websites.

Note

The SvelteKit homepage says:

SvelteKit doesn't compromise on SEO

Without supporting sitemap.xml, this claim is inaccurate/disingenuous.

Thank you!

gavinr avatar Apr 20 '21 04:04 gavinr

You can create an endpoint with a get function to provide this, e.g. sitemap.xml.js or sitemap.xml.ts, but you will have to handle the content yourself.

But if you are meaning that you want SvelteKit to provide the pages it knows about, that may be something that could be provided, similar to how the files can be imported from $service-worker?
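
As a minimal sketch of the "handle the content yourself" part, the XML body can be built by a small helper. The names `escapeXml` and `buildSitemap` are hypothetical and not part of SvelteKit; the endpoint wiring in the comment assumes the `get` return shape used elsewhere in this thread.

```javascript
// Escape the five characters that are special in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a minimal sitemap document from a list of absolute URLs.
function buildSitemap(urls) {
  const entries = urls
    .map((url) => `  <url>\n    <loc>${escapeXml(url)}</loc>\n  </url>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${entries}
</urlset>`;
}

// Hypothetical usage inside the get function of a sitemap.xml.js endpoint:
//   return { headers: { "Content-Type": "application/xml" }, body: buildSitemap(urls) };
const xml = buildSitemap([
  "https://example.com/",
  "https://example.com/about?a=1&b=2",
]);
console.log(xml);
```

The list of URLs still has to come from somewhere (a CMS, a database, a glob over the routes), which is the open question in the rest of this thread.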

CaptainCodeman avatar Apr 20 '21 22:04 CaptainCodeman

you want SvelteKit to provide the pages it knows about

Yes, that seems along the lines that I'm thinking.

gavinr avatar Apr 20 '21 22:04 gavinr

I was playing with SvelteKit in November and wanted to do just that: generate a sitemap based on the pages. The source was still 'closed' at that time, so I used a direct dist. include:

import {create_manifest_data} from '@sveltejs/kit/dist/utils.js';
const manifest = create_manifest_data('src/routes');

Then you can use manifest.pages. Of course, you will want to cache the result or just use adapter-static (export).

But it would be great if this would be exposed in some way. I also used the Svelte compiler to parse the pages to extract a svelte:head title, and some metadata comments, in this way you can auto-populate navigation.

bwbroersma avatar Apr 23 '21 08:04 bwbroersma

I also wanted to ask for that feature. Would be a great addition for SvelteKit to provide even more automations.

Zerotask avatar May 15 '21 14:05 Zerotask

It sure would be super nice if the sitemap would be generated from the page routes. But that only starts making real sense, if there was a way to define priority and changefreq on a per route basis plus lastmod and optional images per actual page.
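
To make the per-route idea concrete, here is a rough sketch of what such metadata could look like once it exists. The `routes` array and `renderUrl` helper are hypothetical; nothing like this is provided by SvelteKit, and `loc` is assumed to be already XML-safe.

```javascript
// Hypothetical per-route metadata; the field names mirror the optional tags
// of the sitemap protocol (lastmod, changefreq, priority).
const routes = [
  { loc: "https://example.com/", changefreq: "daily", priority: 1.0, lastmod: "2021-05-23" },
  { loc: "https://example.com/about", priority: 0.5 },
];

// Render one <url> entry, emitting only the optional tags that are set.
function renderUrl(route) {
  const lines = [`    <loc>${route.loc}</loc>`];
  if (route.lastmod) lines.push(`    <lastmod>${route.lastmod}</lastmod>`);
  if (route.changefreq) lines.push(`    <changefreq>${route.changefreq}</changefreq>`);
  if (route.priority !== undefined) {
    lines.push(`    <priority>${route.priority.toFixed(1)}</priority>`);
  }
  return `  <url>\n${lines.join("\n")}\n  </url>`;
}

console.log(routes.map(renderUrl).join("\n"));
```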

moritzebeling avatar May 23 '21 11:05 moritzebeling

Here's a video tutorial showing how you can create your own sitemap.xml: https://www.youtube.com/watch?v=u8n5-urtGB0

benmccann avatar May 26 '21 20:05 benmccann

I also need to create a sitemap.xml and think SvelteKit would benefit from having it built-in (as it's such a fundamental task). At minimum it should be explained in the docs how to do it correctly in "the SvelteKit way".

To be honest I didn't like the video as it left me more confused than before: where is the new (?) api endpoint coming from? How else can I get the current list of pages? What's the deal with these endpoint URLs? How can I pre-generate the sitemap? ...Maybe it's all answered when watching the whole series but I'm not planning to do that.

Edit: Because we developers (at least I ;) ) tend to make things more complicated than they have to be. For now I ended up writing the sitemap.xml manually. (Took less than 10 mins.)

kvn-shn avatar Jun 02 '21 04:06 kvn-shn

The video assumes that you have an existing API endpoint (e.g. at your CMS) that knows the correct routing and URLs of all pages within your frontend website. The problem is that you would usually use SvelteKit to manage routing independently from your backend. So this approach works, but it is a workaround.

A Svelte-style solution in the future could be that, at build time, every generated page is also optionally added to the sitemap. Maybe this could be configured with an additional option returned by the load() function of each route.

moritzebeling avatar Jun 02 '21 07:06 moritzebeling

I feel like we should provide a way to expose the routing table, but that's the extent to which we should provide any sort of sitemap support. Exposing the routing table is also useful for automatic documentation generators, for example.

antony avatar Jun 02 '21 11:06 antony

That's how I do it at the moment and then I build the sitemap with a generator script via a npm prebuild script. For static content that's way better than building it on-the-fly with an endpoint. SvelteKit could provide an optional setting config.kit.sitemap which is false by default and you can opt-in to do something similar.

Zerotask avatar Jun 02 '21 17:06 Zerotask

I agree with @antony SvelteKit should provide a way to expose the routing table.

But meanwhile... I needed sitemap for my static website (used with adapter-static) and I didn't want to write it manually :) So I made a simple proof-of-concept library for myself – but maybe it will come in handy for someone.

https://github.com/bartholomej/svelte-sitemap

Basically just install it and use it as prebuild script:

npm install svelte-sitemap --save-dev
{
  "name": "my-project",
  "scripts": {
    "prebuild": "svelte-sitemap --domain https://mydomain.com"
  }
}

And yes, I reckon that my library will soon become obsolete when SvelteKit supports it natively ;) 👍

bartholomej avatar Jun 09 '21 16:06 bartholomej

The routing table alone wouldn't be enough because you also need to know all possible values of all parameters to generate a sitemap, the date each page was last updated, etc. These are things that SvelteKit fundamentally cannot provide in most cases. I'm curious how people suggest this would be handled or if there are other frameworks doing a good job at this
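
A small sketch of why the routing table alone falls short: a pattern like /blog/[slug] only becomes a list of URLs once the application supplies the parameter values. `expandRoute` is a hypothetical helper, and it only handles simple [param] segments.

```javascript
// Expand a route pattern such as "/blog/[slug]" into concrete paths.
// `paramValues` maps each parameter name to the values the application knows
// about; the framework cannot supply these on its own.
function expandRoute(pattern, paramValues) {
  const names = [...pattern.matchAll(/\[([^\]]+)\]/g)].map((match) => match[1]);
  let paths = [pattern];
  for (const name of names) {
    const values = paramValues[name] ?? [];
    // Replace this parameter in every path built so far with every known value.
    paths = paths.flatMap((path) =>
      values.map((value) => path.replace(`[${name}]`, encodeURIComponent(value)))
    );
  }
  return paths;
}

console.log(expandRoute("/blog/[slug]", { slug: ["hello-world", "sitemaps"] }));
console.log(expandRoute("/about", {}));
```

A parameter with no known values yields no paths at all, which is exactly the situation a framework-generated sitemap would be in for dynamic routes.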

benmccann avatar Jun 09 '21 19:06 benmccann

@benmccann Yes, <lastmod />, this is something I am also trying to solve in my library. But it's just a naive solution... I only take the time of the last modification of each route file. I don't care about the child components modifications at all :( However, there is a --reset-time parameter with which I can set all routes to the current time each time I deploy.
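
The naive approach described here can be sketched roughly as follows. This is not svelte-sitemap's actual implementation; `toLastmod` and `lastmodFor` are hypothetical names, and the `resetTime` option only imitates the idea behind --reset-time.

```javascript
// Format a timestamp as a W3C date (YYYY-MM-DD, in UTC), one of the formats
// the sitemap protocol accepts for <lastmod>.
function toLastmod(date) {
  return date.toISOString().slice(0, 10);
}

// Pick the <lastmod> for a route: the route file's modification time by
// default, or the deploy time when `resetTime` is set. In Node the
// modification time could come from fs.statSync(file).mtime.
function lastmodFor(fileMtime, { resetTime = false, now = new Date() } = {}) {
  return toLastmod(resetTime ? now : fileMtime);
}

const mtime = new Date("2021-06-01T12:00:00Z");
console.log(lastmodFor(mtime));
console.log(lastmodFor(mtime, { resetTime: true, now: new Date("2021-06-09T21:00:00Z") }));
```

As noted, a file's modification time says nothing about changes in child components or in the data a page loads.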

Like I said, it's just a quick workaround library and a temporary solution. But for my purpose, that's ok for now :)

I'm also curious how people suggest this would be handled... 👍

bartholomej avatar Jun 09 '21 21:06 bartholomej

Nice tool @bartholomej! Would it make more sense to have something run as a post build so it could capture all the statically generated paths (including dynamic routes that are crawled by sveltekit)? Or am I misunderstanding how it works?

madeleineostoja avatar Jun 10 '21 00:06 madeleineostoja

Would it make more sense to have something run as a post build so it could capture all the statically generated paths (including dynamic routes that are crawled by sveltekit)?

Definitely, this should run as a postbuild by scanning the build folder. Currently @bartholomej's package scans src/routes, which is fine until you start using generated pages like [slug].svelte. I opened an issue for that: https://github.com/bartholomej/svelte-sitemap/issues/1

The script from this article works great as a postbuild:

import fs from "fs";
import fg from "fast-glob";
import { create } from "xmlbuilder2";
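// Read package.json from disk; a bare JSON import is not supported by every Node ESM setup.
const pkg = JSON.parse(fs.readFileSync("./package.json", "utf8"));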

const getUrl = (url) => {
	const trimmed = url.slice(6).replace("index.html", "");
	return `${pkg.url}/${trimmed}`;
};

async function createSitemap() {
	const sitemap = create({ version: "1.0" }).ele("urlset", {
		xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9"
	});

	const pages = await fg(["build/**/*.html"]);

	pages.forEach((page) => {
		const url = sitemap.ele("url");
		url.ele("loc").txt(getUrl(page));
		url.ele("changefreq").txt("weekly");
	});

	const xml = sitemap.end({ prettyPrint: true });

	fs.writeFileSync("build/sitemap.xml", xml);
}

createSitemap();

NEO97online avatar Jun 17 '21 15:06 NEO97online

Thank you @auderer @madeleineostoja It was just proof-of-concept and it worked well for my case. But you're right this should run as postbuild.

So now (in v1.0.0) it already works as a postbuild (including dynamic routes) 🎉

https://github.com/bartholomej/svelte-sitemap

npm install svelte-sitemap --save-dev

Let me know how it works ;)

bartholomej avatar Jun 29 '21 22:06 bartholomej

Although svelte-sitemap lacks many features, it's a great lightweight solution! However, I'm currently using sitemap.js due to its flexibility and open-source support and would recommend those who need complex sitemaps to do the same.

myisaak avatar Oct 04 '21 15:10 myisaak

There's also a somewhat similar use case with robots.txt when, based on config, you'd want to build a correct file. E.g. robots.txt.js:

export async function get({ host }) {
  return {
    headers: {
      'Content-Type': 'text/plain',
    },
    body: `User-agent: *
Allow: /
Sitemap: https://${host}/sitemap.xml`,
  };
}

It would be nice to be able to prerender GET requests; this could solve both cases.

karolis-sh avatar Nov 07 '21 21:11 karolis-sh

Any work being done to solve this?

The statement highlighted by @gavinr is still live and couldn't be further from the truth.

I love SvelteKit, but it's not SEO friendly :)

eba8 avatar Feb 04 '22 20:02 eba8

That's how I create a sitemap currently. If this endpoint is not linked from somewhere in the application, it won't be prerendered. Maybe there could be an option for adapter-static to do so regardless?

// sitemap.xml.ts
export async function get() {
  // fetch needs an absolute URL including the protocol
  const response = await fetch('https://example.com/api')

  if (!response.ok) {
    return {
      status: response.status,
      body: response.statusText,
    }
  }

  const staticPages = Object.keys(import.meta.glob('/src/routes/**/!(_)*.svelte'))
    .filter(page => {
      const filters: Array<string> = ['slug]', '_', '/src/routes/index.svelte']

      return !filters.find(filter => page.includes(filter))
    })
    .map(page => {
      return page
        .replace('/src/routes', 'https://example.com')
        .replace('/index.svelte', '')
        .replace('.svelte', '')
    })

  const body = render(staticPages)

  const headers = {
    // the shared-cache directive is spelled s-maxage
    'Cache-Control': `max-age=0, s-maxage=${600}`,
    'Content-Type': 'application/xml',
  }

  return {
    body,
    headers,
  }
}

const render = (staticPages: Array<string>) => `<?xml version="1.0" encoding="UTF-8" ?>
<urlset
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
  xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
  xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
  xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
  xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0"
  xmlns:pagemap="http://www.google.com/schemas/sitemap-pagemap/1.0"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
>
${staticPages
  .map(
    staticPage => `
  <url>
    <loc>${staticPage}</loc>
    <lastmod>${process.env.VITE_BUILD_TIME}</lastmod>
    <changefreq>monthly</changefreq>
  </url>
`,
  )
  // join explicitly so the array is not stringified with commas between entries
  .join('')}
</urlset>
`

Myrmod avatar Feb 08 '22 09:02 Myrmod

I think there are a lot of people participating in this thread that don't understand that a SvelteKit-generated sitemap is likely to provide almost no value. Please read this whole comment before responding to it and don't leave comments that are just saying "this feature is important" without detailing a unique use case or imparting some technical information. I'm happy to consider things I may have overlooked, but +1 comments just add noise, slow down development, and will be hidden. You can thumbs up the issue to indicate support

The main thing that sitemaps do is help search engines figure out how to prioritize crawling very large sites. Search engines prioritize crawling based on a number of factors such as how important/popular a site is, how often the site is updated, etc. If you have a very large site where there are meaningful changes randomly distributed deep in the site's structure and it isn't popular enough that the whole thing will be frequently recrawled by Google you can hint to it that it should focus on the new and updated pages with a sitemap so that it can go directly there instead of crawling the first pages it encounters and then giving up to do the rest later. Sitemaps do not affect the ranking of your page relative to other pages other than ensuring search engines have the latest copy of a page.

For a sitemap to have an impact, you have to help the search engine prioritize which pages to crawl. SvelteKit cannot know which pages were recently updated and thus would have no possible way of generating a sitemap of any real value. It can't even know the list of pages in your site in most cases - except if you use adapter-static. This could potentially be a feature of adapter-static, but even then it would be so limited that it'd be practically useless. adapter-static works by crawling your site. If your site is large enough that it's difficult to crawl, then adapter-static will have as much trouble crawling your site as Google will have. But more importantly, it can't know which pages were recently updated, which defeats the entire purpose of a site map.

By definition, sitemaps must be generated in user-land and not by the framework to have any meaningful value. A framework like Wordpress can provide SEO plugins because it's responsible for the content database and has a well defined set of data it's storing. But a sitemap is way outside the realm of what a frontend framework can provide. You could possibly have things like a Strapi+SvelteKit plugin, etc. But for SvelteKit in a standalone context I haven't seen anything that's both possible and valuable for it to do
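
A rough sketch of what "generated in user-land" means in practice: the last-modified information comes from the content source, not from the framework. The `posts` records, the `updatedAt` field, and the /blog/ URL scheme are all hypothetical.

```javascript
// Hypothetical content records, as a CMS or database query might return them.
const posts = [
  { slug: "old-post", updatedAt: "2021-03-01T09:00:00Z" },
  { slug: "fresh-post", updatedAt: "2022-02-10T18:30:00Z" },
];

// Turn content records into sitemap entries. The lastmod value comes from the
// data source, which is the one place that knows when a page really changed.
function toSitemapEntries(records, origin) {
  return records.map((record) => ({
    loc: `${origin}/blog/${record.slug}`,
    lastmod: new Date(record.updatedAt).toISOString(),
  }));
}

console.log(toSitemapEntries(posts, "https://example.com"));
```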

benmccann avatar Feb 16 '22 01:02 benmccann

@benmccann thank you for the thoughtful response and for your work on SvelteKit. I really appreciate what you've done and the time you put into it. Thank you.

I read your post and understand the technical and logical limitations of the ability of SvelteKit to generate a sitemap.xml. A few quick responses:

Paragraph 2:

The main thing that sitemaps do is help search engines figure out how to prioritize crawling very large sites

In paragraph 2 it seems like you're saying that sitemaps are not as essential as some people in this thread are implying. I think you're correct and would share the official Google Developers site about sitemaps. I think your comment echoes these two points from the guide:

image

Contrary though,

  1. It says you may need a sitemap if your site is really large (bullet point 1). You have responded with a technical reason of why SvelteKit might have trouble generating a sitemap.xml on really large sites, but that does not negate the fact that Google says you may need one.
  2. It says you may need a sitemap if your site is new and has few external links to it (bullet point 3). Many web sites built with SvelteKit fall into this category.
  3. It says you may need a sitemap if your site has a lot of rich media content (bullet point 4)

Given the above, you and I agree that there are many cases where sitemaps may not be necessary (small sites, comprehensively linked sites), but you must concede that certainly there are cases (numbers 1,2,3 above, for example) that websites built with SvelteKit do need a sitemap.xml. Do you agree?

Paragraph 4:

By definition, sitemaps must be generated in user-land and not by the framework to have any meaningful value ... a sitemap is way outside the realm of what a frontend framework can provide

I see your technical points here, and trust that you're correct. Given that, and my assertion (above) that sitemap.xml files are needed for some sites,

  1. If we're saying that SvelteKit will not support generating sitemap.xml files, I'm ok with that - I would just say that the phrase "SvelteKit doesn't compromise on SEO" should be removed from the homepage (see the note on my original post that started this issue), given that a site without a sitemap.xml is indeed a "compromise on SEO" for some situations (see paragraph 2 discussion above).
  2. How do other equivalent frameworks like NextJS and Gatsby handle this? I'm not very familiar with those frameworks but it looks like they "support" it via plugin? Maybe we can follow a pattern from there? Or maybe those are so different from SvelteKit that it's not relevant (if so, my apologies).

Finally, from a "developer relations" perspective, note that instead of trying to convince developers that sitemaps are unnecessary, those other frameworks have faced the fact that sitemaps are necessary for certain projects, and I think that's how we at SvelteKit should handle it. Even if it's not truly "built in" to SvelteKit, at least we can have a documentation page that explains the best way to do it (be it with a plugin, manually via a script in user world, etc etc).

Thanks again. I appreciate your time and all the work you do for SvelteKit.

gavinr avatar Feb 16 '22 01:02 gavinr

Thanks @gavinr for the thoughtful and constructive reply.

It says you may need a sitemap if your site is really large (bullet point 1). You have responded with a technical reason of why SvelteKit might have trouble generating a sitemap.xml on really large sites, but that does not negate the fact that Google says you may need one.

SvelteKit could certainly generate a sitemap.xml, but it would require the user to code it to be of any use. How would SvelteKit know what pages had new content in your database or API? I'm not saying you should never use a sitemap, but rather that it's fundamentally a problem that a frontend framework cannot solve for you because it doesn't have the necessary information to do so.

It says you may need a sitemap if your site is new and has few external links to it (bullet point 3). Many web sites built with SvelteKit fall into this category.

Yes, if there are pages that are not linked to either from your own website or another, then a sitemap will help Google discover those pages. Most sites built with SvelteKit will be fairly new, but typically the vast majority of pages on a site will be linked to from within the site. If a page has no links to it, it's not going to rank well, so there's very little value in having a search engine crawl it because it's highly unlikely to drive any amount of traffic.

It says you may need a sitemap if your site has a lot of rich media content (bullet point 4)

I'm assuming this is referring to Google's video extension of the sitemap standard. I'm less familiar with this than sitemaps in general, but I believe the reason Google requests it is because video embeds are often in iframes, which are difficult for crawlers to deal with. It's hard for me to see what exactly SvelteKit could offer in this situation. If there's some specific request, I'm happy to consider it. I don't have experience with video SEO, so I'm willing to be educated. My first reaction though is that this seems like something that fundamentally has to be dealt with in user land.

If we're saying that SvelteKit will not support generating sitemap.xml files, I'm ok with that - I would just say that the phrase "SvelteKit doesn't compromise on SEO" should be removed from the homepage (see the note (https://github.com/sveltejs/kit/issues/1142#issue-862397393) on my original post that started this issue), given that a site without a sitemap.xml is indeed a "compromise on SEO" for some situations (see paragraph 2 discussion above).

SvelteKit provides a number of other features that are useful for SEO. E.g. it has been optimized to get really great Core Web Vitals scores out-of-the-box. It also does SSR by default and is maybe the only framework I've seen that can do dynamic rendering in just a few lines of code. These types of things are far more impactful, so I do think it's fair that we say we offer SEO benefits. In terms of the specific wording, I think that having a sitemap.xml that isn't aware of last modified times would be a compromise and would contradict the claim, and having the user code it is necessary for uncompromising SEO. But anyway, we have at least one draft for a totally new homepage design and will do a homepage refresh in the future that will update this content. That will be a bit down the road though, as right now we've chosen to put that on hold to finish the core features.

How do other equivalent frameworks like NextJS and Gatsby handle this?

NextJS appears to support it in exactly the same manner as SvelteKit currently does based on this link. Gatsby isn't too different and still requires quite a bit of user code based on their example. Personally, I'm not a big fan of their API. The NextJS approach seems a lot more straightforward to me. With Gatsby, I'd have to learn their API, which doesn't really do much for you. With NextJS I can use the sitemap spec and there's one less layer of indirection.

Even if it's not truly "built in" to SvelteKit, at least we can have a documentation page that explains the best way to do it (be it with a plugin, manually via a script in user world, etc etc).

Yes, totally agree with this. I'll take a stab at putting some starter docs up and people can add to them from there

benmccann avatar Feb 16 '22 03:02 benmccann

Hey @gavinr I've created sitemaps for Gatsby, NextJS and SvelteKit projects.

Gatsby does have a plugin for creating sitemaps you can check out the documentation over on the Gatsby repo. I haven't used this in a while now, from my understanding it will crawl the file structure once the site is built and generate a sitemap.xml file.

For NextJS and SvelteKit the approach is similar. Create a route/endpoint for the sitemap, here you can generate the xml needed this is useful if the data for the site is dynamically generated.

I documented how to make a sitemap with SvelteKit with help from @davidwparker's YouTube video.

For reference here's my notes on creating a sitemap in NextJS

Apologies for adding to the noise here @benmccann. The documentation from me can help until the starter docs have been added.

spences10 avatar Feb 16 '22 08:02 spences10

Regardless of whether sitemaps are good or bad, there should be the possibility to have one – which is perfectly possible through the sitemap.xml.js endpoint as proposed above.

And sure, all the page info and metadata (date, priority) is received from the content structure (MD files, API, CMS). So as long as your content knows the URL under which it will be visible, the best strategy would be to retrieve a combined list of pages and metadata from your data source from within your sitemap.xml.js.

When that is not the case, and routing is only implied by SvelteKit, you could create a writable store and from every {endpoint}.json.js add one entry to that list of pages. The sitemap.xml.js endpoint would have to be rendered at the very end of the process, read the store and render the XML. But this strategy only works for every route that triggers some SvelteKit endpoint during prerendering.
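
The store idea could look roughly like this. A plain module-level Map stands in for the writable store, and `registerPage` and `listPages` are hypothetical helpers; whether all endpoints actually run before sitemap.xml.js during prerendering is an assumption this sketch does not solve.

```javascript
// Module-level registry standing in for the writable store described above.
const pageRegistry = new Map();

// Called from each {endpoint}.json.js handler while it serves a page's data.
function registerPage(path, meta = {}) {
  pageRegistry.set(path, meta); // registering the same path again overwrites it
}

// Called from the sitemap.xml.js endpoint once the other endpoints have run.
function listPages() {
  return [...pageRegistry.entries()]
    .map(([path, meta]) => ({ path, ...meta }))
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
}

registerPage("/blog/second", { lastmod: "2022-02-16" });
registerPage("/blog/first", { lastmod: "2022-02-01" });
registerPage("/blog/first", { lastmod: "2022-02-10" });
console.log(listPages());
```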

moritzebeling avatar Feb 16 '22 09:02 moritzebeling

Not noise at all! Thanks for sharing @spences10!

Here's a draft of the SEO docs for folks who are interested: https://github.com/sveltejs/kit/pull/3946

Also, I saw Astro has a sitemap plugin, so we could look at what they do for inspiration: https://github.com/withastro/astro/tree/main/packages/integrations/sitemap

benmccann avatar Feb 16 '22 13:02 benmccann

Hey I'm just chiming in to see if there's been any progress on this since Feb. Thanks Ben for all your hard work and great discussions thus far.

justingolden21 avatar Apr 19 '22 08:04 justingolden21

Docs on how to generate a sitemap.xml were added in #3946: https://kit.svelte.dev/docs/seo#manual-setup-sitemaps (thank you @benmccann!)

In that PR, @Rich-Harris suggested we leave this issue open:

we could probably do something to e.g. make prerendered pages available to a sitemap generator. It wouldn't be trivial, and it wouldn't cover all cases, but there's room for Kit to provide value

gavinr avatar Apr 19 '22 13:04 gavinr

I have a tangentially related question regarding what would be safe to exclude from robots...

Specifically around if _app/chunks and _app/pages files can be ignored? Do .js files need to be indexed?

If anyone thinks this should be a separate issue I can log it.

I did ask on the Svelte Discord with no answers

spences10 avatar Apr 20 '22 09:04 spences10

A sitemap is really for end-user navigable routes, ie. the URLs that you want indexed, the content pages. You wouldn't list .js or .css files in it
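
If the list of candidate URLs comes from scanning build output, a filter along these lines keeps only content pages. The extension list and the /_app/ prefix check are assumptions for illustration, not an official rule.

```javascript
// File extensions treated as assets rather than content pages (an assumed list).
const ASSET_PATTERN = /\.(js|css|map|json|png|jpe?g|svg|ico|woff2?)$/i;

// Keep only URLs that look like navigable content pages.
function isContentUrl(url) {
  const { pathname } = new URL(url);
  return !pathname.startsWith("/_app/") && !ASSET_PATTERN.test(pathname);
}

const candidates = [
  "https://example.com/",
  "https://example.com/blog/hello-world",
  "https://example.com/_app/chunks/vendor-abc123.js",
  "https://example.com/global.css",
];
console.log(candidates.filter(isContentUrl));
```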

CaptainCodeman avatar Apr 20 '22 13:04 CaptainCodeman