[FEATURE] Add Sitemap handling

Open johnboxall opened this issue 1 year ago • 3 comments

Sitemaps are pretty cool! They tell crawlers what pages are available on a site. On large sites with complex catalogs, they help where crawlers might not be able to discover every page through plain navigation.

B2C Commerce includes sitemap management features: https://help.salesforce.com/s/articleView?id=cc.b2c_sitemap_overview.htm&type=5

And while it isn't too difficult to integrate them into PWA Kit, it would be nice if they were included out of the box.

At a high level, to integrate sitemaps into PWA Kit, folks must:

  1. Configure a Hostname Alias for their site in Business Manager.
  2. Run the system job to create the sitemap files.
  3. Check that sitemaps are accessible. If you're using SFRA, it must include a SiteMap controller. Once complete, sitemaps should be accessible at https://$HOST/s/$SITE/$SITEMAP.
  4. Set up a handler in ssr.js to serve the sitemap file from the storefront domain.

An example Express.js handler that infers the settings and handles errors could look like this:

import {getConfig} from '@salesforce/pwa-kit-runtime/utils/ssr-config'
import {Readable} from 'stream'

async function handleSiteMap(req, res) {
    const config = getConfig()
    // find() returns undefined when no 'ocapi' proxy is configured, so guard with ?.
    const host = config.ssrParameters.proxyConfigs.find((proxy) => {
        return proxy.path === 'ocapi'
    })?.host
    if (!host) {
        return res.status(500).send('Storefront host not configured.')
    }
    const file = req.originalUrl.substring(1)
    const site = config.app.defaultSite
    const url = `https://${host}/s/${site}/${file}`
    let siteMapResponse
    try {
        siteMapResponse = await fetch(url)
    } catch (err) {
        return res.status(500).send('Error fetching sitemap.')
    }

    res.status(siteMapResponse.status)
    const contentType = siteMapResponse.headers.get('content-type')
    res.set('Content-Type', contentType)
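    Readable.fromWeb(siteMapResponse.body)
        .once('error', function handleSiteMapPipeError(err) {
            // Headers may already be sent once piping has started, so only
            // send a 500 if they haven't; otherwise abort the response.
            if (res.headersSent) {
                return res.destroy(err)
            }
            res.status(500).send('Error fetching sitemap.')
        })
        .pipe(res)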
}

It is important to pipe the file and proxy the status and headers so it renders correctly (and quickly, as sitemap files can be big!).

It would also be useful to add caching headers so it is stored on the edge.
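As a rough sketch of the caching idea, a small helper could set `Cache-Control` before the body is piped. The helper name `setSitemapCacheHeaders` and the one-hour TTL are assumptions for illustration, not part of PWA Kit; `res.set` is the same Express method the handler above already uses.

```javascript
// Hypothetical helper, not part of PWA Kit.
// Assumed TTL of one hour; tune it to how often the sitemap job runs.
const DEFAULT_TTL_SECONDS = 3600

function setSitemapCacheHeaders(res, upstreamStatus, ttlSeconds = DEFAULT_TTL_SECONDS) {
    if (upstreamStatus === 200) {
        // s-maxage targets shared caches such as the CDN edge.
        res.set('Cache-Control', `public, s-maxage=${ttlSeconds}`)
    } else {
        // Avoid caching upstream errors or missing files.
        res.set('Cache-Control', 'no-store')
    }
    return res
}
```

In `handleSiteMap`, this could be called right after `res.status(siteMapResponse.status)`, passing `siteMapResponse.status` as the second argument.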

From there, you'd wire it up in ssr.js:

// ...

const runtime = getRuntime()
const {handler} = runtime.createHandler(options, (app) => {
  // ...
  app.get(/^\/sitemap(?:_index|\_(\d+))\.xml$/, handleSiteMap);
})
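To see which paths that route pattern accepts, a quick standalone check may help. The sample paths are made up for illustration, and the pattern drops the redundant `\_` escape, which does not change what it matches.

```javascript
// Same pattern as the route above, without the unnecessary escape on "_".
const SITEMAP_ROUTE = /^\/sitemap(?:_index|_(\d+))\.xml$/

// Illustrative paths only; real file names depend on the sitemap job output.
const samplePaths = [
    '/sitemap_index.xml',
    '/sitemap_0.xml',
    '/sitemap_12.xml',
    '/sitemap.xml',
    '/sitemap_index.xml.gz',
    '/foo/sitemap_0.xml'
]

// Collect the paths the route would hand to handleSiteMap.
const matchedPaths = samplePaths.filter((path) => SITEMAP_ROUTE.test(path))
console.log(matchedPaths)
```

Only the first three samples match, so a plain `/sitemap.xml`, compressed files, or nested paths would need their own route if the site serves them.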

This solution assumes it is desirable to serve the sitemap file from the storefront/MRT domain. If that isn't needed, you could use a handler that returns an HTTP redirect, or MRT redirects, to do this with less or no code.
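A minimal sketch of the redirect variant might look like the following. The names `buildSitemapUrl` and `createSitemapRedirect` are hypothetical, and the host and site would come from the same config lookups as in `handleSiteMap`; `res.redirect(status, url)` and `req.path` are standard Express.

```javascript
// Hypothetical helpers, not part of PWA Kit.
function buildSitemapUrl(host, site, file) {
    // Mirrors the https://$HOST/s/$SITE/$SITEMAP shape described above.
    return `https://${host}/s/${site}/${file}`
}

function createSitemapRedirect(host, site) {
    return function redirectSiteMap(req, res) {
        // req.path excludes the query string; drop the leading slash.
        const file = req.path.substring(1)
        // 302 is an assumption here; a permanent redirect may suit stable setups.
        res.redirect(302, buildSitemapUrl(host, site, file))
    }
}
```

It would be wired up with the same route pattern, for example `app.get(pattern, createSitemapRedirect(host, site))`. The trade-off is that crawlers end up fetching the file from the B2C Commerce hostname rather than the storefront domain.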

johnboxall • Apr 22 '24 22:04

One gotcha on the Sitemaps page in Business Manager: if your alias file defines a hostname in the settings key, then only that hostname will show up for selection.

If, for some reason, you can't get the hostname or links right, you can also rewrite them using an XML parser:

const { getConfig } = require("@salesforce/pwa-kit-runtime/utils/ssr-config");
const { Readable, Transform } = require("stream");
const sax = require("sax");

async function handleSiteMap(req, res) {
  const config = getConfig();
  // find() returns undefined when no "ocapi" proxy is configured, so guard with ?.
  const host = config.ssrParameters.proxyConfigs.find((proxy) => {
    return proxy.path === "ocapi";
  })?.host;
  if (!host) {
    return res.status(500).send("Storefront host not configured.");
  }
  const file = req.originalUrl.substring(1);
  const site = config.app.defaultSite;
  const url = `https://${host}/s/${site}/${file}`;
  let siteMapResponse;
  try {
    siteMapResponse = await fetch(url);
  } catch (err) {
    return res.status(500).send("Error fetching sitemap.");
  }

  res.status(siteMapResponse.status);
  const contentType = siteMapResponse.headers.get("content-type");
  res.set("Content-Type", contentType);

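  const parser = sax.createStream(true, { lowercase: true });
  parser.once("error", (err) => {
    // Once piping has started the headers are already sent, so abort instead.
    if (res.headersSent) return res.destroy(err);
    res.status(500).send("Error parsing sitemap.");
  });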

  const linkRewriter = new Transform({
    transform(chunk, _, callback) {
      const strChunk = chunk.toString();
      // 👇 Rewrite here!!! Note: a URL split across two chunks won't be matched.
      const rewrittenChunk = strChunk.replace(/https:\/\//g, "http://");
      this.push(rewrittenChunk);
      callback();
    },
  });

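  Readable.fromWeb(siteMapResponse.body)
    .once("error", (err) => {
      // Once piping has started the headers are already sent, so abort instead.
      if (res.headersSent) return res.destroy(err);
      res.status(500).send("Error reading sitemap.");
    })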
    .pipe(parser)
    .pipe(linkRewriter)
    .pipe(res);
}
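Because the rewrite above runs per chunk, a match that straddles two chunks would be missed. One possible way around that, sketched here under assumptions, is to hold back a short tail between chunks. `createChunkReplacer` and `createLinkRewriter` are hypothetical names, and the replacement is a plain string swap rather than anything XML-aware.

```javascript
const {Transform} = require('stream')
const {StringDecoder} = require('string_decoder')

// Hypothetical helper: replaces `search` with `replacement` across chunk
// boundaries by carrying over up to search.length - 1 unprocessed characters.
function createChunkReplacer(search, replacement) {
    if (!search) throw new Error('search must be a non-empty string')
    let tail = ''
    return {
        write(text) {
            const input = tail + text
            let output = ''
            let pos = 0
            let idx
            // Replace every complete match in the buffered input.
            while ((idx = input.indexOf(search, pos)) !== -1) {
                output += input.slice(pos, idx) + replacement
                pos = idx + search.length
            }
            // Keep the end of the remainder: it may be the start of a match.
            const rest = input.slice(pos)
            const keep = Math.min(search.length - 1, rest.length)
            tail = rest.slice(rest.length - keep)
            return output + rest.slice(0, rest.length - keep)
        },
        end() {
            const last = tail
            tail = ''
            return last
        }
    }
}

// Wraps the replacer in a Transform; StringDecoder keeps multi-byte
// characters intact when they are split across chunks.
function createLinkRewriter(search, replacement) {
    const replacer = createChunkReplacer(search, replacement)
    const decoder = new StringDecoder('utf8')
    return new Transform({
        transform(chunk, _encoding, callback) {
            callback(null, replacer.write(decoder.write(chunk)))
        },
        flush(callback) {
            callback(null, replacer.write(decoder.end()) + replacer.end())
        }
    })
}
```

In the handler above, `linkRewriter` could then be created with `createLinkRewriter("https://", "http://")`, or whatever pair of strings the rewrite actually needs.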

johnboxall • May 13 '24 15:05

If you're using eCDN (or any other stacked CDN) for routing with the shopper-facing vanity domain name, an alternative is to route traffic for the sitemap resource directly from the CDN:

  1. Update your B2C Commerce alias file with your domain.
  2. Run the sitemap job for that domain.
  3. Update your CDN routing expression to route requests for sitemaps (/^\/sitemap(?:_index|\_(\d+))\.xml$/) to the B2C Commerce instance.

johnboxall • May 15 '24 23:05