Only one URL has been discovered

Open jean-christophe-manciot opened this issue 4 years ago • 6 comments

Do you want to request a feature or report a bug? bug

$ npm install -S sitemap-generator
npm WARN saveError ENOENT: no such file or directory, open '/home/actionmystique/.config/sitemap-generator/package.json'
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN enoent ENOENT: no such file or directory, open '/home/actionmystique/.config/sitemap-generator/package.json'
npm WARN sitemap-generator No description
npm WARN sitemap-generator No repository field.
npm WARN sitemap-generator No README data
npm WARN sitemap-generator No license field.

+ [email protected]
added 39 packages from 64 contributors and audited 58 packages in 4.133s
found 0 vulnerabilities

sitemap-generator.js:

const SitemapGenerator = require('sitemap-generator');

// create generator
const generator = SitemapGenerator('https://git.sdxlive.com', {
  filepath: './sitemap.xml',
  lastMod: true,
  maxDepth: 9999,
  maxEntriesPerFile: 50000,
  stripQuerystring: true
});

// register event listeners
generator.on('done', () => {
  // sitemaps created
});

// start the crawler
generator.start();

$ node sitemap-generator.js

leads to sitemap.xml:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://git.sdxlive.com/</loc>
    <lastmod>2020-03-07</lastmod>
  </url>
</urlset>

@lgraubner What am I missing?

jean-christophe-manciot avatar Mar 07 '20 08:03 jean-christophe-manciot

Same issue with sitemap-generator-cli:

$ sudo npm install -g sitemap-generator-cli
/usr/local/bin/sitemap-generator -> /usr/local/lib/node_modules/sitemap-generator-cli/index.js
+ [email protected]
added 47 packages from 67 contributors in 2.363s
$ sitemap-generator --last-mod https://git.sdxlive.com

sitemap.xml:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://git.sdxlive.com/</loc>
    <lastmod>2020-03-07</lastmod>
  </url>
</urlset>

jean-christophe-manciot avatar Mar 07 '20 08:03 jean-christophe-manciot

Facing the same issue. Did anyone find a workaround?

dhruvkaushal11 avatar Apr 06 '20 11:04 dhruvkaushal11

I also encountered the same problem when I crawled my local VuePress documentation website.

hanshou101 avatar Nov 21 '20 16:11 hanshou101

We are experiencing this same problem. Any update on this?

kevinvella1 avatar Sep 01 '21 11:09 kevinvella1

Same here. It's simply not working: it generates a sitemap with the links on the initial URL only. No deeper crawling.

Wintermute79 avatar Aug 25 '22 15:08 Wintermute79

For anyone else who comes across this: if only your root page is included in the sitemap, it usually means that your pages are rendered client-side by a JavaScript framework such as React or Vue. Since the sitemap crawler doesn't execute JavaScript, it sees a mostly blank page with no links to follow. You can confirm this by running curl YOUR_DOMAIN from your terminal. If the page's <body> is mostly empty and doesn't contain your actual page HTML, then you have this problem.
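
The same check can be scripted. Below is a minimal sketch, assuming Node 18+ (for the built-in fetch); the helper names extractLinks and inspect are made up for illustration and are not part of sitemap-generator. It downloads the raw HTML without executing any scripts and counts the anchor links a non-JavaScript crawler could follow. The regex is only a heuristic, not a full HTML parser.

```javascript
// Collect href values from <a> tags in raw (un-rendered) HTML, roughly as a
// crawler that does not execute JavaScript would see them.
// Regex-based, so treat the result as a heuristic.
function extractLinks(html) {
  const links = [];
  const anchorPattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = anchorPattern.exec(html)) !== null) {
    // Skip pure in-page fragments such as "#top"; they lead to no new page.
    if (!match[1].startsWith('#')) {
      links.push(match[1]);
    }
  }
  return links;
}

// Fetch a page without running its scripts and report what is crawlable.
async function inspect(url) {
  const response = await fetch(url);
  const html = await response.text();
  const links = extractLinks(html);
  console.log(`${links.length} link(s) found in the raw HTML of ${url}`);
  return links;
}

// Example (replace with your own site):
// inspect('https://YOUR_DOMAIN');
```

If this reports zero links for a page that clearly shows navigation in the browser, the links are most likely being injected client-side.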

A couple of solutions:

  1. Use server-side rendering with your frontend framework (for example, Next.js for React or Nuxt.js for Vue) to generate complete HTML pages on the server.

  2. Use a prerendering service like prerender.io or ostr.io to prerender your pages for search engine crawlers. You can then build the sitemap by telling sitemap-generator to identify itself as Googlebot, so that your site returns the full prerendered HTML page to it. Using the CLI version:

sitemap-generator --verbose --max-concurrency 2 --user-agent "Googlebot/2.1 (+http://www.google.com/bot.html)" YOUR_DOMAIN
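
For those using the library rather than the CLI, a rough equivalent might look like the sketch below. It assumes the generator accepts userAgent and maxConcurrency as the programmatic counterparts of the CLI flags, and that it emits 'add' and 'error' events in addition to 'done'; check these against the README for your installed version. The helper names prerenderOptions and buildSitemap are made up for illustration.

```javascript
// Options mirroring the CLI flags above. userAgent and maxConcurrency are
// assumed to be accepted here as they are on the command line.
function prerenderOptions() {
  return {
    filepath: './sitemap.xml',
    maxConcurrency: 2,
    userAgent: 'Googlebot/2.1 (+http://www.google.com/bot.html)',
  };
}

function buildSitemap(siteUrl) {
  // Required lazily so the options helper can be inspected on its own.
  const SitemapGenerator = require('sitemap-generator');
  const generator = SitemapGenerator(siteUrl, prerenderOptions());

  // Log progress, similar in spirit to --verbose.
  generator.on('add', (url) => console.log('added', url));
  generator.on('error', (error) => console.error('error', error));
  generator.on('done', () => console.log('sitemap written'));

  generator.start();
  return generator;
}

// Example (replace with your own site):
// buildSitemap('https://YOUR_DOMAIN');
```

This only helps if the site or prerendering service actually serves prerendered HTML based on the user-agent string.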

dkoo761 avatar Feb 25 '23 22:02 dkoo761