algolia-sitemap

Sitemap results rounding

Open bomjastik opened this issue 5 years ago • 11 comments

Hello.

I'm trying to create the site map and everything works well besides one thing.

When I create the site map from an index with 29,248 rows, I get 29,000 result rows in the site map. Exactly 29,000.

I tried the same with another index of 3,291 rows, and I got exactly 3,000 result rows.

For smaller indexes with fewer than 100 rows, I get the correct number of result rows.

Is there any reason why my results would be rounded like this?

My index.js:

const algoliaSitemap = require('algolia-sitemap');

const algoliaConfig = {
    appId: '',
    apiKey: '',
    indexName: '',
};

function hitToParams({pretty_url, objectID, station_name}) {
    const url = pretty_url => `some_url`;

    const loc = url(pretty_url);

    const lastmod = new Date().toISOString();

    const priority = 0.8;

    return {
        loc,
        lastmod,
        priority,
    };
}

algoliaSitemap({
    algoliaConfig,
    sitemapLoc: 'https://mydomain.com/sitemaps',
    outputFolder: 'sitemaps',
    hitToParams,
}).then(() => {
    console.log('Done generating sitemaps'); // eslint-disable-line no-console
})
    .catch(console.error);

Regards.

bomjastik avatar Jun 03 '19 12:06 bomjastik

Is this happening with v2.2.0? Did you check whether it also happens for you on 2.1.0? Thanks!

Haroenv avatar Jun 04 '19 13:06 Haroenv

Hi, @Haroenv I've just started to implement this too and I currently have 2,159 results but I'm only getting 2000 URLs in sitemap.0.xml?

alexpchin avatar Oct 04 '20 01:10 alexpchin

Do you have multiple sitemap files, @alexpchin? Does this also happen with appId: 'latency', indexName: 'instant_search', apiKey: '8baa3791b46c2a52259007747e743e07'?

Haroenv avatar Oct 04 '20 11:10 Haroenv

Hi @Haroenv, I've just checked with:

algoliaSitemap({
    algoliaConfig: {
      appId: "latency",
      apiKey: "8baa3791b46c2a52259007747e743e07",
      indexName: "instant_search",
    },
    hitToParams: ({ objectID }) => {
      return {
        loc: `https://someurl.com/${objectID}`,
        lastmod: new Date().toISOString(),
        priority: 0.7,
      }
    },
  }),

Only one sitemaps/sitemap-index.xml and sitemaps/sitemap.0.xml are created. Inside sitemaps/sitemap.0.xml there are 21,000 URLs.

  const aggregator = async (args) => {
    let { hits, cursor } = args;
    do {
      if (!hits) {
        return;
      }
      batch = batch.concat(
        hits.reduce((entries, hit) => {
          const entry = hitToParams(hit);
          return entry ? entries.concat(entry) : entries;
        }, [])
      );
      if (batch.length > CHUNK_SIZE) {
        await flush();
      }
      ({ hits, cursor } = await index.browseFrom(cursor));
      console.log('hits -->', hits.length);
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 1000
      // hits ---> 469
    } while (cursor);
    console.log('batch ---> is 21,000', batch.length);
    await handleSitemap(batch);
    const sitemapIndex = createSitemapindex(sitemaps);
    await saveSiteMap({
      sitemap: sitemapIndex,
      root: outputFolder,
      filename: 'sitemap-index',
    });
  };

Aside: at some point we could add a check for whether the output folder already exists and create it if it doesn't?

alexpchin avatar Oct 04 '20 11:10 alexpchin

Thanks for checking, that indeed doesn't look right! Maybe the last batch doesn't have a cursor, and therefore the loop stops before that batch is processed. I think we should switch to a regular while loop and do the fetching at the beginning of the loop to avoid this. The program flow will be much simpler.
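
The diagnosis above can be reproduced without touching Algolia. The following is a minimal sketch, assuming only that the last page comes back without a cursor. `makeFakeIndex` and `countWithDoWhile` are hypothetical names, and the fake index is an in-memory stand-in, not the real client.

```javascript
// Hypothetical in-memory stand-in for a paginated index (not the Algolia client).
function makeFakeIndex(total, pageSize) {
  const page = (start) => {
    const end = Math.min(start + pageSize, total);
    const hits = [];
    for (let i = start; i < end; i++) hits.push({ objectID: String(i) });
    // Assumption from the thread: the last page carries no cursor.
    return Promise.resolve(end < total ? { hits, cursor: String(end) } : { hits });
  };
  return {
    browse: () => page(0),
    // In this stub, a missing cursor simply restarts from the first page.
    browseFrom: (cursor) => page(Number(cursor) || 0),
  };
}

// Same loop shape as the aggregator quoted earlier: process, fetch, then test the cursor.
async function countWithDoWhile(index) {
  let { hits, cursor } = await index.browse();
  let count = 0;
  do {
    count += hits.length;
    ({ hits, cursor } = await index.browseFrom(cursor));
  } while (cursor); // the final page has no cursor, so it is fetched but never counted
  return count;
}
```

With 3,291 records and pages of 1,000, this sketch counts 3,000, which matches the numbers reported at the top of the thread. An index that fits in a single page is counted correctly.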

For your aside, that definitely makes sense. I think mkdir in Node now has a recursive option (like -p) that we can use.
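
In Node.js the counterpart of `mkdir -p` is the `recursive` option of `fs.mkdir` and `fs.mkdirSync`. A small sketch of the folder check discussed here could look as follows. `ensureOutputFolder` is a hypothetical helper name, not part of algolia-sitemap.

```javascript
const fs = require('fs');

// Hypothetical helper: create the output folder (and any missing parents).
// With `recursive: true`, an already existing folder is not treated as an error.
function ensureOutputFolder(folder) {
  fs.mkdirSync(folder, { recursive: true });
  return folder;
}
```

Calling it once before the sitemaps are written, for example `ensureOutputFolder('sitemaps')`, would be enough for the `outputFolder` used in this thread.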

Haroenv avatar Oct 04 '20 13:10 Haroenv

Hi @Haroenv, sorry, yes I was going to follow this up with a suggestion for a solution as the problem is with the cursor but I was eating some noodles...

We could solve this quickly with a run flag that is updated at the start of the do/while loop, so the loop stops only after the batch without a cursor has been processed?

  const aggregator = async (args) => {
    let { hits, cursor } = args;
    let run = true;
    do {
      run = !!cursor;
      if (!hits) {
        return;
      }
      batch = batch.concat(
        hits.reduce((entries, hit) => {
          const entry = hitToParams(hit);
          return entry ? entries.concat(entry) : entries;
        }, [])
      );
      if (batch.length > CHUNK_SIZE) {
        await flush();
      }
      ({ hits, cursor } = await index.browseFrom(cursor));
    } while (run);
    await handleSitemap(batch);
    const sitemapIndex = createSitemapindex(sitemaps);
    await saveSiteMap({
      sitemap: sitemapIndex,
      root: outputFolder,
      filename: 'sitemap-index',
    });
  };

  return index.browse(params).then(aggregator);
}

This would work, but it is arguably a little more confusing in terms of flow.

alexpchin avatar Oct 04 '20 13:10 alexpchin

I think that makes sense, but a regular while with the fetching first will be simpler probably :)
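
A regular while loop that fetches at the top, as suggested here, might look like the sketch below. `collectAllHits` and `makeStubIndex` are hypothetical names. The stub only imitates the `browse`/`browseFrom` shape used in this thread and is not the Algolia client, and the flushing and sitemap writing of the real aggregator are left out.

```javascript
// Hypothetical stub: serves pre-built pages and omits the cursor on the last one.
function makeStubIndex(pages) {
  const at = (i) =>
    Promise.resolve(
      i < pages.length - 1
        ? { hits: pages[i], cursor: String(i + 1) }
        : { hits: pages[i] }
    );
  return { browse: () => at(0), browseFrom: (cursor) => at(Number(cursor)) };
}

// Fetch first, then process: every fetched page is handled before the cursor is tested.
async function collectAllHits(index, params) {
  const all = [];
  let cursor;
  let first = true;
  while (first || cursor) {
    const page = first
      ? await index.browse(params)
      : await index.browseFrom(cursor);
    first = false;
    all.push(...(page.hits || []));
    cursor = page.cursor; // undefined on the last page, which ends the loop
  }
  return all;
}
```

Because the cursor is only checked after a page has been processed, the final page is kept, and no extra request is made once the cursor is gone.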

Haroenv avatar Oct 05 '20 07:10 Haroenv

@Haroenv Did you want me to open a PR? I saw that there are also a number of chores outstanding?

alexpchin avatar Oct 07 '20 10:10 alexpchin

All PRs that were open were for devDependencies, so they are not urgent for this project. A PR would be much appreciated if you can confirm that this is the source of the bug. Thanks @alexpchin!

Haroenv avatar Oct 07 '20 14:10 Haroenv

@Haroenv Hi, was there any follow-up on this issue? I have started prototyping an implementation using this library, but this bug is a blocker for now.

Pr00xxy avatar Sep 06 '21 12:09 Pr00xxy

Does the suggested fix work for you, @Pr00xxy?

Haroenv avatar Sep 06 '21 13:09 Haroenv