google-indexing-script
Rate limited too soon
I'm running the script locally like this:
yarn index example.com
❯ yarn index example.com
🔎 Processing site: sc-domain:example.com
👉 Found 1189 URLs in 2 sitemap
📦 Batch 1 of 24 complete
📦 Batch 2 of 24 complete
📦 Batch 3 of 24 complete
📦 Batch 4 of 24 complete
📦 Batch 5 of 24 complete
📦 Batch 6 of 24 complete
📦 Batch 7 of 24 complete
📦 Batch 8 of 24 complete
📦 Batch 9 of 24 complete
📦 Batch 10 of 24 complete
📦 Batch 11 of 24 complete
📦 Batch 12 of 24 complete
📦 Batch 13 of 24 complete
📦 Batch 14 of 24 complete
📦 Batch 15 of 24 complete
📦 Batch 16 of 24 complete
📦 Batch 17 of 24 complete
📦 Batch 18 of 24 complete
📦 Batch 19 of 24 complete
📦 Batch 20 of 24 complete
📦 Batch 21 of 24 complete
📦 Batch 22 of 24 complete
📦 Batch 23 of 24 complete
📦 Batch 24 of 24 complete
👍 Done, here's the status of all 1189 pages:
• ✅ Submitted and indexed: 410 pages
• 👀 Crawled - currently not indexed: 151 pages
• 👀 Discovered - currently not indexed: 2 pages
• 🔀 Page with redirect: 2 pages
• 🚦 RateLimited: 506 pages
• ❌ Server error (5xx): 9 pages
• ❌ Alternate page with proper canonical tag: 1 pages
• ❌ Duplicate, Google chose different canonical than user: 108 pages
✨ Found 659 pages that can be indexed.
[... list of urls]
📄 Processing url: https://example.com/foo/bar
🕛 Indexing already requested previously. It may take a few days for Google to process it.
📄 Processing url: https://example.com/foo/bar1
🚦 Rate limit exceeded, try again later.
The rate limit is hit after only around 100-120 URLs, and if I rerun the script it starts from the beginning and aborts on the rate limit again at around 100-120 URLs, so I'm never able to request indexing for the URLs that come later.
What am I doing wrong?
I think this was fixed by a recent PR, want to try again?
The cache isn't written to while the URLs are being processed. When the rate limit is exceeded, the program exits with a cache full of "RateLimited" entries. Then when you run it again, it starts from the beginning and gets rate limited at the same place again.
I added this to index.ts:168 and it can now pick up where it left off:
// Record the result and flush the cache to disk after every URL,
// so a rerun can skip URLs that were already submitted.
statusPerUrl[url] = { status: Status.SubmittedAndIndexed, lastCheckedAt: new Date().toISOString() };
writeFileSync(cachePath, JSON.stringify(statusPerUrl, null, 2));
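To show where those two lines fit, here is a minimal self-contained sketch of the idea. Everything except statusPerUrl, cachePath, Status, and the two cache lines is a hypothetical stand-in for illustration, not the library's actual code:

import { writeFileSync } from "fs";

// Hypothetical shapes for illustration; the real index.ts defines its own.
enum Status {
  SubmittedAndIndexed = "Submitted and indexed",
  RateLimited = "RateLimited",
}
type UrlStatus = { status: Status; lastCheckedAt: string };

const cachePath = "cache.json";
const statusPerUrl: Record<string, UrlStatus> = {};

// requestIndexing is a stand-in for whatever call submits a URL to Google.
async function processUrls(urls: string[], requestIndexing: (url: string) => Promise<void>) {
  for (const url of urls) {
    // Skip URLs that a previous (rate-limited) run already submitted.
    if (statusPerUrl[url]?.status === Status.SubmittedAndIndexed) continue;
    await requestIndexing(url);
    statusPerUrl[url] = { status: Status.SubmittedAndIndexed, lastCheckedAt: new Date().toISOString() };
    // Flush the cache after every URL so progress survives a rate-limit abort.
    writeFileSync(cachePath, JSON.stringify(statusPerUrl, null, 2));
  }
}

Flushing synchronously on every URL is slightly wasteful, but the file is small and it guarantees the cache reflects every successful submission even if the very next request aborts the run.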
I'm not quite sure where you added this line. Did you replace something else with it or simply add it? Also, could this be implemented in the lib itself in a PR, so everyone else could benefit from it?