google-indexing-script
Rate limited too soon
I'm running the script locally like this:
yarn index example.com
❯ yarn index example.com
🔎 Processing site: sc-domain:example.com
👉 Found 1189 URLs in 2 sitemap
📦 Batch 1 of 24 complete
📦 Batch 2 of 24 complete
📦 Batch 3 of 24 complete
📦 Batch 4 of 24 complete
📦 Batch 5 of 24 complete
📦 Batch 6 of 24 complete
📦 Batch 7 of 24 complete
📦 Batch 8 of 24 complete
📦 Batch 9 of 24 complete
📦 Batch 10 of 24 complete
📦 Batch 11 of 24 complete
📦 Batch 12 of 24 complete
📦 Batch 13 of 24 complete
📦 Batch 14 of 24 complete
📦 Batch 15 of 24 complete
📦 Batch 16 of 24 complete
📦 Batch 17 of 24 complete
📦 Batch 18 of 24 complete
📦 Batch 19 of 24 complete
📦 Batch 20 of 24 complete
📦 Batch 21 of 24 complete
📦 Batch 22 of 24 complete
📦 Batch 23 of 24 complete
📦 Batch 24 of 24 complete
👍 Done, here's the status of all 1189 pages:
• ✅ Submitted and indexed: 410 pages
• 👀 Crawled - currently not indexed: 151 pages
• 👀 Discovered - currently not indexed: 2 pages
• 🔀 Page with redirect: 2 pages
• 🚦 RateLimited: 506 pages
• ❌ Server error (5xx): 9 pages
• ❌ Alternate page with proper canonical tag: 1 pages
• ❌ Duplicate, Google chose different canonical than user: 108 pages
✨ Found 659 pages that can be indexed.
[... list of urls]
📄 Processing url: https://example.com/foo/bar
🕛 Indexing already requested previously. It may take a few days for Google to process it.
📄 Processing url: https://example.com/foo/bar1
🚦 Rate limit exceeded, try again later.
The rate limit is hit after only around 100-120 URLs, and if I rerun the script it starts from the beginning and aborts on the rate limit again at around 100-120 URLs, so I'm never able to request indexing for the URLs that come later.
What am I doing wrong?
I think this was fixed by a recent PR, want to try again?
The cache isn't written to while the URLs are being processed. When the rate limit is exceeded, the program exits with a cache full of "RateLimited" entries. Then when you run it again, it starts from the beginning and gets rate limited at the same place again.
I added this to index.ts:168 and it can now pick up where it left off:
// Record the result and flush the cache to disk after every URL,
// so a rerun can skip URLs that were already submitted.
statusPerUrl[url] = { status: Status.SubmittedAndIndexed, lastCheckedAt: new Date().toISOString() };
writeFileSync(cachePath, JSON.stringify(statusPerUrl, null, 2));
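To show where those two lines fit, here is a minimal self-contained sketch of the idea. Everything except statusPerUrl, cachePath, Status, and the two cache lines is a hypothetical stand-in for illustration, not the library's actual code:

import { writeFileSync } from "fs";

// Hypothetical shapes for illustration; the real index.ts defines its own.
enum Status {
  SubmittedAndIndexed = "Submitted and indexed",
  RateLimited = "RateLimited",
}
type UrlStatus = { status: Status; lastCheckedAt: string };

const cachePath = "cache.json";
const statusPerUrl: Record<string, UrlStatus> = {};

// requestIndexing is a stand-in for whatever call submits a URL to Google.
async function processUrls(urls: string[], requestIndexing: (url: string) => Promise<void>) {
  for (const url of urls) {
    // Skip URLs that a previous (rate-limited) run already submitted.
    if (statusPerUrl[url]?.status === Status.SubmittedAndIndexed) continue;
    await requestIndexing(url);
    statusPerUrl[url] = { status: Status.SubmittedAndIndexed, lastCheckedAt: new Date().toISOString() };
    // Flush the cache after every URL so progress survives a rate-limit abort.
    writeFileSync(cachePath, JSON.stringify(statusPerUrl, null, 2));
  }
}

Flushing synchronously on every URL is slightly wasteful, but the file is small and it guarantees the cache reflects every successful submission even if the very next request aborts the run.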
I'm not quite sure where you added this line. Did you replace something else with it or simply add it? Also, could this be implemented in the lib itself in a PR, so everyone else could benefit from it?