RequestQueue metrics don't get updated

Open jeanbmar opened this issue 3 years ago • 1 comments

Describe the bug

RequestQueue metrics aren't updated after calling addRequest. I've noticed that most methods get the queue through findRequestQueueByPossibleId, while toRequestQueueInfo works with this, so it might actually be using the wrong object.

To Reproduce

/* eslint-disable no-console */
import { Actor } from 'apify';

await Actor.init();
console.info('actor started');
const queue = await Actor.openRequestQueue();
for (let i = 0; i < 50; i += 1) {
  await queue.addRequest({ url: 'https://apify.com' });
}
const queueInfo = await queue.getInfo();
console.log(queueInfo);
console.info('scraping finished, exiting actor...');
await Actor.exit();

Result:

2022-09-05T07:58:10.396Z actor started
2022-09-05T07:58:12.053Z {
2022-09-05T07:58:12.054Z   id: '...',
2022-09-05T07:58:12.055Z   userId: '...',
2022-09-05T07:58:12.055Z   createdAt: 2022-09-05T07:58:07.734Z,
2022-09-05T07:58:12.056Z   modifiedAt: 2022-09-05T07:58:07.734Z,
2022-09-05T07:58:12.057Z   accessedAt: 2022-09-05T07:58:07.734Z,
2022-09-05T07:58:12.057Z   expireAt: 2022-09-12T07:58:07.734Z,
2022-09-05T07:58:12.058Z   totalRequestCount: 0,
2022-09-05T07:58:12.058Z   handledRequestCount: 0,
2022-09-05T07:58:12.059Z   pendingRequestCount: 0,
2022-09-05T07:58:12.059Z   actId: '...',
2022-09-05T07:58:12.060Z   actRunId: '...',
2022-09-05T07:58:12.060Z   hadMultipleClients: false,
2022-09-05T07:58:12.061Z   stats: {
2022-09-05T07:58:12.061Z     readCount: 0,
2022-09-05T07:58:12.062Z     writeCount: 0,
2022-09-05T07:58:12.063Z     deleteCount: 0,
2022-09-05T07:58:12.063Z     headItemReadCount: 0,
2022-09-05T07:58:12.064Z     storageBytes: 0
2022-09-05T07:58:12.064Z   }
2022-09-05T07:58:12.065Z }
2022-09-05T07:58:12.066Z scraping finished, exiting actor...

Expected behavior

totalRequestCount and other metrics shouldn't be 0.

System information:

  • Apify Cloud

jeanbmar avatar Sep 05 '22 08:09 jeanbmar

~~In memory-storage, stats is returned as an empty object 👀. Did you run this on the platform or locally?~~

Just noticed that you ran this on the platform, sorry! 😅 It might be an issue with our caching of request queues, or the API not returning the value.

vladfrangu avatar Sep 12 '22 08:09 vladfrangu

Back again! Sorry for the slow progress, do you happen to have a link to a run where you encountered this issue? Would help tremendously with debugging this further! 🙏

vladfrangu avatar Oct 10 '22 11:10 vladfrangu

Hey! I asked about this, and it's not a bug in Crawlee, nor in the API. Due to the nature of stats on a possibly rapidly changing object (like a request queue), it can take between 5 and 10 seconds for the reported values to be accurate / updated. I know that this might not be ideal depending on the use case, but there's not much we can do short of storing local statistics about the queue too...

vladfrangu avatar Oct 12 '22 21:10 vladfrangu
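
For anyone hitting the same thing, here is a minimal sketch of the two workarounds implied by the last comment: polling getInfo() until the delayed stats catch up, and keeping a local count based on what addRequest() returns. Both helper names (waitForQueueInfo, addAndCount) and the timeout values are made up for illustration and are not part of Crawlee or the Apify SDK; the only queue members assumed are getInfo() and addRequest(), with addRequest() resolving to an object that has a wasAlreadyPresent flag.

```javascript
// Hypothetical helper, not a Crawlee/Apify API: polls queue.getInfo() until
// totalRequestCount reaches `expected` or the timeout elapses.
// The default timeout is an assumption, chosen to exceed the 5-10 s delay
// mentioned in the thread.
async function waitForQueueInfo(queue, expected, { timeoutMs = 15000, intervalMs = 1000 } = {}) {
    const deadline = Date.now() + timeoutMs;
    for (;;) {
        const info = await queue.getInfo();
        // getInfo() may resolve to undefined, so guard before reading fields.
        if (info && info.totalRequestCount >= expected) return info;
        // Give up and hand back the last (possibly stale) snapshot.
        if (Date.now() >= deadline) return info;
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
}

// Hypothetical local statistics: counts only requests that the queue
// reports as new, so duplicates do not inflate the number.
async function addAndCount(queue, request, counter) {
    const result = await queue.addRequest(request);
    if (!result.wasAlreadyPresent) counter.added += 1;
    return result;
}
```

With the reproduction above, this would look roughly like `const counter = { added: 0 };`, then `await addAndCount(queue, { url: 'https://apify.com' }, counter);` inside the loop, followed by `await waitForQueueInfo(queue, counter.added);` in place of the direct getInfo() call. Note that the reproduction adds the same URL 50 times; since the queue deduplicates requests, a counter built this way would be expected to end at 1 rather than 50.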