Jan Čurn
Currently, pages that return a 5xx status are not considered failed and thus are not retried; Cheerio Scraper, on the other hand, does retry them. We should probably treat 5xx errors as failures and retry them.
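A minimal sketch of what the fix could look like. The helper names (`shouldRetryResponse`, `checkResponse`) are made up for illustration; in the crawler, throwing from the page handler is what marks the request as failed and triggers the built-in retry logic.

```javascript
// Hypothetical helper: decide whether a response status warrants a retry.
function shouldRetryResponse(statusCode) {
  // Retry on any server error (500-599); client errors (4xx) are usually
  // permanent, so retrying them would just waste requests.
  return statusCode >= 500 && statusCode < 600;
}

// Example of how it might be wired into a page handler: throwing makes
// the crawler count the request as failed and schedule a retry.
function checkResponse(statusCode, url) {
  if (shouldRetryResponse(statusCode)) {
    throw new Error(`Request to ${url} failed with status ${statusCode}`);
  }
}
```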
Most login pages have just a username/email field, a password field, and a submit button. We could write a function like `Apify.utils.puppeteer.login()` that would try to find these fields, fill them with the provided values and...
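A sketch of the field-detection heuristic such a helper could use. Everything here is hypothetical: in the real helper the descriptors would come from something like `page.$$eval('input', ...)` and the matched fields would then be filled via `page.type()`; this version only classifies plain descriptor objects.

```javascript
// Hypothetical heuristic: given descriptors of the page's <input> elements,
// pick the ones that look like the username/email and password fields.
function classifyLoginFields(inputs) {
  const result = { username: null, password: null };
  for (const input of inputs) {
    const hint = `${input.type || ''} ${input.name || ''}`.toLowerCase();
    if (!result.password && input.type === 'password') {
      result.password = input;
    } else if (
      !result.username &&
      (input.type === 'email' || /user|email|login/.test(hint))
    ) {
      result.username = input;
    }
  }
  return result;
}
```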
It could be called `Apify.utils.puppeteer.extractMicrodata` and look something like this: https://help.apify.com/en/articles/6988663-scraping-data-from-websites-using-schema-org-microdata but ideally, it wouldn't use jQuery.
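A jQuery-free sketch of the core extraction step. The function only relies on a few DOM methods (`getAttribute`, `querySelectorAll`, `textContent`), so it could run inside `page.evaluate()`; it deliberately ignores nested `itemscope` elements to stay short, which the real helper would have to handle.

```javascript
// Hypothetical sketch: extract schema.org microdata from one itemscope
// element using plain DOM APIs instead of jQuery.
function extractItem(scopeEl) {
  const item = {
    type: scopeEl.getAttribute('itemtype') || null,
    properties: {},
  };
  for (const el of scopeEl.querySelectorAll('[itemprop]')) {
    const name = el.getAttribute('itemprop');
    // Prefer the content attribute (used e.g. on <meta> tags),
    // falling back to the element's visible text.
    const value = el.getAttribute('content') || el.textContent.trim();
    item.properties[name] = value;
  }
  return item;
}
```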
It doesn't work with local directory storage. Also, the documentation of this and related functions is not great, e.g. how does the format work together with the `Dataset.forEach` function?
This shouldn't cause any problems and can greatly improve performance. See the TODO at https://github.com/apifytech/apify-js/blob/master/src/request_queue.js#L276
Basically, Puppeteer can only take screenshots with a width or height of at most 16,384 px (this is a hard-coded Chrome limit, see https://github.com/GoogleChrome/puppeteer/issues/359). However, for one customer project, we need screenshots of...
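One way around the limit would be stitching: split the full page height into horizontal strips that each stay under the Chrome maximum, take one clipped screenshot per strip, and join the images afterwards. Only the strip computation is sketched below; the actual `page.screenshot({ clip })` calls and the image stitching are left out.

```javascript
// Hard-coded Chrome limit on a single screenshot dimension, in pixels.
const MAX_CHROME_DIMENSION = 16384;

// Compute clip rectangles covering a tall page, each at most maxHeight px
// high, suitable for passing to page.screenshot({ clip }) one by one.
function computeScreenshotClips(pageWidth, pageHeight, maxHeight = MAX_CHROME_DIMENSION) {
  const clips = [];
  for (let y = 0; y < pageHeight; y += maxHeight) {
    clips.push({
      x: 0,
      y,
      width: pageWidth,
      height: Math.min(maxHeight, pageHeight - y),
    });
  }
  return clips;
}
```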
Cheerio is quite CPU-intensive, so at higher crawler concurrency the CPU chokes. We should explore whether it's possible to run the Cheerio download and parsing in a separate...
This will be similar to `handledRequestsCount`, but it will indicate how many requests are yet to be processed. The users of `BasicCrawler` can then use this field to determine whether...
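A hypothetical sketch of how such a pending count could be derived from counters the queue already tracks. The class and method names are made up; a real implementation would live inside the request queue itself.

```javascript
// Hypothetical counter: pending = total enqueued minus handled.
class RequestCounter {
  constructor() {
    this.totalRequestsCount = 0;
    this.handledRequestsCount = 0;
  }
  addRequest() {
    this.totalRequestsCount += 1;
  }
  markHandled() {
    this.handledRequestsCount += 1;
  }
  // Requests that are enqueued but not yet processed.
  get pendingRequestsCount() {
    return this.totalRequestsCount - this.handledRequestsCount;
  }
}
```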
It only supports WebSockets via the HTTP CONNECT method (used e.g. with SSL). The unit tests (`testWsCall()`) only test for that too. We should add full support for the HTTP UPGRADE...
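A small sketch of the request classification a proxy needs for full WebSocket support: distinguishing plain-HTTP connections upgraded via the `Upgrade`/`Connection` headers from CONNECT tunnels. Header handling is simplified; a real proxy would hook Node's `http.Server` `'connect'` and `'upgrade'` events instead of inspecting headers by hand.

```javascript
// Hypothetical predicate: does this request initiate a WebSocket upgrade
// (RFC 6455 handshake) rather than a CONNECT tunnel or ordinary request?
function isWebSocketUpgrade(method, headers) {
  const connection = (headers.connection || '').toLowerCase();
  const upgrade = (headers.upgrade || '').toLowerCase();
  return (
    method === 'GET' &&
    // Connection may be a list, e.g. "keep-alive, Upgrade".
    connection.split(/,\s*/).includes('upgrade') &&
    upgrade === 'websocket'
  );
}
```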
It would be awesome if there was an option to keep the internal properties of the HAR entries, such as `__requestId`. This would allow us to make extensions to the...