
Refer to an element selected by "utils.enqueueLinks" in "transformRequestFunction".

Open polikeiji opened this issue 4 years ago • 1 comments

The element whose links I want to add to the RequestQueue contains a date text. I want to parse that text and add the link to the queue only if it fulfills some time-based condition.

To do that, I'd like to refer to the selected element in "transformRequestFunction". As far as I can tell, nothing representing the DOM element is in scope where "transformRequestFunction" is called, so I assume we can't refer to it in the function. https://github.com/apify/apify-js/blob/da9bcf36b352c0618d61ac38d229b47f058e66ee/src/enqueue_links/enqueue_links.js#L135-L139

Could we consider passing the selected DOM element to "transformRequestFunction"? Or should we avoid "utils.enqueueLinks" when adding links to the RequestQueue requires this kind of complex filtering?

polikeiji avatar Apr 11 '21 05:04 polikeiji

It should not be difficult to do for Cheerio, but it might be problematic with Puppeteer and Playwright, since the DOM elements actually exist only in the browser, not in the Node.js process. We might be able to pass around the JSHandle for that object, but all actions on that handle would be async, which means the transformRequestFunction would have to become async, which could be a breaking change.

So yeah, technically, it should be possible and thanks for the suggestion. I'll keep the issue open to track this, but given the fact there's an easy way to do this without using enqueueLinks, I'm afraid we won't give it a high priority.
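For illustration, the manual approach might look roughly like the sketch below. It is not code from this thread: the helper name `selectRecentUrls`, the `a.article` and `.date` selectors, and the ISO date format are all assumptions. The idea is to read each link together with its date text, filter in plain JavaScript, and add the surviving URLs to the queue yourself.

```javascript
// Hypothetical helper (not part of the Apify SDK): keep only links whose
// date text parses to a time on or after `cutoff`, and resolve them
// against `baseUrl` so relative hrefs become absolute URLs.
// `candidates` is an array of { href, dateText } objects taken from the page.
function selectRecentUrls(candidates, baseUrl, cutoff) {
    const urls = [];
    for (const { href, dateText } of candidates) {
        if (!href) continue; // skip anchors without an href
        // Date.parse is only reliable for ISO 8601 strings; other formats
        // would need a dedicated parser.
        const timestamp = Date.parse(dateText);
        if (Number.isNaN(timestamp) || timestamp < cutoff.getTime()) continue;
        urls.push(new URL(href, baseUrl).href);
    }
    return urls;
}

// Assumed usage inside a CheerioCrawler handlePageFunction({ $, request }),
// with `requestQueue` and `cutoff` defined elsewhere:
//
//   const candidates = $('a.article').map((i, el) => ({
//       href: $(el).attr('href'),
//       dateText: $(el).find('.date').text().trim(),
//   })).get();
//   for (const url of selectRecentUrls(candidates, request.loadedUrl, cutoff)) {
//       await requestQueue.addRequest({ url });
//   }
```

Keeping the filter a pure function means the date logic can be checked without running a crawl; only the commented usage touches the crawler.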

mnmkng avatar Apr 11 '21 08:04 mnmkng