
Support Deno runtime

Open · tigitz opened this issue 2 years ago · 7 comments

Which package is the feature request for? If unsure which one to select, leave blank

crawlee

Feature

I would like to use Deno for my crawling projects.

Motivation

All the features and comparisons with Node.js can be found here: https://deno.com/runtime

Ideal solution or implementation, and any additional constraints

Add some tests in CI to make sure the package can be run with the Deno runtime.
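One way such a CI check could start is a smoke script that first confirms which runtime it is actually running under, so a Deno job cannot silently fall back to Node.js. A minimal TypeScript sketch, assuming detection via the Deno global and process.versions.node; detectRuntime is a hypothetical helper, not part of Crawlee:

```typescript
// Hypothetical helper (not a Crawlee API): report which JavaScript runtime
// is executing this script. Deno is checked first because recent Deno
// versions also expose a Node-compatible `process` global.
type Runtime = 'deno' | 'node' | 'unknown';

function detectRuntime(
    g: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): Runtime {
    // Deno exposes a `Deno` namespace object on the global scope.
    if (typeof g.Deno === 'object' && g.Deno !== null) {
        return 'deno';
    }
    // Node.js reports its version under process.versions.node.
    const proc = g.process as { versions?: { node?: string } } | undefined;
    if (proc?.versions?.node) {
        return 'node';
    }
    return 'unknown';
}
```

In CI, the Deno job could fail fast unless detectRuntime() returns 'deno' before running the crawler example.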

Alternative solutions or implementations

No response

Other context

Here is the list of Deno issues discovered so far that block support:

  • https://github.com/denoland/deno/issues/19113
  • https://github.com/denoland/deno/issues/19215
  • https://github.com/denoland/deno/issues/19214 (can be manually patched to circumvent the issue)
  • https://github.com/denoland/deno/issues/19238

tigitz, May 12 '23 17:05

I see the unref issue you created on the Deno side is now resolved. Was that the only problem?

FWIW, we will most probably switch to native ESM in the next major version.
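If the unref problem concerns timer handles (an assumption; the thread does not say which handle type was involved), a defensive pattern is to call unref() only when the handle actually has it: Node's setInterval returns a Timeout object with an unref() method, while the web-standard setInterval returns a numeric ID. A minimal sketch with a hypothetical helper, unrefIfPossible:

```typescript
// Hypothetical helper (not a Crawlee API): unref a timer handle only if the
// handle exposes an unref() method, so the same code works whether setInterval
// returned a Node.js Timeout object or a plain numeric ID.
type TimerHandle = ReturnType<typeof setInterval>;

function unrefIfPossible(handle: TimerHandle): boolean {
    // View the handle as an object that may or may not carry unref().
    const candidate = handle as unknown as { unref?: () => unknown } | number;
    if (typeof candidate === 'object' && candidate !== null && typeof candidate.unref === 'function') {
        // Node.js Timeout: an unref'd timer no longer keeps the event loop alive.
        candidate.unref();
        return true;
    }
    // Numeric timer ID: there is nothing to call unref() on.
    return false;
}
```

The helper returns whether unref() was applied, so callers can log when a runtime hands back a handle without it.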

B4nan, May 31 '23 13:05

@B4nan I've since updated the list of issues I've opened in the Deno project.

With a manual patch, it's now possible to run the current simple usage example:

// Add import of CheerioCrawler
import { RequestQueue, CheerioCrawler } from 'crawlee';

const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: 'https://crawlee.dev' });

// Create the crawler and add the queue with our URL
// and a request handler to process the page.
const crawler = new CheerioCrawler({
    requestQueue,
    // The `$` argument is the Cheerio object
    // which contains parsed HTML of the website.
    async requestHandler({ $, request }) {
        // Extract <title> text with Cheerio.
        // See Cheerio documentation for API docs.
        const title = $('title').text();
        console.log(`The title of "${request.url}" is: ${title}.`);
    }
});

// Start the crawler and wait for it to finish
await crawler.run();

However, in the meantime I've tested Deno on my own large project and encountered some issues: enqueueLinks did not enqueue anything, and processing time was roughly 10x higher. That is definitely not the improvement I expected.

I plan to rerun some tests and gather as much information as my expertise allows before sharing those findings.
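For repeatable Node.js vs. Deno timing comparisons, the same script can measure a run with the global performance object, which both runtimes provide. A minimal sketch; timeIt is a hypothetical helper, not a Crawlee API:

```typescript
// Hypothetical helper (not a Crawlee API): measure the wall-clock duration of
// an async task using the global `performance` object available in both
// Node.js (v16+) and Deno.
async function timeIt<T>(
    label: string,
    task: () => Promise<T>,
): Promise<{ label: string; ms: number; result: T }> {
    const start = performance.now();
    const result = await task();
    // Elapsed time in milliseconds, including any time spent awaiting.
    const ms = performance.now() - start;
    return { label, ms, result };
}
```

For example, const { ms } = await timeIt('crawl', () => crawler.run()); run once under each runtime with the same input URLs.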

tigitz, Jun 01 '23 12:06

I'd also like to see Deno support! If Crawlee ships native ESM, I imagine that would be beneficial too.

lloydjatkinson, Jul 04 '23 17:07

Would love to use Crawlee in my Deno project! Is this on the project timeline?

maheshbansod, Jul 15 '24 12:07