Jindřich Bär

108 comments by Jindřich Bär

You can use [`file-type`](https://www.npmjs.com/package/file-type) ([example with streams here](https://github.com/sindresorhus/file-type?tab=readme-ov-file#filetypefromstreamstream)); it worked pretty well in WCC. You can even `pipe` through it 👀
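For context, here is a rough sketch of the idea behind that suggestion: detecting a file's type from the first few bytes of a stream. It is not `file-type`'s implementation or API. The helper names `sniffMime` and `sniffStream` and the three-entry signature table are assumptions made up for illustration, and only Node's standard library is used.

```typescript
import { Readable } from "node:stream";

// Hand-rolled signature table, for illustration only.
// A real library such as `file-type` recognizes far more formats.
const SIGNATURES: { mime: string; magic: number[] }[] = [
  { mime: "image/png", magic: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "image/jpeg", magic: [0xff, 0xd8, 0xff] },
  { mime: "application/pdf", magic: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
];

// Hypothetical helper: compare the leading bytes against each known signature.
function sniffMime(head: Uint8Array): string | undefined {
  const hit = SIGNATURES.find(
    ({ magic }) =>
      head.length >= magic.length && magic.every((byte, i) => head[i] === byte),
  );
  return hit?.mime;
}

// Hypothetical helper: read just enough of the stream to cover the longest
// signature, then sniff. Leaving the loop early destroys the stream.
async function sniffStream(stream: Readable, needed = 8): Promise<string | undefined> {
  const chunks: Buffer[] = [];
  let total = 0;
  for await (const chunk of stream) {
    const buf = Buffer.from(chunk);
    chunks.push(buf);
    total += buf.length;
    if (total >= needed) break;
  }
  return sniffMime(Buffer.concat(chunks));
}
```

Unlike piping through the library as the comment suggests, this sketch consumes the head of the stream, so it only suits cases where the body is not needed afterwards.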

![image](https://github.com/apify/crawlee/assets/61918049/1347f14c-c959-4242-adf9-1cf706a2dd21) More WCC users are complaining about this. Do we know how to approach this issue yet?

> Which could technically be used if desired (`CRAWLEE_STORAGE_DIR=/tmp/crawlee/storage`)

This is only true to an extent - the ephemeral storage can be shared between different Lambda invocations, provided they run...

What's the status on this? The blocking issue seems to be done + I have done this for the academy section already, so it's only a matter of whether we...

Yeah, if we could also remove the rogue H1 headings in the article bodies, that would be great :) From what I remember, they (are|were) getting picked up by the...

IMO keeping the title in the frontmatter is a bit nicer, but it apparently shouldn't matter: ![image](https://github.com/apify/apify-docs/assets/61918049/ec9110f0-3d6b-45a1-ad65-3a8b86fa7269) Go with your instinct then, from the docs it seems that Docusaurus can...

How's this looking? Anything we can help with?

Alright, ready for the next round of reviews! I simplified the parsing logic quite a lot (in my eyes) - in `SitemapRequestList`, there is now just one queue of parsed...

Alright, time for (yet another) final review! [My previous comment](https://github.com/apify/crawlee/pull/2498#issuecomment-2191669628) should provide enough guidance for the top-level ideas.

Huh, seems that the Playwright patch somehow makes `navigator.webdriver` leak in Firefox. ![image](https://github.com/apify/crawlee/assets/61918049/08f9c5f9-f2a3-4382-81d0-f8deaeafbd46) Also, `jest`'s `expect` in a try-caught `requestHandler` makes for a bad debugging experience. I'll make a separate...