Jan Buchar

217 comments by Jan Buchar

I made a gist to illustrate a possible new inheritance hierarchy, feel free to comment. https://gist.github.com/janbuchar/0412e1b4224065e40e937e91d474f145

> I'm even thinking about whether specific subclasses like `BeautifulSoupCrawler` / `ParselCrawler` might be unnecessary when the `HttpCrawler` class itself can serve the purpose with the proper configuration of parsers...

Thanks @CodeMan62 for the contribution! I assume you're trying to fix https://github.com/apify/crawlee/issues/2499? If so, we should link it to this PR. I haven't reviewed the code yet, but here are...

> @janbuchar Regarding the rename to RequestManagerTandem - I've reverted this change for now. I think we should:
>
> 1. Discuss the naming convention changes more broadly
> 2....

Hi @CodeMan62, I still see some test failures - could you look into that please?

@CodeMan62 I can try, but I'm stretched kinda thin right now. What did you already try? Is there anything you could tell me about the failing tests?

@barjin @B4nan I updated the PR and made sure it passes tests. It should be fully backwards compatible. Can you take another look?

Python implementation here https://github.com/apify/crawlee-python/pull/777

The interface should mimic that of `got-scraping` for BC reasons (https://github.com/sindresorhus/got/blob/main/documentation/2-options.md#url), with features not supported by https://github.com/apify-projects/node-curl-impersonate/tree/master omitted. Index signatures will be used to keep compatibility with eccentric usage of...

> Shouldn't we just do this in a breaking change instead? Feels weird to try and make it still respect got-scraping's interface to me... It doesn't seem like that much...