Matěj Volf
This is a limitation of the inner workings of the CMS: posts are a collection - every post there has a common layout, structure, and is included on the Latest Stories...
I dug into what exactly is happening there. This is the code that Jinja generates for the buggy template (I stripped some whitespace; my template looks like this: `"{% set...
I'd say this could be the desired behavior. HOWEVER, they are also kept when `trim_blocks` and `lstrip_blocks` are true, which probably shouldn't happen. Interestingly though, they are correctly deleted when...
I looked into how `markup ##- comment` differs from `##- comment`: in `Lexer.tokeniter`, for `markup\n##- comment`, the regex match is `markup\n##-`. The `-` is detected, and the markup part...
Yes, during debugging I noticed that Jinja parses across multiple newlines at once. I don't think this is an issue per se, but the code just needs to take care when...
Sorry for bumping, but is there anything I could do to help get this merged?
Is there a workaround, or a version I can pin, to avoid this? The Expedia reviews scraper hits this quite consistently.
Yeah, I'm on 3.9.1. From above: > It seems that the fix didn't help, the issue is persisting in [email protected]. I hadn't realized that 3.9.2 shouldn't do this. Will upgrade,...
So, a short write-up of my investigation: to construct the context object for the request handler, [`HttpCrawler._runRequestHandler()`](https://github.com/apify/crawlee/blob/07f80e59643ae7740f1ddeb043c12a6c85a23f61/packages/http-crawler/src/internals/http-crawler.ts#L465) calls `this._parseResponse()`, which, in turn, [calls `this._parseHTML()`](https://github.com/apify/crawlee/blob/07f80e59643ae7740f1ddeb043c12a6c85a23f61/packages/http-crawler/src/internals/http-crawler.ts#L688). `_parseHTML()` is [overridden in `CheerioCrawler`](https://github.com/apify/crawlee/blob/07f80e59643ae7740f1ddeb043c12a6c85a23f61/packages/cheerio-crawler/src/internals/cheerio-crawler.ts#L175), and returns ```js...
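The call chain above is a plain template-method pattern: the base class drives the flow, and the subclass's override decides what ends up in the context. A heavily simplified sketch, assuming nothing about crawlee's real signatures (the classes `SketchHttpCrawler` / `SketchCheerioCrawler`, the `ParsedResponse` shape and the `$` placeholder are all hypothetical; only the method names mirror the ones linked above):

```typescript
// Hypothetical shape of what the request handler context receives.
interface ParsedResponse {
  body: string;
  $?: string; // stands in for the parsed document in this sketch
}

// Hypothetical, simplified stand-in for HttpCrawler.
class SketchHttpCrawler {
  // Entry point: builds the context handed to the user's request handler.
  _runRequestHandler(body: string): ParsedResponse {
    return this._parseResponse(body);
  }

  // Delegates HTML handling to _parseHTML(), which subclasses may override.
  protected _parseResponse(body: string): ParsedResponse {
    return this._parseHTML(body);
  }

  // Base behaviour: pass the raw body through without parsing.
  protected _parseHTML(body: string): ParsedResponse {
    return { body };
  }
}

// Hypothetical, simplified stand-in for CheerioCrawler.
class SketchCheerioCrawler extends SketchHttpCrawler {
  // Because of dynamic dispatch, the base class's _parseResponse() ends up
  // calling this override, so its return value shapes the handler context.
  protected override _parseHTML(body: string): ParsedResponse {
    return { body, $: `parsed(${body.length} chars)` };
  }
}

const ctx = new SketchCheerioCrawler()._runRequestHandler("<p>hi</p>");
console.log(ctx); // { body: '<p>hi</p>', $: 'parsed(9 chars)' }
```

The practical consequence is that any bug in what the subclass's `_parseHTML()` returns surfaces in every request handler context, even though the base class never references the subclass directly.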
Hm, after more investigation, the root of the issue seems to be corrupted (or otherwise broken) Brotli compression. Both Firefox and Chromium on my machine...
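To check whether a response body really is undecodable Brotli independently of the crawler and the browsers, one option is Node's built-in `zlib` Brotli decoder. A minimal sketch, assuming you already have the raw, still-compressed body in a `Uint8Array`/`Buffer` (the helper `tryBrotliDecode` and the sample payload are hypothetical, for illustration only):

```typescript
import { brotliCompressSync, brotliDecompressSync } from "node:zlib";

type DecodeResult = { ok: true; text: string } | { ok: false; error: string };

// Hypothetical helper: try to Brotli-decode a raw body and report failure
// as a value instead of throwing, so a broken body is easy to spot.
function tryBrotliDecode(raw: Uint8Array): DecodeResult {
  try {
    return { ok: true, text: brotliDecompressSync(raw).toString("utf8") };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Usage: a well-formed payload round-trips; a truncated copy is expected
// to be reported as an error rather than decoded.
const valid = brotliCompressSync(Buffer.from("<html>ok</html>"));
console.log(tryBrotliDecode(valid));
console.log(tryBrotliDecode(valid.subarray(0, valid.length - 2)));
```

If a body saved from the failing site comes back with `ok: false` here as well, that supports the idea that the payload itself is broken rather than the crawler's handling of it.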