Chris
> This would be very helpful and for now I'm working around this by changing the call to `amplify` in the sample script in https://docs.amplify.aws/cli/usage/headless#amplify-pull-parameters with: > > ```shell >...
> We configure for VSCode in our repo as well:
>
> https://github.com/vercel/turbo/blob/178a2948eeb795ba925838b2380bd5c532a2cd5f/.vscode/settings.json#L12

@mehulkar your comment is so valuable! It should definitely be added to the official docs!
Hi @adbar, thanks for the quick response! Are you planning to add this feature? I could help out if you can share some more implementation details.
I thought about some options, but as you said, the combination of different extractors makes it difficult to calculate a real score. One idea that might be feasible: Let's say...
Yes, you might be right. Even so, some pre-filtering of the raw HTML, such as removing semantic elements (``, ``, ``), might decrease the variance a bit. I will try...
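To make the pre-filtering idea concrete, here is a minimal sketch that drops whole elements from the raw HTML before any extractor sees it. The element names in my comment got lost, so `nav`, `header` and `footer` below are only placeholders, and `PreFilter` / `prefilter` are hypothetical helper names, not part of trafilatura. It uses only the standard library and assumes reasonably well-formed markup (an unclosed stripped tag would swallow the rest of the page; comments and the doctype are dropped).

```py
from html.parser import HTMLParser

# Placeholder tag names (assumption): the elements named in the comment were lost.
STRIP_TAGS = {"nav", "header", "footer"}


class PreFilter(HTMLParser):
    """Re-emit the HTML while skipping everything inside the given tags."""

    def __init__(self, strip_tags):
        # Keep entities as written so the output stays close to the input.
        super().__init__(convert_charrefs=False)
        self.strip_tags = set(strip_tags)
        self.depth = 0  # > 0 while we are inside a stripped element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.strip_tags:
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a nested scope.
        if self.depth == 0 and tag not in self.strip_tags:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.strip_tags:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")


def prefilter(html, strip_tags=STRIP_TAGS):
    """Return `html` with the stripped elements and their contents removed."""
    parser = PreFilter(strip_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = (
    "<html><body><nav><a href='/'>Home</a></nav>"
    "<p>Hello &amp; welcome</p><footer>(c)</footer></body></html>"
)
filtered = prefilter(sample)
print(filtered)
```

The filtered string could then be passed to both extractors, so that navigation and footer text no longer inflates one count relative to the other. Whether this actually reduces the variance is something I would still have to measure.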
Yes, of course!
In case you are interested, here are pages with `is_probably_readerable=False`:

```py
[
    {
        "file": "die-partei.net.luebeck.html",
        "score": 0.7719298245614035,
        "html": 57,
        "trafilatura": 44,
    },
    {
        "file": "schleifen.ucoz.de.briefe.html",
        "score": 1.052325581395349,
        "html": 172,
        "trafilatura":...
```
I opened a new PR for `is_probably_readerable()`. As I've described in the other PR, there is a difference in the results for the implementation of `is_probably_readerable()` with BeautifulSoup vs LXML....
> @zirkelc I'm not sure what to do with this pull request, do you want to keep working on it by leveraging the functionality you just introduced? I wanted to...
Okay, I agree. Let's close this PR and I will create a new issue to discuss the port.