Jeremy Singer-Vine
Thanks for your interest in `waybackpack`, @reagle. Here's what's happening: - The Wayback Machine's CDX API theoretically [provides a way to check for (and thus skip over) duplicate content](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server#duplicate-counter). If...
Thank you, @Quuxplusone, for describing the confusion you encountered and for proposing improvements! They sound reasonable to me, and I will attempt something like that the next time I'm working on...
@virtadpt Do you think the "random" part is important, or does this [preexisting functionality](https://github.com/jsvine/waybackpack?tab=readme-ov-file#usage) suffice?
@nateph Thank you for flagging. Are you able to provide any URLs which demonstrate this behavior? @otacon6530 Correct, the current behavior of `waybackpack` is only to download individual pages, not...
Hi @rajathsalegame, and thanks for the suggestion. I agree that this could be a useful feature. Unfortunately for your `multiprocessing` use-case, a lot of the heavy processing load currently is...
If I'm understanding the question correctly, `.extract_text(layout=True, ...)` may produce the sort of output you're seeking. Or not quite?
Thanks for the suggestion, @Pk13055. The idea seems reasonable; I'll investigate how this might be added.
Hi @Pk13055, and thanks again for the suggestion. Now added in https://github.com/jsvine/pdfplumber/commit/d39302fa2ce5976f92276f60d10c127167f94d26 and available on the `develop` branch.
Thanks for flagging this, @Demetrio92 — what sort of check would you propose?
Thanks for providing that explanation. The first check seems straightforward, while the second seems a little less so. A couple more questions: - In what scenario would the first check pass...