feat(sample_caller): add --save-responses optional arg
This is the juriscraper equivalent of the response header and content saving that will be implemented in Courtlistener to address https://github.com/freelawproject/courtlistener/issues/4308
@flooie what do you think? It's the general idea of what would be implemented on the Courtlistener site, but instead of saving to /tmp/ we would be saving to an S3 bucket
I think it's great. Let's get this working on the system ASAP - how do we implement it to save into S3?
@grossir reupping this.
?
Can you resolve the conflicts that have bubbled up on this, @grossir?
This looks good to me @grossir.
My only comment is that I would prefer to put this in a directory inside /tmp, maybe /tmp/juriscraper/
Also, the conflicts need to be dealt with, but other than that I think it's fine to move forward on the juriscraper side.
@flooie I resolved the merge conflicts and updated the directory where the files are saved to '/tmp/juriscraper/'
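For readers following along, here is a minimal sketch of what saving a response's headers and content under '/tmp/juriscraper/' could look like. It is not the code in this PR: the function name `save_response`, the file naming scheme, and the `.json`/`.bin` split are assumptions made for illustration only.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_response(url, headers, content, base_dir="/tmp/juriscraper/"):
    """Write headers to a JSON file and the raw body to a sibling file.

    Hypothetical helper for illustration; not juriscraper's actual API.
    """
    directory = Path(base_dir)
    directory.mkdir(parents=True, exist_ok=True)

    # Assumed naming scheme: UTC timestamp plus a sanitized tail of the URL.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    safe_url = "".join(c if c.isalnum() else "_" for c in url)[-60:]
    stem = f"{stamp}_{safe_url}"

    headers_path = directory / f"{stem}_headers.json"
    content_path = directory / f"{stem}_content.bin"

    # dict() accepts any mapping of header names to values.
    headers_path.write_text(json.dumps(dict(headers), indent=2))
    content_path.write_bytes(content)
    return headers_path, content_path
```

Keeping `base_dir` as a parameter means the same function can be pointed at a temporary directory in tests. An S3-backed variant would need a different writer, which is out of scope for this sketch.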
@flooie as we discussed earlier, I simplified the sample caller. I deleted the daemon and report options and their code, which added a lot of complexity to the caller and were never used. This makes the caller easier to read
I just realized a problem with this approach. A bunch of sites override _download or make secondary requests in different ways, which would cause us to miss responses that should be stored. I am looking into how to account for this
https://github.com/freelawproject/juriscraper/issues/1264
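To illustrate the problem and one possible way around it, the sketch below attaches the saving callback to the session rather than to `_download`, so that secondary requests are captured too. Everything here is a stand-in written for this example: `RecordingSession`, `Site`, `SiteWithSecondaryRequests` and `fake_fetch` are hypothetical and are not juriscraper classes. The real `requests.Session` offers a comparable mechanism through its `hooks["response"]` list, but whether this PR uses it is not stated in this thread.

```python
class RecordingSession:
    """Stand-in for an HTTP session that runs callbacks on every response."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable: url -> (headers, content)
        self.response_hooks = []   # callbacks invoked for every response

    def get(self, url):
        headers, content = self._fetch(url)
        for hook in self.response_hooks:
            hook(url, headers, content)
        return content


class Site:
    """Hypothetical scraper that makes a single request."""

    def __init__(self, session):
        self.session = session

    def _download(self):
        return self.session.get("https://example.com/index")


class SiteWithSecondaryRequests(Site):
    """Hypothetical scraper that overrides _download and fetches each case."""

    def _download(self):
        super()._download()
        # One extra request per scraped case; a hook placed only around the
        # base _download call would never see these responses.
        return [
            self.session.get(f"https://example.com/case/{i}") for i in (1, 2)
        ]


def fake_fetch(url):
    # No network access: returns canned headers and a body derived from the URL.
    return {"Content-Type": "text/html"}, f"body of {url}".encode()


captured = []
session = RecordingSession(fake_fetch)
session.response_hooks.append(
    lambda url, headers, content: captured.append(url)
)
SiteWithSecondaryRequests(session)._download()
```

Because the callback lives on the session, the index request and both case requests end up in `captured`, regardless of how the site class structures its downloads.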
This is ready for review again @flooie
To test this with a site that makes secondary requests for each scraped case:
python sample_caller.py -c juriscraper.opinions.united_states.state.ky --backscrape --backscrape-start=2024/01/15 --backscrape-end=2024/01/30 -vvv --save-responses
To use with a "normal" site:
python sample_caller.py -c juriscraper.opinions.united_states.state.pasuperct -vvv --save-responses
or
python sample_caller.py -c juriscraper.opinions.united_states.state.nd -vvv --save-responses
Looks good to me. The failing test is unrelated.