
feat(sample_caller): add --save-responses optional arg

Open grossir opened this issue 1 year ago • 5 comments

This is the juriscraper equivalent of the response header and content saving that will be implemented in Courtlistener to address https://github.com/freelawproject/courtlistener/issues/4308

grossir avatar Aug 29 '24 23:08 grossir

@flooie what do you think? It's the general idea of what would be implemented on the Courtlistener site, but instead of saving to /tmp/ we would be saving to an S3 bucket

grossir avatar Aug 29 '24 23:08 grossir

I think it's great. Let's get this working on the system ASAP - how do we implement it to save into S3?

flooie avatar Sep 04 '24 20:09 flooie

@grossir reupping this.

flooie avatar Sep 10 '24 19:09 flooie

?

flooie avatar Sep 26 '24 17:09 flooie

Can you resolve the conflicts that have bubbled up on this, @grossir?

flooie avatar Oct 21 '24 14:10 flooie

This looks good to me @grossir -

My only comment is that I would prefer to put this in a directory inside /tmp, maybe /tmp/juriscraper/

flooie avatar Nov 21 '24 19:11 flooie

Also the conflicts need to be dealt with but other than that I think it's fine to move forward on the juriscraper side.

flooie avatar Nov 21 '24 19:11 flooie

@flooie I resolved the merge conflicts and updated the directory the files are saved to; it is now '/tmp/juriscraper/'
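For readers following along, here is a minimal sketch of what saving a response's headers and content under /tmp/juriscraper/ could look like. It is not the code in this PR: the function name `save_response`, the timestamp-based file names, and the `.json`/`.bin` split are all assumptions made for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_response(url, headers, content, directory="/tmp/juriscraper/"):
    """Hypothetical helper: write headers to a JSON file and the raw
    body to a sibling file, both inside `directory`."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)

    # File naming is an assumption; a UTC timestamp keeps names unique enough
    # for a sketch.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    headers_path = out_dir / f"{stamp}_headers.json"
    content_path = out_dir / f"{stamp}_content.bin"

    headers_path.write_text(
        json.dumps({"url": url, "headers": dict(headers)}, indent=2)
    )
    content_path.write_bytes(content)
    return headers_path, content_path


if __name__ == "__main__":
    # Demo against a throwaway directory instead of /tmp/juriscraper/
    demo_dir = tempfile.mkdtemp()
    paths = save_response(
        "https://example.com/opinions",
        {"Content-Type": "text/html"},
        b"<html></html>",
        directory=demo_dir,
    )
    print(paths)
```

Swapping the two write calls for uploads would give the S3 variant discussed earlier in the thread.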

grossir avatar Dec 02 '24 18:12 grossir

@flooie as we discussed earlier, I simplified the sample caller. I deleted the daemon and report options and their code, which added a lot of complexity to the caller and were never used. This makes the caller easier to read

grossir avatar Dec 04 '24 02:12 grossir

I just realized a problem with this approach. A bunch of sites override _download or make secondary requests in different ways, so we would miss some of the responses we want to store. I am looking into how to account for this

https://github.com/freelawproject/juriscraper/issues/1264
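To illustrate the problem, here is a toy sketch with no network access. Every class in it is a hypothetical stand-in, not juriscraper code. It shows one possible way to account for the overrides: record at the session level, which every request passes through, rather than inside a single _download method.

```python
class FakeSession:
    """Stand-in for an HTTP session; returns canned bodies, no network."""

    def get(self, url):
        return f"body of {url}"


class RecordingSession:
    """Wraps a session and records every response that passes through."""

    def __init__(self, inner):
        self.inner = inner
        self.seen = []

    def get(self, url):
        response = self.inner.get(url)
        self.seen.append((url, response))
        return response


class BaseSite:
    def __init__(self, session):
        self.session = session
        self.url = "https://example.com/index"

    def _download(self):
        return self.session.get(self.url)


class SiteWithSecondaryRequests(BaseSite):
    """Overrides _download and fetches one extra page per case."""

    def _download(self):
        index = self.session.get(self.url)
        details = [self.session.get(f"{self.url}/case/{i}") for i in range(2)]
        return index, details


if __name__ == "__main__":
    session = RecordingSession(FakeSession())
    SiteWithSecondaryRequests(session)._download()
    # The index page and both secondary requests were recorded
    print(len(session.seen))
```

Saving only what the base _download returns would capture one response here; the wrapper sees all three.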

grossir avatar Dec 04 '24 17:12 grossir

This is ready for review again @flooie

To test this with a site that makes a secondary request for each scraped case:

python sample_caller.py -c juriscraper.opinions.united_states.state.ky --backscrape --backscrape-start=2024/01/15 --backscrape-end=2024/01/30 -vvv --save-responses

To use with a "normal" site:

python sample_caller.py -c juriscraper.opinions.united_states.state.pasuperct -vvv --save-responses

or

python sample_caller.py -c juriscraper.opinions.united_states.state.nd -vvv --save-responses

grossir avatar Dec 04 '24 20:12 grossir

Looks good to me. The failing test is unrelated.

flooie avatar Dec 05 '24 17:12 flooie