
Judicialis, Postprocessing and waiting in downloads

Open galvusdamor opened this issue 2 years ago • 1 comments

I've added a module to download older decisions from the website judicialis.

The current version of gesp "only" downloads the decisions as HTML/XML/PDF files. This fork contains code to post-process these raw files into a uniform, machine-readable format.

If post-processing is activated (using "-P"), gesp creates a new directory "postprocessed" in which it sorts all court decisions by state and then by court. The files are named after the ECLI (European Case Law Identifier) of the decision. If the source provides an ECLI, that ECLI is used; otherwise gesp infers the ECLI itself. Each post-processed file begins with a header containing basic information about the decision, followed by a separator line and then the plain-text body of the decision.
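To illustrate the layout described above, here is a minimal sketch rather than the fork's actual code. The function name `write_postprocessed`, the header keys, the separator string, the ".txt" extension, the replacement of colons in the file name, and the example ECLI are all assumptions made for this example.

```python
import tempfile
from pathlib import Path

# Hypothetical separator line; the real one used by the fork may differ.
SEPARATOR = "=" * 40


def write_postprocessed(root, state, court, ecli, header, body):
    """Write one decision to <root>/postprocessed/<state>/<court>/<ecli>.txt."""
    target_dir = Path(root) / "postprocessed" / state / court
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: colons in the ECLI are replaced, because they are not
    # allowed in file names on every platform.
    target = target_dir / (ecli.replace(":", "_") + ".txt")
    # Header first, then the separator line, then the plain-text body.
    header_lines = [f"{key}: {value}" for key, value in header.items()]
    text = "\n".join(header_lines + [SEPARATOR, body]) + "\n"
    target.write_text(text, encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_postprocessed(
            tmp,
            state="bund",
            court="bgh",
            ecli="ECLI:DE:BGH:2023:010123UXZR1.23.0",  # made-up identifier
            header={"court": "BGH", "date": "2023-01-01"},
            body="Plain-text body of the decision.",
        )
        print(path.relative_to(tmp))
        print(path.read_text(encoding="utf-8"))
```

Keeping the header as simple "key: value" lines above a fixed separator means a consumer can split each file once on the separator to get the metadata and the body.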

Lastly, there is now a command-line option "-w", which forces gesp to pause after each download from a decision provider. This keeps us from overstraining the providers' computational resources.
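The idea behind such a pause can be sketched as follows. This is not how gesp implements "-w"; `download_with_pause`, its parameters, and the one-second default are hypothetical names and values chosen for illustration.

```python
import time


def download_with_pause(urls, fetch, wait_seconds=1.0, sleep=time.sleep):
    """Fetch each URL in turn and pause after every download.

    `fetch` is any callable that takes a URL and returns the downloaded
    content. `sleep` is injectable so the pause can be replaced in tests.
    """
    results = []
    for url in urls:
        results.append(fetch(url))
        # Pause after each download so the provider is not flooded.
        sleep(wait_seconds)
    return results
```

Passing the sleep function as a parameter keeps the sketch testable without actually waiting.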

galvusdamor Jun 07 '23 15:06

Thank you! I have manually added the "-w" feature and your bugfixes (with some minor changes) to the "feature-ECLI" branch and subsequently made it the "master" branch. I still think we should keep judicialis out of the master branch though, since it is not an official source. I will create an extra branch for it.

niklaswais Jun 19 '23 04:06