Judicialis, Postprocessing and waiting in downloads
I've added a module to download older decisions from the website judicialis.
The current version of gesp "only" downloads the decisions as html/xml/pdf files. This fork contains code to post-process these raw files into a uniform, machine-readable format.
If post-processing is activated (using "-P"), gesp creates a new directory "postprocessed" in which it sorts all court decisions by state and then by court. Each file is named after the ECLI (European Case Law Identifier) of its decision. If the source provides an ECLI, that ECLI is used; otherwise gesp infers the correct ECLI itself. Each post-processed file begins with a header containing basic information about the decision, followed by a separator line and then the plain-text body of the decision.
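To make the layout described above concrete, here is a minimal sketch in Python. It is not the code from this fork: the function names, the "key: value" header style, the separator string, the ".txt" extension and the placeholder ECLI are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical separator line; the real separator used by gesp may differ.
SEPARATOR = "=" * 40


def render_decision(header: dict, body: str) -> str:
    """Build the file content: header lines, separator line, plain-text body."""
    header_lines = [f"{key}: {value}" for key, value in header.items()]
    return "\n".join(header_lines + [SEPARATOR, body.strip()]) + "\n"


def target_path(root: Path, state: str, court: str, ecli: str) -> Path:
    """Sort by state, then by court, and name the file after the ECLI.

    The ".txt" extension is an assumption made for this sketch.
    """
    return root / "postprocessed" / state / court / f"{ecli}.txt"


def write_decision(root: Path, state: str, court: str, ecli: str,
                   header: dict, body: str) -> Path:
    """Create the directories if needed and write one post-processed file."""
    path = target_path(root, state, court, ecli)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_decision(header, body), encoding="utf-8")
    return path


if __name__ == "__main__":
    # "ECLI:DE:XX:2020:EXAMPLE" is a placeholder, not a real identifier.
    print(target_path(Path("out"), "bund", "bgh", "ECLI:DE:XX:2020:EXAMPLE"))
    print(render_decision({"ECLI": "ECLI:DE:XX:2020:EXAMPLE"}, "Text of the decision."))
```

Note that ECLIs contain colons, which are not valid in file names on every file system, so a real implementation may need to replace them.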
Lastly, there is now the command line option "-w", which forces gesp to pause after each download from a decision provider. This keeps us from overloading the providers' servers.
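The idea behind "-w" can be sketched as follows. This is only an illustration of pausing after each download, not the actual gesp implementation: `download_with_pause`, its parameters and the default pause length are hypothetical names and values chosen for this example.

```python
import time
from typing import Callable, Iterable, List


def download_with_pause(urls: Iterable[str],
                        fetch: Callable[[str], str],
                        pause_seconds: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch every URL in order and pause after each download.

    `fetch` stands in for whatever performs the real request; `sleep` is
    injectable so the pause can be observed without actually waiting.
    """
    results = []
    for url in urls:
        results.append(fetch(url))
        sleep(pause_seconds)  # give the provider's server a break
    return results


if __name__ == "__main__":
    # Dummy fetch function so the example runs without network access.
    pages = download_with_pause(["a", "b"], fetch=str.upper, pause_seconds=0.1)
    print(pages)
```

Passing the sleep function in as a parameter keeps the waiting behaviour easy to test and easy to switch off.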
Thank you! I have manually added the "-w" feature and your bugfixes (with some minor changes) to the "feature-ECLI" branch and subsequently made it the "master" branch. I still think we should keep judicialis out of the master branch though, since it is not an official source. I will create an extra branch for it.