open-gov-crawlers
Parse government documents into well-formed JSON
See:
* https://www.geeksforgeeks.org/readability-index-pythonnlp/
* https://github.com/wimmuskee/readability-score
* https://github.com/cdimascio/py-readability-metrics
* https://pythonawesome.com/score-the-readability-of-text-using-popular-readability-metrics/
### The Idea: enable non-programmers to write test cases.

**Context:** The English version has its tests in [rome_statute_test.py](https://github.com/public-law/open-gov-crawlers/blob/master/test/public_law/parsers/int/rome_statute_test.py). Each language would need its own test file. Eventually these could be...
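One possible shape for this, offered only as a rough sketch: keep the expectations in a plain table that someone without programming experience can edit, and let a small generic checker compare them against the parser's output. Everything below is hypothetical, including the CSV column names, the `load_cases` and `check_cases` helpers, and the `sample_output` dictionary that stands in for real parser output; none of it is taken from the repository.

```python
import csv
import io

# Hypothetical, hand-editable test cases. The column names and the
# values are illustrative assumptions, not the project's real format.
CASES_CSV = """\
field,expected
title,Rome Statute of the International Criminal Court
language,en
"""


def load_cases(text):
    """Read (field, expected) rows from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def check_cases(parsed, cases):
    """Compare parser output to the cases; return failure messages."""
    failures = []
    for case in cases:
        actual = parsed.get(case["field"])
        if str(actual) != case["expected"]:
            failures.append(
                f'{case["field"]}: expected {case["expected"]!r}, got {actual!r}'
            )
    return failures


# Stand-in for what a parser might return (assumed structure).
sample_output = {
    "title": "Rome Statute of the International Criminal Court",
    "language": "en",
}

failures = check_cases(sample_output, load_cases(CASES_CSV))
```

With a layout like this, each language could get its own table of expectations while sharing a single checker, though whether that fits the existing test suite is an open question.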
Source in HTML: https://www.fedlex.admin.ch/eli/cc/2002/586/it
The source text is here, in PDF form:
* https://www.icc-cpi.int/Publications/Statut-de-Rome.pdf

Also here, in HTML:
* https://www.fedlex.admin.ch/eli/cc/2002/586/fr
Römisches Statut des Internationalen Strafgerichtshofs
https://de.wikipedia.org/wiki/Römisches_Statut_des_Internationalen_Strafgerichtshofs

...links to this German translation in HTML (!) at a Swiss organization: https://www.fedlex.admin.ch/eli/cc/2002/586/de

...as well as French and Italian. It also has a nice...
- [ ] Find the current 2021 text, in HTML if possible, otherwise PDF.
- [ ] Write the tests.
- [ ] Write the parser.
See https://github.com/beartype/beartype