Friedrich Lindenberg
You have to give Microsoft credit for its consistency: instead of storing email messages in Outlook as RFC 822 plain text, they came up with their own super funky file format...
At the moment, the `ingestors` will call on an HTTP service provided by `convert-document` (in this repo) to convert documents of various types (Word, PowerPoint, etc.) to PDF files, which...
What's broken?
* We're seeing incorrect text extraction out of some documents, especially those containing Arabic text.
* Text from images isn't being extracted into the right location in the...
It seems we fail to parse files created in Excel with write protection, even though they open without a password in the app. There has to be...
This is specifically with regard to `csvsql`, where loading a CSV file with `Some manually entered - header (TM)` will give you a data structure that is really hard to...
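A minimal sketch of the kind of header cleanup this is asking for. `normalize_column_name` and `normalize_headers` are hypothetical helpers written for illustration, not part of csvkit or `csvsql`: they fold accents, collapse punctuation and whitespace into underscores, lowercase the result and de-duplicate repeated names.

```python
import re
import unicodedata


def normalize_column_name(header: str) -> str:
    """Turn a free-form CSV header into a lowercase, SQL-friendly identifier.

    Hypothetical helper for illustration; not part of csvkit.
    """
    # Decompose characters (e.g. "é" -> "e" + combining accent, "™" -> "TM")
    # and drop the combining marks.
    text = unicodedata.normalize("NFKD", header)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Collapse every run of non-ASCII-alphanumeric characters into one underscore.
    # Letters with no ASCII decomposition (Arabic, for example) are dropped here,
    # so a real version would need a policy for them.
    text = re.sub(r"[^0-9a-zA-Z]+", "_", text).strip("_").lower()
    if not text:
        return "column"
    # SQL identifiers should not start with a digit.
    if text[0].isdigit():
        text = "col_" + text
    return text


def normalize_headers(headers):
    """Normalize a whole header row, suffixing repeated names with _2, _3, ..."""
    seen = {}
    result = []
    for header in headers:
        name = normalize_column_name(header)
        count = seen.get(name, 0) + 1
        seen[name] = count
        result.append(name if count == 1 else f"{name}_{count}")
    return result
```

With the header from this issue, `normalize_column_name("Some manually entered - header (TM)")` gives `some_manually_entered_header_tm`.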
Final changes to #415, fixes #412
Hey all. I've just pushed dataset 1.6.0, which pins the sqlalchemy dependency for this library to >= 1.3.2, < 2.0.0. This is meant as a hotfix to prevent people from...
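To make the pin concrete: `>= 1.3.2, < 2.0.0` is a half-open range, so 1.3.2 itself is allowed and anything from 2.0.0 up is not. The helpers below are a simplified, hypothetical illustration that only handles plain `X.Y.Z` releases; for real version checks, `packaging.specifiers.SpecifierSet` implements the full PEP 440 rules (pre-releases, post-releases and so on).

```python
def parse_version(version: str) -> tuple:
    """Parse a plain dotted release like '1.3.24' into a tuple of ints.

    Simplified for illustration: tags such as '2.0.0b1' are not handled.
    """
    parts = [int(part) for part in version.split(".")]
    # Pad to three components so that '2.0' compares equal to '2.0.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_pinned_range(version: str, lower: str = "1.3.2", upper: str = "2.0.0") -> bool:
    """Return True if lower <= version < upper, mirroring '>= 1.3.2, < 2.0.0'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)
```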
Columns:
* EntityID
* Featured Properties
* Sources
* Temporal Extent
* Ignore?

Address
Keep non-matchable fields as an array?

Persons
We have a few data sources where we use the `Sanction` schema to describe non-sanction adverse information, such as a procurement debarment, a criminal record or a regulatory penalty. We...
We want to be able to produce a data file that just contains the changed entities day to day. The file also needs to give information regarding entities that have...
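A minimal sketch of computing such a day-to-day delta, assuming each daily export can be loaded as a dict from entity ID to entity data. The `ADD`/`MOD`/`DEL` record layout and `compute_delta` are hypothetical, and the sketch also assumes that entities missing from today's snapshot should be reported as deletions, by ID only.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def compute_delta(previous: Dict[str, Entity], current: Dict[str, Entity]) -> List[Dict[str, Any]]:
    """Compare two snapshots keyed by entity ID and list what changed.

    Hypothetical record layout: {"op": "ADD" | "MOD" | "DEL", "entity": ...}.
    """
    delta = []
    # New or changed entities carry their full current data.
    for entity_id in sorted(current):
        if entity_id not in previous:
            delta.append({"op": "ADD", "entity": current[entity_id]})
        elif current[entity_id] != previous[entity_id]:
            delta.append({"op": "MOD", "entity": current[entity_id]})
    # Entities that disappeared are reported by ID only.
    for entity_id in sorted(previous):
        if entity_id not in current:
            delta.append({"op": "DEL", "entity": {"id": entity_id}})
    return delta
```

Comparing whole entity dicts with `!=` is the simplest change test; for large exports, storing a per-entity hash from the previous run avoids keeping yesterday's full data in memory.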