ingest-file

Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.

91 ingest-file issues

Bumps [black](https://github.com/psf/black) from 23.12.1 to 24.3.0. Release notes (sourced from black's releases), 24.3.0 highlights: This release is a milestone: it fixes Black's first CVE security vulnerability. If you run Black...

dependencies
python

This adds HEIC/HEIF support for alephdata/aleph#3918 using a Pillow plugin. Todos:
- [ ] find a better test image and check OCR as well
- [ ] figure out previewing...

Bumps [pillow](https://github.com/python-pillow/Pillow) from 10.1.0 to 10.2.0. Release notes (sourced from pillow's releases) for 10.2.0: https://pillow.readthedocs.io/en/stable/releasenotes/10.2.0.html; changes: Add keep_rgb option when saving JPEG to prevent conversion of RGB colorspace #7553 [@bgilbert]; Trim...

dependencies
python

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6. Changelog (sourced from cryptography's changelog), 41.0.6 - 2023-11-27: Fixed a null-pointer-dereference and segfault that could occur when loading certificates from a PKCS#7 bundle....

dependencies
python

Bumps [fingerprints](https://github.com/alephdata/fingerprints) from 1.1.1 to 1.2.3. Commits: 14c3779 Bump version: 1.2.2 → 1.2.3; 0c2f445 extra forms; b77e655 Bump version: 1.2.1 → 1.2.2; efa3204 support more polish shortenings; d0180b6 Bump version:...

dependencies
python

Bumps [click](https://github.com/pallets/click) from 8.1.6 to 8.1.7. Release notes (sourced from click's releases), 8.1.7: This is a fix release for the 8.1.x feature branch. Changes: https://click.palletsprojects.com/en/8.1.x/changes/#version-8-1-7; milestone: https://github.com/pallets/click/milestone/22?closed=1; changelog sourced from...

dependencies
python

We're ingesting some files and our monitoring system is alerting on a high number of context switches from the ingestor processes. I know it's a hard issue...

bug

Our current retry logic for converting documents (shelling out to LibreOffice) is based on two constants: the number of retry attempts and the timeout (https://github.com/alephdata/ingest-file/blob/fca65fbb08ff37d65df3c14804ad5b1b6809b97d/ingestors/support/convert.py#L16-L17). What would be more desirable...

improvement
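A retry loop of the kind described, a fixed number of attempts plus a per-attempt timeout, can be sketched with the standard library's `subprocess.run`. This is a minimal illustration under assumptions: `CONVERT_RETRIES`, `CONVERT_TIMEOUT` and `run_with_retries` are hypothetical names and values, not the actual code in `convert.py`.

```python
import subprocess
import sys

# Hypothetical stand-ins for the two constants mentioned in the issue;
# the real names and values in ingest-file may differ.
CONVERT_RETRIES = 3
CONVERT_TIMEOUT = 300  # seconds per attempt


def run_with_retries(cmd, retries=CONVERT_RETRIES, timeout=CONVERT_TIMEOUT):
    """Run `cmd`, retrying on timeout or non-zero exit, up to `retries` attempts."""
    last_error = None
    for _ in range(retries):
        try:
            # subprocess.run kills the child if `timeout` expires, then raises.
            return subprocess.run(cmd, check=True, timeout=timeout, capture_output=True)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as exc:
            last_error = exc
    raise RuntimeError(f"command failed after {retries} attempts") from last_error


if __name__ == "__main__":
    result = run_with_retries([sys.executable, "-c", "print('ok')"], retries=2, timeout=10)
    print(result.stdout.decode().strip())
```

In a conversion setting, `cmd` would be a headless LibreOffice call such as `soffice --headless --convert-to pdf input.docx`. Because both knobs are fixed, a slow but valid document and a hung process are treated the same way: each attempt is cut off at the same timeout.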

ingest-file extracts IBANs using a rather [simple regex](https://github.com/alephdata/ingest-file/blob/main/ingestors/analysis/patterns.py). This can lead to many false positives. ingest-file could add additional validation for matches to improve precision: *...

improvement
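The issue's list of proposed checks is truncated above. As one illustration, every IBAN carries check digits verified by the ISO 13616 mod-97 rule (ISO 7064 MOD 97-10), so a regex match whose checksum fails can be discarded. A minimal sketch, assuming matches arrive as plain strings; `is_valid_iban` and `filter_valid` are hypothetical helpers, not part of ingest-file's `patterns.py`.

```python
import re

# Shape check before the checksum: two country letters, two check digits,
# then 11-30 alphanumerics (15-34 characters in total).
_IBAN_SHAPE = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}")


def is_valid_iban(candidate: str) -> bool:
    """Return True if `candidate` passes the ISO 13616 mod-97 check."""
    iban = candidate.replace(" ", "").upper()
    if not _IBAN_SHAPE.fullmatch(iban):
        return False
    # Move the country code and check digits to the end.
    rearranged = iban[4:] + iban[:4]
    # Replace each letter with two digits (A=10 ... Z=35); digits stay unchanged.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def filter_valid(matches):
    """Keep only regex matches whose IBAN checksum verifies."""
    return [m for m in matches if is_valid_iban(m)]
```

A per-country length table (for example, German IBANs are always 22 characters) would tighten this further, since the checksum alone still accepts roughly 1 in 97 random strings of the right shape.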