ingest-file

Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.

Results 91 ingest-file issues

There are two formats of PDF metadata:
- the old one (a key-value dict)
- XMP (introduced by Adobe in the early 2000s)

These days, XMP must always be used instead...

Bumps [coverage](https://github.com/nedbat/coveragepy) from 6.4.4 to 6.5.0. Changelog (sourced from coverage's changelog): Version 6.5.0 (2022-09-29): The JSON report now includes details of which branches were taken, and which are missing...

dependencies
python

Bumps [servicelayer[amazon,google]](https://github.com/alephdata/servicelayer) from 1.20.4 to 1.20.5. Commits:
- ebbd6ed Bump version: 1.20.4 → 1.20.5
- 162e04d Update structlog requirement from <22.0.0,>=20.2.0 to >=20.2.0,<23.0.0 (#70)
- 3b944f9 Bump pika from 1.2.0 to 1.3.0 (#68)...

dependencies
python

Bumps ubuntu from 20.04 to 22.04.

dependencies
docker

When ingest-file runs on a machine with a large number of cores, SQLAlchemy's default connection pool size may not be enough. See https://github.com/alephdata/ingest-file/issues/251
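One way to picture the pool-size issue is a helper that scales the pool with the number of cores. This is only a sketch, not how ingest-file configures its database: `pool_settings`, `make_engine`, and the `per_core` and `overflow_ratio` heuristics are hypothetical names and values chosen for illustration. The only SQLAlchemy specifics relied on are the `pool_size` and `max_overflow` arguments of `create_engine`, whose `QueuePool` defaults are 5 and 10.

```python
import os


def pool_settings(cores=None, per_core=2, overflow_ratio=0.5):
    """Return keyword arguments for sqlalchemy.create_engine.

    The sizing heuristic is an assumption made for this sketch; tune it
    against the number of worker threads that actually hold connections.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    # Never go below SQLAlchemy's QueuePool defaults (5 and 10).
    pool_size = max(5, cores * per_core)
    max_overflow = max(10, int(pool_size * overflow_ratio))
    return {"pool_size": pool_size, "max_overflow": max_overflow}


def make_engine(database_uri, cores=None):
    """Create an engine whose connection pool scales with the core count."""
    # Imported lazily so the sizing helper works without SQLAlchemy installed.
    from sqlalchemy import create_engine

    return create_engine(database_uri, **pool_settings(cores))
```

Calling `pool_settings(32)` yields a pool of 64 connections with 32 overflow connections. These arguments apply to `QueuePool`-backed engines such as PostgreSQL; other pool classes may reject them.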

improvement

An error while analysing an ingested document stops the document processing pipeline, so the document is neither indexed nor shown in Aleph. Example of such an error: ``` Traceback (most...

bug

Replace the chardet-based encoding checks with a newer alternative

### What is an ftm-bundle?

An `ftm-bundle` is a zip file containing structured FtM entities and document blobs. The structure of the zip file may look something like: ``` bundle.zip/...

feature

nose is no longer maintained, and pytest's better fixture handling will probably be an overall gain for the ingestors.

Ingestors don't currently support HEIC / HEVC images. Refs https://github.com/alephdata/aleph/issues/1982