ingest-file

Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.

Results 91 ingest-file issues

ingest-file could extract crypto wallet addresses for popular cryptocurrencies using regular expressions, similar to how it already extracts email addresses and IBANs. While ElasticSearch and Aleph do support searching using...

improvement
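A minimal sketch of what regex-based wallet extraction could look like. The pattern table and the `extract_wallet_addresses` helper are hypothetical and not part of ingest-file; the patterns only match the general shape of Bitcoin and Ethereum addresses and do not verify checksums, so false positives are possible.

```python
import re

# Hypothetical pattern table: shape-only matches, no checksum validation.
WALLET_PATTERNS = {
    # Legacy Base58 Bitcoin addresses start with 1 or 3.
    "bitcoin_legacy": re.compile(r"\b[13][1-9A-HJ-NP-Za-km-z]{25,34}\b"),
    # Bech32 (SegWit) Bitcoin addresses start with bc1.
    "bitcoin_bech32": re.compile(r"\bbc1[02-9ac-hj-np-z]{11,71}\b"),
    # Ethereum addresses are 0x followed by 40 hex digits.
    "ethereum": re.compile(r"\b0x[0-9a-fA-F]{40}\b"),
}


def extract_wallet_addresses(text):
    """Return a sorted list of (currency, address) pairs found in text."""
    found = set()
    for currency, pattern in WALLET_PATTERNS.items():
        for match in pattern.finditer(text):
            found.add((currency, match.group(0)))
    return sorted(found)


sample = (
    "Donations: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa or "
    "0x52908400098527886E0F7030069857D2E4169EE7."
)
for currency, address in extract_wallet_addresses(sample):
    print(currency, address)
```

A real implementation would likely add per-currency checksum validation (Base58Check, Bech32, EIP-55) before treating a match as an address.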

See https://github.com/alephdata/ingest-file/pull/511

DO NOT MERGE. This is a hack, for internal review only.

TODO:
- [ ] `Mentions` seem to be missing, investigate

As per alephdata/aleph#3908 and [#2066](https://github.com/alephdata/aleph/issues/2066), this is an attempt to create `BankAccount` FTM entities out of valid IBANs. In the...
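To illustrate what "valid IBAN" can mean here, the following sketch applies the standard mod-97 check and wraps a valid value in a plain dict shaped like a `BankAccount` entity. Both helper names and the dict layout are assumptions for illustration; this is not the code from the pull request and it does not use the followthemoney API.

```python
import re

# 2-letter country code, 2 check digits, 11 to 30 alphanumeric characters.
IBAN_SHAPE = re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}")


def is_valid_iban(candidate):
    """Check the IBAN shape and the mod-97 checksum (no per-country length check)."""
    iban = candidate.replace(" ", "").upper()
    if not IBAN_SHAPE.fullmatch(iban):
        return False
    # Move the country code and check digits to the end, then map
    # letters to numbers (A=10 ... Z=35) and test the remainder.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def bank_account_stub(candidate):
    """Hypothetical helper: return a BankAccount-like dict, or None if invalid."""
    if not is_valid_iban(candidate):
        return None
    iban = candidate.replace(" ", "").upper()
    return {"schema": "BankAccount", "properties": {"iban": [iban]}}


print(bank_account_stub("GB82 WEST 1234 5698 7654 32"))
print(bank_account_stub("GB83 WEST 1234 5698 7654 32"))
```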

Bumps ubuntu from 20.04 to 23.04. [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ubuntu&package-manager=docker&previous-version=20.04&new-version=23.04)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.

dependencies
docker

There are two tests currently marked as `@skip` in the `tests` dir:
- [test_olm.py](https://github.com/alephdata/ingest-file/blob/main/tests/test_olm.py)
- [test_djvu.py](https://github.com/alephdata/ingest-file/blob/main/tests/test_djvu.py)

Both fail. The root cause for the failure should be investigated. Ideally, all tests...

python

`3.18.2` has difficulties with PDFs with unsupported image formats when we try to get a PIL image out of a pikepdf Image. Some research suggests this might be related to...

bug

A `ProcessingException` is thrown every time `ingest-file` isn't able to parse a file. In the current state, if Sentry support is enabled, each of these will create an event in...
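One possible way to keep such expected failures from becoming Sentry events is a `before_send` hook, which the Sentry Python SDK calls with the event and a hint dict before sending. The sketch below is an assumption about how that could look, not the project's chosen fix; the `ProcessingException` class defined here is a local stand-in for the real one, and `drop_processing_exceptions` is a hypothetical name.

```python
class ProcessingException(Exception):
    """Local stand-in for the exception ingest-file raises on unparseable files."""


def drop_processing_exceptions(event, hint):
    """Return None to discard events caused by ProcessingException."""
    # For captured exceptions the hint carries a (type, value, traceback) tuple.
    exc_info = hint.get("exc_info")
    if exc_info is not None and isinstance(exc_info[1], ProcessingException):
        return None
    return event


# Assumed wiring, shown as a comment so the sketch runs without the SDK:
#   sentry_sdk.init(dsn=..., before_send=drop_processing_exceptions)
```

Returning `None` from the hook drops the event; every other exception passes through unchanged.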

While importing an e-mail archive in the (IMHO cursed) .PST format, I came across a mailbox in which every message has `application/rtf` as its body type.

```
Content-Type: application/rtf
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename*=utf-8''rtf-body.rtf; filename="rtf-body.rtf"
```

...

bug
moderate