aleph
BUG: no common schema between package and pages
Describe the bug
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/followthemoney/proxy.py", line 420, in merge
    self.schema = model.common_schema(self.schema, other.schema)
  File "/usr/local/lib/python3.8/dist-packages/followthemoney/model.py", line 127, in common_schema
    raise InvalidData(msg % (left, right))
followthemoney.exc.InvalidData: No common schema: <Schema('Pages')> and <Schema('Package')>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/servicelayer/worker.py", line 40, in handle_safe
    self.handle(task)
  File "/aleph/aleph/worker.py", line 109, in handle
    self.dispatch_task(task)
  File "/aleph/aleph/worker.py", line 103, in dispatch_task
    handler(collection, task)
  File "/aleph/aleph/worker.py", line 41, in op_index
    index_many(task.stage, collection, sync=sync, **task.payload)
  File "/aleph/aleph/logic/processing.py", line 26, in index_many
    index_aggregator(collection, aggregator, entity_ids=entity_ids, sync=sync)
  File "/aleph/aleph/logic/collections.py", line 130, in index_aggregator
    entities_index.index_bulk(collection, _generate(), sync=sync)
  File "/aleph/aleph/index/entities.py", line 168, in index_bulk
    bulk_actions(entities, sync=sync)
  File "/aleph/aleph/index/util.py", line 220, in bulk_actions
    for _, details in stream:
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 319, in streaming_bulk
    for bulk_data, bulk_actions in _chunk_actions(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 155, in _chunk_actions
    for action, data in actions:
  File "/aleph/aleph/index/entities.py", line 167, in <genexpr>
    entities = (e for e in entities if e is not None)
  File "/aleph/aleph/index/entities.py", line 166, in <genexpr>
    entities = (format_proxy(p, collection) for p in entities)
  File "/aleph/aleph/logic/collections.py", line 124, in _generate
    for idx, proxy in enumerate(entities, 1):
  File "/usr/local/lib/python3.8/dist-packages/ftmstore/dataset.py", line 142, in iterate
    entity.merge(partial)
  File "/usr/local/lib/python3.8/dist-packages/followthemoney/proxy.py", line 423, in merge
    raise InvalidData(msg % (self.id, e))
followthemoney.exc.InvalidData: Cannot merge entities with id [redacted].[redacted]: No common schema: <Schema('Pages')> and <Schema('Package')>
To Reproduce
I have no idea
Expected behavior
The workers don't catastrophically fail when hitting an exception
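To illustrate what I mean, here is a rough, stdlib-only sketch of a merge loop that logs and skips an entity whose fragments disagree on schema instead of aborting the whole batch. Everything in it is a stand-in: `InvalidData` and `common_schema` only mimic the names in the traceback, the `PARENTS` hierarchy is a toy rather than the real followthemoney model, and `merge_fragments` is a hypothetical helper, not Aleph code.

```python
import logging
from typing import Dict, Iterable, List, Set, Tuple

log = logging.getLogger(__name__)


class InvalidData(Exception):
    """Stand-in for followthemoney.exc.InvalidData (assumption)."""


# Toy child -> parent hierarchy, for illustration only.
PARENTS: Dict[str, str] = {
    "Pages": "Document",
    "Folder": "Document",
    "Package": "Folder",
}


def ancestors(schema: str) -> List[str]:
    """Return the schema followed by all of its parents."""
    chain = [schema]
    while schema in PARENTS:
        schema = PARENTS[schema]
        chain.append(schema)
    return chain


def common_schema(left: str, right: str) -> str:
    """Pick the more specific schema if one descends from the other."""
    if left in ancestors(right):
        return right
    if right in ancestors(left):
        return left
    raise InvalidData("No common schema: %s and %s" % (left, right))


def merge_fragments(
    fragments: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], Set[str]]:
    """Merge (entity_id, schema) fragments, skipping unmergeable entities.

    Returns the merged schema per entity ID and the set of skipped IDs.
    """
    merged: Dict[str, str] = {}
    failed: Set[str] = set()
    for entity_id, schema in fragments:
        if entity_id in failed:
            continue  # already known to be broken, ignore later fragments
        if entity_id not in merged:
            merged[entity_id] = schema
            continue
        try:
            merged[entity_id] = common_schema(merged[entity_id], schema)
        except InvalidData as exc:
            # Log the offending ID and carry on with the rest of the batch.
            log.warning("Skipping entity %s: %s", entity_id, exc)
            failed.add(entity_id)
            del merged[entity_id]
    return merged, failed
```

With this shape, one bad entity costs a warning and a skipped record, and the skipped IDs are collected so they can be inspected afterwards.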
Aleph version
3.13.0
@brassy-endomorph Could you provide an example of a file or entity that triggers this bug when ingested?
#2879, which is referenced here, has been fixed by this PR, and the fix is available in ingest-file version 3.19.3 and beyond.
@brassy-endomorph can you provide us with a file or entity that triggers this bug? Does it still reproduce?
I can't, because when we scrape a directory it dumps tens of thousands of files into the system, and the log messages don't tell us which file errored out.
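One thing that might help narrow it down: the final exception message in the traceback does carry the entity ID and the two conflicting schemata, so they can be pulled out of the worker logs. Below is a minimal sketch that assumes the log line has exactly the shape shown in the report; `parse_merge_error` is a hypothetical helper, and mapping the ID back to a source file is left out because it depends on the deployment.

```python
import re
from typing import Optional, Tuple

# Pattern modelled on the message in the traceback above (assumed format).
MERGE_ERROR = re.compile(
    r"Cannot merge entities with id (?P<id>.+?): "
    r"No common schema: <Schema\('(?P<left>\w+)'\)> "
    r"and <Schema\('(?P<right>\w+)'\)>"
)


def parse_merge_error(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (entity_id, left_schema, right_schema), or None if no match."""
    match = MERGE_ERROR.search(line)
    if match is None:
        return None
    return match.group("id"), match.group("left"), match.group("right")
```

Running every log line through this and collecting the non-`None` results would give a list of entity IDs to look up, which is usually far shorter than the full set of ingested files.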