BUG: no common schema between package and pages

Open brassy-endomorph opened this issue 2 years ago • 3 comments

Describe the bug

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/followthemoney/proxy.py", line 420, in merge
    self.schema = model.common_schema(self.schema, other.schema)
  File "/usr/local/lib/python3.8/dist-packages/followthemoney/model.py", line 127, in common_schema
    raise InvalidData(msg % (left, right))
followthemoney.exc.InvalidData: No common schema: <Schema('Pages')> and <Schema('Package')>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/servicelayer/worker.py", line 40, in handle_safe
    self.handle(task)
  File "/aleph/aleph/worker.py", line 109, in handle
    self.dispatch_task(task)
  File "/aleph/aleph/worker.py", line 103, in dispatch_task
    handler(collection, task)
  File "/aleph/aleph/worker.py", line 41, in op_index
    index_many(task.stage, collection, sync=sync, **task.payload)
  File "/aleph/aleph/logic/processing.py", line 26, in index_many
    index_aggregator(collection, aggregator, entity_ids=entity_ids, sync=sync)
  File "/aleph/aleph/logic/collections.py", line 130, in index_aggregator
    entities_index.index_bulk(collection, _generate(), sync=sync)
  File "/aleph/aleph/index/entities.py", line 168, in index_bulk
    bulk_actions(entities, sync=sync)
  File "/aleph/aleph/index/util.py", line 220, in bulk_actions
    for _, details in stream:
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 319, in streaming_bulk
    for bulk_data, bulk_actions in _chunk_actions(
  File "/usr/local/lib/python3.8/dist-packages/elasticsearch/helpers/actions.py", line 155, in _chunk_actions
    for action, data in actions:
  File "/aleph/aleph/index/entities.py", line 167, in <genexpr>
    entities = (e for e in entities if e is not None)
  File "/aleph/aleph/index/entities.py", line 166, in <genexpr>
    entities = (format_proxy(p, collection) for p in entities)
  File "/aleph/aleph/logic/collections.py", line 124, in _generate
    for idx, proxy in enumerate(entities, 1):
  File "/usr/local/lib/python3.8/dist-packages/ftmstore/dataset.py", line 142, in iterate
    entity.merge(partial)
  File "/usr/local/lib/python3.8/dist-packages/followthemoney/proxy.py", line 423, in merge
    raise InvalidData(msg % (self.id, e))
followthemoney.exc.InvalidData: Cannot merge entities with id [redacted].[redacted]: No common schema: <Schema('Pages')> and <Schema('Package')>

To Reproduce

I have no idea

Expected behavior

The workers don't catastrophically fail when hitting an exception
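As a rough illustration of that expectation, the sketch below merges fragments per entity ID and skips (and logs) any ID whose schemata cannot be reconciled, instead of letting the exception end the whole batch. It is a self-contained stand-in and not Aleph's or followthemoney's actual code: the `PARENTS` table is a simplified, assumed schema hierarchy, and `NoCommonSchema`, `is_a`, `common_schema`, and `merge_fragments` are hypothetical names chosen for this example. It also assumes fragments arrive sorted by entity ID.

```python
import logging
from functools import reduce
from itertools import groupby
from typing import Dict, Iterable, Iterator, Optional, Tuple

log = logging.getLogger("merge_guard")

# Assumed, simplified hierarchy (child -> parent) for illustration only.
PARENTS: Dict[str, Optional[str]] = {
    "Document": None,
    "Folder": "Document",
    "Package": "Folder",
    "Pages": "Document",
}


class NoCommonSchema(Exception):
    """Hypothetical stand-in for followthemoney.exc.InvalidData."""


def is_a(schema: str, ancestor: str) -> bool:
    """Return True if `schema` equals `ancestor` or descends from it."""
    current: Optional[str] = schema
    while current is not None:
        if current == ancestor:
            return True
        current = PARENTS[current]
    return False


def common_schema(left: str, right: str) -> str:
    """Pick the more specific schema, or fail if neither extends the other."""
    if is_a(left, right):
        return left
    if is_a(right, left):
        return right
    raise NoCommonSchema(f"No common schema: {left} and {right}")


def merge_fragments(
    fragments: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, str]]:
    """Yield (entity_id, merged_schema); skip IDs whose fragments conflict."""
    for entity_id, group in groupby(fragments, key=lambda f: f[0]):
        schemata = [schema for _, schema in group]
        try:
            merged = reduce(common_schema, schemata)
        except NoCommonSchema as exc:
            # Log the offending ID and keep processing the rest of the batch.
            log.warning("Skipping entity %s: %s", entity_id, exc)
            continue
        yield entity_id, merged


fragments = [
    ("a", "Document"), ("a", "Pages"),
    ("b", "Pages"), ("b", "Package"),
    ("c", "Folder"),
]
merged = list(merge_fragments(fragments))
```

In this toy run the entity `b` carries both `Pages` and `Package` fragments, so it is logged and skipped while `a` and `c` still come through. Whether skipping is the right policy for a real index run is a separate design question; the sketch only shows that one bad entity need not stop the batch.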

Aleph version

3.13.0

brassy-endomorph • Feb 09 '23 21:02

@brassy-endomorph Could you provide an example of a file or entity that triggers this bug when ingested?

tillprochaska • Feb 15 '23 09:02

#2879, which is referenced here, has been fixed by this PR, and the fix is available in ingest-file version 3.19.3 and later.

@brassy-endomorph Can you provide us with a file or entity that triggers this bug? Can you still reproduce it?

catileptic • Feb 13 '24 13:02

I can't, because when we scrape a directory it dumps tens of thousands of files into the system, and the log messages don't tell us which file errored out.

brassy-endomorph • Feb 18 '24 07:02
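
One possible way to narrow down candidates when the logs give no file name is to scan the stored fragments for entity IDs that were written with more than one schema. The sketch below assumes the fragments can be exported as records with an ID, a schema name, and some label for where they came from. The `Fragment` type, its `origin` field, and `find_mixed_schema_ids` are hypothetical names for this example, and how to export such records from a real deployment is not covered here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Fragment(NamedTuple):
    entity_id: str
    schema: str
    origin: str  # hypothetical label for the stage or file that wrote it


def find_mixed_schema_ids(
    fragments: Iterable[Fragment],
) -> Dict[str, List[Fragment]]:
    """Return the fragments of every entity ID that has more than one schema."""
    by_id: Dict[str, List[Fragment]] = defaultdict(list)
    for fragment in fragments:
        by_id[fragment.entity_id].append(fragment)
    return {
        entity_id: frags
        for entity_id, frags in by_id.items()
        if len({f.schema for f in frags}) > 1
    }


sample = [
    Fragment("a", "Pages", "ingest"),
    Fragment("a", "Pages", "analyze"),
    Fragment("b", "Pages", "ingest"),
    Fragment("b", "Package", "ingest"),
]
suspects = find_mixed_schema_ids(sample)
for entity_id, frags in suspects.items():
    print(entity_id, sorted({f.schema for f in frags}))
```

Not every ID with mixed schemata is a problem, since one schema may simply extend the other, so the result is a shortlist to inspect rather than a list of confirmed failures.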