[hma] ValueError in indexer

Open Dcallies opened this issue 1 year ago • 0 comments

Indexer fails to build

[ERROR] ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (264538,) + inhomogeneous part.
Traceback (most recent call last):
  File "/var/task/hmalib/lambdas/unified_indexer.py", line 132, in lambda_handler
    index: SignalTypeIndex = signal_type.get_index_cls().build(merged_data)
  File "/var/lang/lib/python3.8/site-packages/threatexchange/signal_type/index.py", line 216, in build
    ret.add_all(entries)
  File "/var/lang/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_index.py", line 75, in add_all
    self.index.add(
  File "/var/lang/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_faiss_matcher.py", line 240, in add
    self.faiss_index.add_with_ids(numpy.array(vectors), numpy.array(i64_ids))
  • https://github.com/facebook/ThreatExchange/blob/main/hasher-matcher-actioner/hmalib/lambdas/unified_indexer.py#L132
  • https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/signal_type/pdq/pdq_index.py#L75
  • https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/signal_type/pdq/pdq_faiss_matcher.py#L240

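Most likely there's a malformed PDQ hash somewhere in the inputs: numpy raises this error when the sequences passed to numpy.array() don't all have the same shape, which is what a hash of the wrong length would produce. That said, the PDQ index was written before we figured out how to use the faiss interface correctly, so there might be some other issue.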
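
One way to check the malformed-hash theory is to scan the merged inputs before the index is built. The sketch below assumes the hashes arrive as hex strings and that a well-formed PDQ hash is 256 bits, i.e. 64 hex characters. The helper names `is_well_formed_pdq` and `find_malformed_pdq_hashes` are made up for illustration and are not part of threatexchange or hmalib.

```python
import string

# Assumption: a PDQ hash is 256 bits, serialized as 64 hex characters.
PDQ_HEX_LENGTH = 64
_HEX_DIGITS = frozenset(string.hexdigits)


def is_well_formed_pdq(value):
    """Return True if value is a 64-character hex string."""
    return (
        isinstance(value, str)
        and len(value) == PDQ_HEX_LENGTH
        and all(c in _HEX_DIGITS for c in value)
    )


def find_malformed_pdq_hashes(hashes):
    """Return (position, value) pairs for every entry that is not well formed."""
    return [(i, h) for i, h in enumerate(hashes) if not is_well_formed_pdq(h)]


# Hypothetical sample: one good hash, one too short, one non-hex, one missing.
sample = ["f" * 64, "0" * 63, "g" * 64, None]
bad = find_malformed_pdq_hashes(sample)
for position, value in bad:
    print(f"malformed PDQ hash at position {position}: {value!r}")
```

Running something like this over the data handed to the indexer would either point at the offending entries or rule out malformed hashes, in which case the faiss usage in pdq_faiss_matcher.py is the next thing to look at.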

Dcallies, Aug 16 '23 22:08