ThreatExchange
[hma] ValueError in indexer
Indexer fails to build
[ERROR] ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (264538,) + inhomogeneous part.
Traceback (most recent call last):
  File "/var/task/hmalib/lambdas/unified_indexer.py", line 132, in lambda_handler
    index: SignalTypeIndex = signal_type.get_index_cls().build(merged_data)
  File "/var/lang/lib/python3.8/site-packages/threatexchange/signal_type/index.py", line 216, in build
    ret.add_all(entries)
  File "/var/lang/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_index.py", line 75, in add_all
    self.index.add(
  File "/var/lang/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_faiss_matcher.py", line 240, in add
    self.faiss_index.add_with_ids(numpy.array(vectors), numpy.array(i64_ids))
- https://github.com/facebook/ThreatExchange/blob/main/hasher-matcher-actioner/hmalib/lambdas/unified_indexer.py#L132
- https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/signal_type/pdq/pdq_index.py#L75
- https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/signal_type/pdq/pdq_faiss_matcher.py#L240
There is likely a malformed PDQ hash somewhere in the inputs. The PDQ index was written before we figured out how to use the faiss interface correctly, so some other issue may also be involved.
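To test the malformed-hash theory, one option is to scan the input hashes before they reach the index build. The sketch below is only an illustration: it assumes the hashes arrive as hex strings and that a well-formed PDQ hash is 256 bits, i.e. exactly 64 hexadecimal characters. The helper names `is_well_formed_pdq` and `find_malformed_hashes` are hypothetical and are not part of `threatexchange` or `hmalib`.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: a PDQ hash is 256 bits, serialized as 64 hexadecimal characters.
_PDQ_HEX_RE = re.compile(r"[0-9a-fA-F]{64}")


def is_well_formed_pdq(value: object) -> bool:
    """Return True only for a string of exactly 64 hex characters."""
    return isinstance(value, str) and _PDQ_HEX_RE.fullmatch(value) is not None


def find_malformed_hashes(hashes: Iterable[object]) -> List[Tuple[int, object]]:
    """Return (position, value) for every entry that is not a well-formed hash."""
    return [(pos, h) for pos, h in enumerate(hashes) if not is_well_formed_pdq(h)]


if __name__ == "__main__":
    # Hypothetical sample: one valid hash, then a short one, an empty one,
    # one with non-hex characters, and a missing value.
    sample = ["f" * 64, "f" * 63, "", "zz" + "0" * 62, None]
    for pos, bad in find_malformed_hashes(sample):
        print(f"entry {pos} is malformed: {bad!r}")
```

If a scan like this reports any entries, that would be consistent with the "inhomogeneous shape" error, since numpy cannot build a rectangular array when the per-hash vectors differ in length. If it reports none, the faiss usage in the PDQ index is the next place to look.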