ThreatExchange
[pdq] pdq_hasher error for B/W png
threatexchange hash photo https://i.redd.it/4shux9eu3mga1.png
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/signal_base.py", line 195, in hash_from_file
    return cls.hash_from_bytes(file.read_bytes())
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/signal.py", line 77, in hash_from_bytes
    pdq_hash, quality = pdq_from_bytes(bytes_)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_hasher.py", line 33, in pdq_from_bytes
    return _pdq_from_numpy_array(np_array)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_hasher.py", line 37, in _pdq_from_numpy_array
    hash_vector, quality = pdqhash.compute(array)
  File "pdqhash/bindings.pyx", line 67, in pdqhash.bindings.compute
IndexError: index 2 is out of bounds for axis 2 with size 2
Possible root cause:
I'm just looking at the shape of the ndarray that results for this image:
threatexchange hash photo https://i.redd.it/4shux9eu3mga1.png
_check_dimension_and_expand_if_needed size
(2048, 2004, 2)
_pdq_from_numpy_array
(2048, 2004, 2)
Pillow version: 9.3.0
https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/signal_type/pdq/pdq_hasher.py#L50-L56
Maybe this is not working as intended? ndim=3 here, but the last axis has only 2 channels, so maybe the B/W conversion is not calculating the dimensions properly.
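One plausible explanation is that the PNG is grayscale with an alpha channel, which would produce an array with two channels instead of three. Here is a minimal sketch of a possible workaround under that assumption. `to_rgb_array` is a hypothetical helper, not part of threatexchange or pdqhash, and it assumes channel 0 holds luminance and channel 1 holds alpha:

```python
import numpy as np


def to_rgb_array(array: np.ndarray) -> np.ndarray:
    """Hypothetical helper: coerce an image array to shape (H, W, 3)."""
    if array.ndim == 2:
        # Plain grayscale (H, W): repeat the single channel three times.
        return np.stack([array] * 3, axis=-1)
    if array.ndim == 3:
        channels = array.shape[2]
        if channels == 1:
            # Grayscale stored as (H, W, 1).
            return np.repeat(array, 3, axis=2)
        if channels == 2:
            # Assumed layout: channel 0 is luminance, channel 1 is alpha.
            # Drop alpha and repeat luminance across R, G and B.
            return np.repeat(array[:, :, :1], 3, axis=2)
        if channels >= 3:
            # RGB or RGBA: keep only the first three channels.
            return array[:, :, :3]
    raise ValueError(f"unsupported image array shape: {array.shape}")


# Same shape as the array logged above for the failing image.
two_channel = np.zeros((2048, 2004, 2), dtype=np.uint8)
print(to_rgb_array(two_channel).shape)
```

With something like this applied before the array reaches `pdqhash.compute`, the array would always have three channels, so index 2 on axis 2 would exist. Whether dropping alpha is the right behaviour for hashing is a separate question.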
Thanks for the report Daniel! The logging output and links really help as well.