[pdq] pdq_hasher error for B/W png

Open · thedanielsun opened this issue 3 years ago · 1 comment

threatexchange hash photo https://i.redd.it/4shux9eu3mga1.png

  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/signal_base.py", line 195, in hash_from_file
    return cls.hash_from_bytes(file.read_bytes())
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/signal.py", line 77, in hash_from_bytes
    pdq_hash, quality = pdq_from_bytes(bytes_)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_hasher.py", line 33, in pdq_from_bytes
    return _pdq_from_numpy_array(np_array)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_hasher.py", line 37, in _pdq_from_numpy_array
    hash_vector, quality = pdqhash.compute(array)
  File "pdqhash/bindings.pyx", line 67, in pdqhash.bindings.compute
IndexError: index 2 is out of bounds for axis 2 with size 2 

Possible root cause:

I looked at the shape of the ndarray produced for the image (threatexchange hash photo https://i.redd.it/4shux9eu3mga1.png):

_check_dimension_and_expand_if_needed size
(2048, 2004, 2)
_pdq_from_numpy_array
(2048, 2004, 2)

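Pillow version: 9.3.0

Relevant code: https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/signal_type/pdq/pdq_hasher.py#L50-L56

Maybe this is not working as intended? ndim=3 here, so the array is passed along as-is, but the B/W conversion may not be handling the channel dimension properly: the last axis has size 2, and the IndexError shows pdqhash.compute trying to read index 2 on that axis.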
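
A minimal sketch of one possible workaround, assuming the 2-channel array is grayscale plus alpha (not confirmed in this thread). `to_rgb_array` is a hypothetical helper, not part of threatexchange or pdqhash; it only uses NumPy and coerces the array to shape (H, W, 3) before it would be handed to the hasher:

```python
import numpy as np


def to_rgb_array(array: np.ndarray) -> np.ndarray:
    """Hypothetical helper: coerce an image array to shape (H, W, 3)."""
    if array.ndim == 2:
        # Plain grayscale (H, W): add a trailing channel axis.
        array = array[..., np.newaxis]
    if array.ndim != 3:
        raise ValueError(f"unsupported array shape: {array.shape}")

    channels = array.shape[2]
    if channels in (1, 2):
        # Assumption: channel 0 is luminance and channel 1 (if present) is alpha.
        # Drop the alpha channel and replicate luminance into three channels.
        return np.repeat(array[..., :1], 3, axis=2)
    if channels in (3, 4):
        # RGB or RGBA: keep the first three channels, dropping alpha if present.
        return np.ascontiguousarray(array[..., :3])
    raise ValueError(f"unsupported channel count: {channels}")


# Same shape as the array reported above.
two_channel = np.zeros((2048, 2004, 2), dtype=np.uint8)
rgb = to_rgb_array(two_channel)
print(rgb.shape)  # (2048, 2004, 3)
```

Dropping alpha this way ignores transparency entirely; whether that is the right choice for hashing is a separate question.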

thedanielsun · Feb 06 '23 21:02

Thanks for the report Daniel! The logging output and links really help as well.

Dcallies · Feb 07 '23 18:02