
PasswordModel tokenizer error

Open marcorosa opened this issue 4 years ago • 2 comments

Sometimes a scan fails because of a tokenizer error raised by the PasswordModel.

For example, when scanning the repository https://github.com/wuest-amiconsult/BTP-Day2-Bookshop-Exercise:

Exception in thread credentialdigger@https://github.com/wuest-amiconsult/BTP-Day2-Bookshop-Exercise:
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/client.py", line 793, in scan
    return self._scan(
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/client.py", line 1142, in _scan
    self._analyze_discoveries(mm, password_discoveries, debug)                                                                                                    
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/client.py", line 1225, in _analyze_discoveries
    model_manager.launch_model_batch(discoveries)
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/models/model_manager.py", line 66, in launch_model_batch
    return self.model.analyze_batch(discoveries)
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/models/password_model.py", line 50, in analyze_batch
    data = self._pre_process([d['snippet'] for d in new_discoveries])
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/credentialdigger-4.5.0-py3.9.egg/credentialdigger/models/password_model.py", line 105, in _pre_process
    encodings = self.tokenizer(snippet,
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2404, in __call__
    return self.batch_encode_plus(
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2589, in batch_encode_plus
    return self._batch_encode_plus(
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 720, in _batch_encode_plus
    batch_outputs = self._batch_prepare_for_model(
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 792, in _batch_prepare_for_model
    batch_outputs = self.pad(
  File "/Users/marco/git/credential-digger/venv3/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2714, in pad
    raise ValueError(
ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided []
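The last frames appear consistent with the tokenizer being called on an empty list of snippets: with an empty batch, the slow tokenizer's _batch_prepare_for_model has no outputs to collect, so pad() receives a mapping without input_ids and raises, which would explain the "you provided []" message. A minimal sketch of a guard, assuming each discovery is a dict with 'snippet' and 'state' keys. The function analyze_batch_safe, its pre_process and predict callables, and the "only state == 'new' is analyzed" filter are hypothetical stand-ins, not credentialdigger's actual API:

```python
from typing import Callable, Dict, List

Discovery = Dict[str, str]


def analyze_batch_safe(
    discoveries: List[Discovery],
    pre_process: Callable[[List[str]], object],
    predict: Callable[[object], List[bool]],
) -> List[Discovery]:
    """Hypothetical guard around a PasswordModel-style batch analysis.

    `pre_process` stands in for the tokenizer step and `predict` for the
    model call; both are illustrative assumptions.
    """
    # Hypothetical filter: only discoveries still marked 'new' are classified.
    new_discoveries = [d for d in discoveries if d.get('state') == 'new']

    # If nothing is left, do not call the tokenizer at all: an empty list
    # is what appears to make pad() raise the ValueError in the traceback.
    if not new_discoveries:
        return discoveries

    data = pre_process([d['snippet'] for d in new_discoveries])
    predictions = predict(data)
    # Mark snippets the model flags as false positives (illustrative).
    for discovery, is_false_positive in zip(new_discoveries, predictions):
        if is_false_positive:
            discovery['state'] = 'false_positive'
    return discoveries
```

The check is placed after filtering rather than only on the input list, since a non-empty input can still yield an empty batch once every discovery has been filtered out.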

marcorosa, Nov 02 '21 10:11

Fix released in #228

marcorosa, Apr 12 '22 09:04

This error was raised again, so it was not properly fixed.

marcorosa, Apr 12 '22 13:04