ssmba
TypeError: Can't convert 'XXX' to PyBool
Hi,
I ran into an issue with this line in utils.py:
next_len = len(tokenizer.encode(*next_sents))
When next_sents is a list containing three elements, like the following example:
next_sents = ['Beatriz Haddad Maia played on 2 April 2012', 'in Ribeirão Preto, Brazil', 'on a hard surface.']
I get the error below. It seems the tokenizer cannot handle the last element.
Traceback (most recent call last):
File "/home/qbao775/.pycharm_helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "/data/qbao775/ssmba/venv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2028, in encode
encoded_inputs = self.encode_plus(
File "/data/qbao775/ssmba/venv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2344, in encode_plus
return self._encode_plus(
File "/data/qbao775/ssmba/venv/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 458, in _encode_plus
batched_output = self._batch_encode_plus(
File "/data/qbao775/ssmba/venv/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 385, in _batch_encode_plus
encodings = self._tokenizer.encode_batch(
TypeError: Can't convert 'on a hard surface.' to PyBool
Does anyone know how to solve this issue? Thank you so much.
Unfortunately, the Hugging Face tokenizer API currently only supports encoding single sentences or sentence pairs. Supporting more than two sentences may require some workarounds to get past the Hugging Face API as it stands. I'll try looking into this soon.
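To make the failure mode concrete: `tokenizer.encode` takes at most two text arguments (`text` and `text_pair`), and its next positional parameter is the boolean `add_special_tokens`. Unpacking a three-element list therefore sends the third sentence into that boolean slot, which is what produces the PyBool error. Below is a minimal sketch using a hypothetical stand-in function that mirrors the signature (not the real tokenizer), plus one possible workaround of joining the sentences before encoding:

```python
# Stand-in that mirrors tokenizer.encode's signature: the third
# positional parameter is add_special_tokens (a bool), not another
# sentence. This is NOT the real Hugging Face implementation, just
# an illustration of the argument mismatch.
def encode(text, text_pair=None, add_special_tokens=True):
    if not isinstance(add_special_tokens, bool):
        raise TypeError(f"Can't convert {add_special_tokens!r} to PyBool")
    joined = text if text_pair is None else text + " " + text_pair
    return joined.split()  # crude token count by whitespace

next_sents = ['Beatriz Haddad Maia played on 2 April 2012',
              'in Ribeirão Preto, Brazil',
              'on a hard surface.']

# encode(*next_sents) would place 'on a hard surface.' into
# add_special_tokens and raise the TypeError seen in the traceback.

# One workaround: join the sentences into a single string first.
next_len = len(encode(" ".join(next_sents)))
```

Joining loses the sentence-pair segment boundaries, so it may not suit tasks that rely on them, but it avoids the argument-position clash entirely.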