
TypeError: Can't convert 'XXX' to PyBool

Open 14H034160212 opened this issue 2 years ago • 1 comments

Hi,

I ran into an issue with the following line in utils.py.

next_len = len(tokenizer.encode(*next_sents))

It fails when next_sents is a list containing three elements, as in the following example.

next_sents = ['Beatriz Haddad Maia played on 2 April 2012', 'in Ribeirão Preto, Brazil', 'on a hard surface.']

I get the error below. It seems the call cannot handle the third element.

Traceback (most recent call last):
  File "/home/qbao775/.pycharm_helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "/data/qbao775/ssmba/venv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2028, in encode
    encoded_inputs = self.encode_plus(
  File "/data/qbao775/ssmba/venv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2344, in encode_plus
    return self._encode_plus(
  File "/data/qbao775/ssmba/venv/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 458, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "/data/qbao775/ssmba/venv/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 385, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
TypeError: Can't convert 'on a hard surface.' to PyBool

Does anyone know how to solve that issue? Thank you so much.

14H034160212 · Feb 07 '22 00:02

Unfortunately, the Hugging Face tokenizers currently only support single-sentence and sentence-pair encoding. Supporting three or more sentences may require some workarounds on top of the Hugging Face API as it stands. I'll try looking into this soon.

nng555 · Feb 21 '22 19:02
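
For anyone hitting the same error: the traceback is consistent with how `encode` binds its positional arguments. Its signature starts with `text`, `text_pair`, `add_special_tokens`, so unpacking three sentences passes the third string where a boolean is expected. Below is a minimal sketch of one possible workaround, not the fix adopted in ssmba: fold everything after the first sentence into a single second segment before calling `encode`. The helper names `as_text_pair` and `encoded_len` are hypothetical, and joining with a space is an assumption, so the resulting length may differ slightly from what a true multi-segment encoding would give.

```python
from typing import Sequence, Tuple


def as_text_pair(sents: Sequence[str]) -> Tuple[str, ...]:
    """Collapse a list of sentences into at most (text, text_pair).

    Hypothetical helper: sentences after the first are joined with a
    single space so that nothing spills into `add_special_tokens`.
    """
    if not sents:
        raise ValueError("expected at least one sentence")
    if len(sents) <= 2:
        return tuple(sents)
    return (sents[0], " ".join(sents[1:]))


def encoded_len(tokenizer, sents: Sequence[str]) -> int:
    """Length of the encoding, calling encode() with one or two segments.

    `tokenizer` is assumed to expose encode(text, text_pair=None, ...),
    as Hugging Face tokenizers do.
    """
    return len(tokenizer.encode(*as_text_pair(sents)))


# Usage with a real tokenizer (requires `transformers`; the model name
# is only an example):
#
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
#   next_sents = ['Beatriz Haddad Maia played on 2 April 2012',
#                 'in Ribeirão Preto, Brazil',
#                 'on a hard surface.']
#   next_len = encoded_len(tokenizer, next_sents)
```

If only a rough length is needed, summing `len(tokenizer.encode(s, add_special_tokens=False))` over the sentences is another option, at the cost of ignoring the special tokens the model would add.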