
RuntimeError: Already borrowed

Open severinsimmler opened this issue 5 years ago • 46 comments

We're using transformers (3.5.0) with a fast tokenizer (0.9.3) in production, but sometimes a RuntimeError with Already borrowed is raised (this might come from Rust's borrowing mechanism?). This actually happens quite often, but I'm not sure yet why, or how to reproduce it.

However, this is where the error is raised:

https://github.com/huggingface/tokenizers/blob/598ce61229c789465966682687fa12a90ec58074/bindings/python/py_src/tokenizers/implementations/base_tokenizer.py#L107-L123

severinsimmler avatar Nov 19 '20 12:11 severinsimmler

Well, that's really weird. An error originating in enable_truncation seems very unlikely, so I'm confused. Having a way to reproduce this would be ideal, but otherwise, if you can provide us with a stack trace, that would already be very helpful.

n1t0 avatar Nov 20 '20 01:11 n1t0

Here's the stack trace. The input for this is rather short (about 70 characters) and always the same (basically a health check), but I still haven't been able to reproduce it locally.

{
  "error.culprit": "transformers.tokenization_utils_fast.set_truncation_and_padding",
  "error.exception": {
    "stacktrace": [
      {
        "filename": "transformers/tokenization_utils_base.py",
        "line": {
          "number": 2217,
          "context": "            return self.encode_plus("
        },
        "function": "__call__",
        "module": "transformers.tokenization_utils_base",
        "context": {
          "pre": ["            )", "        else:"],
          "post": [
            "                text=text,",
            "                text_pair=text_pair,"
          ]
        },
        "vars": {
          "padding": false,
          "is_split_into_words": true,
          "is_batched": false,
          "return_attention_mask": true,
          "return_length": false,
          "stride": 0,
          "return_offsets_mapping": false,
          "return_special_tokens_mask": "********",
          "verbose": true,
          "self": "PreTrainedTokenizerFast(name_or_path='/opt/model', vocab_size=250002, model_max_len=512, is_fast=True, ...",
          "return_overflowing_tokens": "********",
          "truncation": true,
          "add_special_tokens": "********",
          "max_length": 512
        }
      },
      {
        "filename": "transformers/tokenization_utils_base.py",
        "line": {
          "number": 2287,
          "context": "        return self._encode_plus("
        },
        "module": "transformers.tokenization_utils_base",
        "function": "encode_plus",
        "context": {
          "pre": ["        )", ""],
          "post": ["            text=text,", "            text_pair=text_pair,"]
        },
        "vars": {
          "padding": false,
          "is_split_into_words": true,
          "return_attention_mask": true,
          "padding_strategy": "<PaddingStrategy.DO_NOT_PAD: 'do_not_pad'>",
          "stride": 0,
          "return_length": false,
          "return_offsets_mapping": false,
          "return_special_tokens_mask": "********",
          "verbose": true,
          "truncation_strategy": "<TruncationStrategy.LONGEST_FIRST: 'longest_first'>",
          "self": "PreTrainedTokenizerFast(name_or_path='/opt/model', vocab_size=250002, model_max_len=512, is_fast=True, ...",
          "return_overflowing_tokens": "********",
          "truncation": true,
          "add_special_tokens": "********",
          "max_length": 512
        }
      },
      {
        "filename": "transformers/tokenization_utils_fast.py",
        "line": {
          "number": 455,
          "context": "        batched_output = self._batch_encode_plus("
        },
        "module": "transformers.tokenization_utils_fast",
        "function": "_encode_plus",
        "context": {
          "pre": [
            "",
            "        batched_input = [(text, text_pair)] if text_pair else [text]"
          ],
          "post": [
            "            batched_input,",
            "            is_split_into_words=is_split_into_words,"
          ]
        },
        "vars": {
          "is_split_into_words": true,
          "return_attention_mask": true,
          "padding_strategy": "<PaddingStrategy.DO_NOT_PAD: 'do_not_pad'>",
          "stride": 0,
          "return_length": false,
          "return_offsets_mapping": false,
          "return_special_tokens_mask": "********",
          "verbose": true,
          "truncation_strategy": "<TruncationStrategy.LONGEST_FIRST: 'longest_first'>",
          "self": "PreTrainedTokenizerFast(name_or_path='/opt/model', vocab_size=250002, model_max_len=512, is_fast=True, ...",
          "return_overflowing_tokens": "********",
          "add_special_tokens": "********",
          "max_length": 512
        }
      },
      {
        "filename": "transformers/tokenization_utils_fast.py",
        "line": {
          "number": 378,
          "context": "        self.set_truncation_and_padding("
        },
        "function": "_batch_encode_plus",
        "module": "transformers.tokenization_utils_fast",
        "context": {
          "pre": [
            "",
            "        # Set the truncation and padding strategy and restore the initial configuration"
          ],
          "post": [
            "            padding_strategy=padding_strategy,",
            "            truncation_strategy=truncation_strategy,"
          ]
        },
        "vars": {
          "is_split_into_words": true,
          "return_attention_mask": true,
          "padding_strategy": "<PaddingStrategy.DO_NOT_PAD: 'do_not_pad'>",
          "return_length": false,
          "stride": 0,
          "return_offsets_mapping": false,
          "return_special_tokens_mask": "********",
          "verbose": true,
          "truncation_strategy": "<TruncationStrategy.LONGEST_FIRST: 'longest_first'>",
          "self": "PreTrainedTokenizerFast(name_or_path='/opt/model', vocab_size=250002, model_max_len=512, is_fast=True, ...",
          "return_overflowing_tokens": "********",
          "max_length": 512,
          "add_special_tokens": "********"
        }
      },
      {
        "exclude_from_grouping": false,
        "library_frame": false,
        "filename": "transformers/tokenization_utils_fast.py",
        "abs_path": "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py",
        "line": {
          "number": 323,
          "context": "            self._tokenizer.enable_truncation(max_length, stride=stride, strategy=truncation_strategy.value)"
        },
        "module": "transformers.tokenization_utils_fast",
        "function": "set_truncation_and_padding",
        "context": {
          "pre": [
            "        # Set truncation and padding on the backend tokenizer",
            "        if truncation_strategy != TruncationStrategy.DO_NOT_TRUNCATE:"
          ],
          "post": [
            "        else:",
            "            self._tokenizer.no_truncation()"
          ]
        },
        "vars": {
          "self": "PreTrainedTokenizerFast(name_or_path='/opt/model', vocab_size=250002, model_max_len=512, is_fast=True, ...",
          "padding_strategy": "<PaddingStrategy.DO_NOT_PAD: 'do_not_pad'>",
          "stride": 0,
          "truncation_strategy": "<TruncationStrategy.LONGEST_FIRST: 'longest_first'>",
          "max_length": 512
        }
      }
    ],
    "handled": false,
    "module": "builtins",
    "message": "RuntimeError: Already borrowed",
    "type": "RuntimeError"
  }
}

severinsimmler avatar Nov 23 '20 10:11 severinsimmler

I've just realized that this happens in transformers and not in tokenizers. Should I move the issue to the other repository? :grin:

severinsimmler avatar Nov 23 '20 10:11 severinsimmler

Thank you very much @severinsimmler, this is very helpful. We can keep the issue open here since it is mostly related to this project, no worries!

I was not able to reproduce it, but I have an idea of how this could happen. Are you using this tokenizer from multiple Python threads? Can you share a bit more about the kind of production setup you have? (e.g. multiple threads or processes, async, or anything like that)

n1t0 avatar Nov 23 '20 17:11 n1t0

The application runs in a Docker container with gunicorn like:

$ gunicorn --workers 1 --threads 2 --worker-class gthread

severinsimmler avatar Nov 24 '20 08:11 severinsimmler

Alright, that's what I feared. This is happening because you have a single tokenizer that is used by two different threads. While the tokenizer is encoding on one thread, if the other thread tries to modify it, this error is raised because the tokenizer cannot be modified while it is being used at the same time.

I think the easiest way to fix it, for now, will be to ensure you have an instance of the tokenizer for each thread.
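For reference, a minimal sketch of the per-thread approach using threading.local (the model path and handler function here are hypothetical, not from this thread):

import threading

from transformers import AutoTokenizer

MODEL_PATH = "/opt/model"  # hypothetical path
_local = threading.local()

def get_tokenizer():
    # Lazily create one fast tokenizer per thread, so no thread ever mutates
    # a tokenizer that another thread is currently using for encoding.
    if not hasattr(_local, "tokenizer"):
        _local.tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, use_fast=True)
    return _local.tokenizer

def handle_request(text):
    tokenizer = get_tokenizer()
    return tokenizer(text, truncation=True, max_length=512)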

We should be able to fix this in transformers by making sure we update the truncation/padding info only if necessary (cc @LysandreJik @thomwolf). And we should also be able to improve this error to make it clearer on tokenizers.

n1t0 avatar Nov 24 '20 17:11 n1t0

Good discussion, but I don't quite understand why this truncation/padding info has to be global. It could be passed as a parameter so that each tokenize call is thread-safe.

hankcs avatar Jan 06 '21 16:01 hankcs

The error still exists in transformers==4.3.2, tokenizers==0.10.1. I am using gunicorn (with threads) with Flask, and the error appears when parallel requests are made.

The problem does not exist in transformers==3.0.2, tokenizers==0.8.1.

djstrong avatar Feb 12 '21 10:02 djstrong

Still there

s4sarath avatar Jun 01 '21 13:06 s4sarath

This happens with TokenizerFast for me. The workaround is to not use it.

s4sarath avatar Jun 01 '21 13:06 s4sarath

Did you try not sharing the tokenizer among multiple threads? (The easiest way is to load the tokenizer on each thread instead.)

There are some protections implemented, but there is only so much the library can do against that.

Narsil avatar Jun 01 '21 14:06 Narsil

How could I avoid that sharing?

s4sarath avatar Jun 02 '21 01:06 s4sarath

Instead of loading the tokenizer before the thread fork, load it afterwards.

If you use a torch Dataset, for instance, that means loading the tokenizer in Dataset.__init__ instead of passing it in.
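A minimal sketch of that suggestion for a torch Dataset (class and field names here are illustrative only):

from torch.utils.data import Dataset
from transformers import BertTokenizerFast

class TextDataset(Dataset):
    def __init__(self, texts):
        self.texts = texts
        # Load the tokenizer here rather than passing in an instance that was
        # created before the worker threads/processes were started.
        self.tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.tokenizer(self.texts[idx], truncation=True, max_length=512)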

Narsil avatar Jun 02 '21 07:06 Narsil

I am integrating it inside a tf.data.Dataset. I think it's a TF threading vs. fast-tokenizer threading issue.


s4sarath avatar Jun 02 '21 07:06 s4sarath

You can also disable the internal parallelism in tokenizers altogether by setting the environment variable TOKENIZERS_PARALLELISM=0 before launching your program; that might help.
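For example, a minimal sketch of setting it from Python before the tokenizer is used (equivalently, prefix the launch command with TOKENIZERS_PARALLELISM=0):

import os

# Must be set before the tokenizer does any work in this process.
os.environ["TOKENIZERS_PARALLELISM"] = "0"

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")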

Narsil avatar Jun 02 '21 07:06 Narsil

Tried that buddy. Same issue :(

s4sarath avatar Jun 02 '21 07:06 s4sarath

Do you have a simple script to reproduce it, maybe?

Narsil avatar Jun 02 '21 07:06 Narsil

Sure Narsil.

from transformers import BertTokenizerFast
import tensorflow as tf

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

#### Dataset Pipeline
def create_tokenize(text):
    text = text.numpy().decode()
    inputs = tokenizer(text, add_special_tokens=True, padding=True, return_tensors='tf')
    return [tf.squeeze(inputs['input_ids']), tf.squeeze(inputs['attention_mask'])]

def create_data_map_fn_train(item):
    input_ids, input_mask = tf.py_function(create_tokenize,[ item['text']], [tf.int32,tf.int32])
    result = {}
    result['input_ids']  = input_ids
    result['input_type_ids'] = tf.zeros_like(input_ids)
    result['input_mask']  = input_mask

    
    return result

texts = {'text': ['This is sentence 1', 
        'This is sentence 2', 
        'This is sentence 3', 
        'This is sentence 4']}

train_ds  = tf.data.Dataset.from_tensor_slices(texts)
train_dataset = train_ds.map(create_data_map_fn_train, num_parallel_calls =tf.data.experimental.AUTOTUNE)

for item in train_dataset:
    print(item)

s4sarath avatar Jun 02 '21 10:06 s4sarath

You're sharing the tokenizer across thread boundaries....

Move the tokenizer declaration inside create_tokenize and everything should work fine.

I'm not familiar enough with TensorFlow, but there's probably a better way to instantiate the tokenizer only once per thread.

Narsil avatar Jun 03 '21 12:06 Narsil

Thanks. It works for small data, but the moment we increase the size of the data it fails.

s4sarath avatar Jun 03 '21 12:06 s4sarath

I guess that's because you keep re-instantiating the tokenizer that way; there really should be a way to have it once per thread. Another option would be to batch-encode your dataset first, THEN build the dataset from the encoded tensors, as sketched below (again, I don't use TF enough to know the solution off the top of my head).

It is the right way to go about it nonetheless, and the error you are seeing is desirable in a way, because you don't want contention around a single tokenizer. There should be very little overhead in having one tokenizer per thread.
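A minimal sketch of that batch-encode-first option, added here for reference (everything is tokenized once in the main thread, so no tokenizer is shared across tf.data threads):

import tensorflow as tf
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

texts = ['This is sentence 1', 'This is sentence 2',
         'This is sentence 3', 'This is sentence 4']

# Encode everything up front, then build the dataset from plain tensors.
encodings = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors='tf')

train_dataset = tf.data.Dataset.from_tensor_slices({
    'input_ids': encodings['input_ids'],
    'input_mask': encodings['attention_mask'],
    'input_type_ids': tf.zeros_like(encodings['input_ids']),
})

for item in train_dataset:
    print(item)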

Could you try this:

from transformers import BertTokenizerFast
import tensorflow as tf

#### Dataset Pipeline

# Lazily instantiate the tokenizer inside the worker thread, so it is only
# created once tf.data starts calling the function, not shared up front.
TOKENIZER = None
def get_tokenizer():
    global TOKENIZER
    if TOKENIZER is None:
        TOKENIZER = BertTokenizerFast.from_pretrained("bert-base-uncased")
    return TOKENIZER


def create_tokenize(text):
    tokenizer = get_tokenizer()
    text = text.numpy().decode()
    inputs = tokenizer(text, add_special_tokens=True, padding=True, return_tensors='tf')
    return [tf.squeeze(inputs['input_ids']), tf.squeeze(inputs['attention_mask'])]

def create_data_map_fn_train(item):
    input_ids, input_mask = tf.py_function(create_tokenize,[ item['text']], [tf.int32,tf.int32])
    result = {}
    result['input_ids']  = input_ids
    result['input_type_ids'] = tf.zeros_like(input_ids)
    result['input_mask']  = input_mask

    
    return result

texts = {'text': ['This is sentence 1', 
        'This is sentence 2', 
        'This is sentence 3', 
        'This is sentence 4']}

train_ds  = tf.data.Dataset.from_tensor_slices(texts)
train_dataset = train_ds.map(create_data_map_fn_train, num_parallel_calls =tf.data.experimental.AUTOTUNE)

for item in train_dataset:
    print(item)

It's a dirty hack, but it should work: TOKENIZER will be global but only set after the fork, so it should end up behaving like a thread-specific variable.

Narsil avatar Jun 03 '21 14:06 Narsil

I can understand your effort, but it's still failing.

I think TF has some crazy stuff going on inside.

s4sarath avatar Jun 03 '21 14:06 s4sarath

It fails when we have larger data. But I more or less solved it using tf.text, and it's so fast.

s4sarath avatar Jun 03 '21 14:06 s4sarath

Do you mind sharing it for other users, maybe?

Narsil avatar Jun 04 '21 14:06 Narsil

I will share it in a few days. It's messy and only useful for TF users, who I find are a very small group these days.

s4sarath avatar Jun 09 '21 09:06 s4sarath

Hi, I have the same problem with gunicorn. For some models it works, but for others it fails. I notice a difference between the two models:

This fails:

self.token_indexer.encode(x, max_length=350, truncation=True)

This seems to work:

self.token_indexer.encode(x, truncation=True)

The tokenizer is loaded at startup in gunicorn. When I receive a request, I try to tokenize the batch of text (probably in another thread). Is it because the set_truncation_and_padding function tries to modify the backend tokenizer (self._tokenizer), which is already owned by the first thread? In the second case (which works), the _tokenizer is not modified because max_length is left at its default.

Could we pass this as an argument of the backend encoding function instead of modifying the backend tokenizer object?

gbmarc1 avatar Jun 09 '21 22:06 gbmarc1

Would it be possible for you to use _tokenizer directly on your side (i.e. not call tokenizer.encode anymore)? A sketch of what that might look like follows.

transformers needs to maintain backward compatibility and is unlikely to change any of its API. tokenizers is a standalone project, so it probably won't make decisions just to accommodate transformers (except in very specific cases).
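A minimal sketch of using the backend tokenizer directly: truncation is configured once at startup, so nothing needs to mutate the tokenizer at request time (the model name and max length here are just examples):

from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)

# Grab the backend (Rust) tokenizer and configure truncation a single time.
backend = hf_tokenizer.backend_tokenizer
backend.enable_truncation(max_length=350)

def encode(text):
    # encode() only reads the configuration, so concurrent calls from several
    # threads should not trigger the "Already borrowed" error.
    return backend.encode(text).ids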

Narsil avatar Jun 10 '21 08:06 Narsil

It seems like a threading issue.

Side note: tf.text is much faster than the normal tokenizer, and faster than the fast tokenizer by some margin:

tf.text: 6 seconds on 37,000 texts of length 512
normal tokenizer: 6 minutes on 37,000 texts of length 512
fast tokenizer: 1 minute on 37,000 texts of length 512


s4sarath avatar Jun 10 '21 12:06 s4sarath

Does it do the same thing?

From the docs, it seems to be a simple whitespace split, not really a BPE or Unigram tokenizer: https://www.tensorflow.org/tutorials/tensorflow_text/intro. If that's the case, then the speed difference is perfectly normal; raw Python code might even be faster than tf.text. Is there anything I'm missing?

Narsil avatar Jun 10 '21 13:06 Narsil

Yeah, tf.text has a BertTokenizer: it's whitespace splitting + WordPiece. In general tf.text is faster, but the problem is that GPT-2 and RoBERTa need custom tokenizers.

And tf.text is only needed if we want to make use of tf.data.Dataset to prepare data on the fly (a rough sketch follows below).

To be frank, preprocessing on the fly is something everyone is ignoring.
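For readers curious about the tf.text route, a rough sketch with tensorflow_text's BertTokenizer (the vocab.txt path is an assumption; it should be the WordPiece vocab matching your model):

import tensorflow as tf
import tensorflow_text as tf_text

# WordPiece vocab file, one token per line (e.g. the vocab.txt of bert-base-uncased).
tokenizer = tf_text.BertTokenizer("vocab.txt", lower_case=True)

def tokenize(texts):
    # tokenize() returns a RaggedTensor of shape [batch, words, wordpieces];
    # merge the last two dimensions to get one id sequence per text.
    return tokenizer.tokenize(texts).merge_dims(-2, -1)

dataset = tf.data.Dataset.from_tensor_slices(['This is sentence 1',
                                              'This is sentence 2'])
dataset = dataset.batch(2).map(tokenize)

for batch in dataset:
    print(batch)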


s4sarath avatar Jun 10 '21 14:06 s4sarath