RuntimeError: Already borrowed
We're using transformers (3.5.0) with a fast tokenizer (0.9.3) in production, but sometimes a RuntimeError with Already borrowed is raised (this might come from Rust's borrowing mechanism?). This actually happens quite often, but I'm not sure yet why, or how to reproduce it.
However, this is where the error is raised:
https://github.com/huggingface/tokenizers/blob/598ce61229c789465966682687fa12a90ec58074/bindings/python/py_src/tokenizers/implementations/base_tokenizer.py#L107-L123
Well, that's really weird. Such an error originating in enable_truncation seems very unlikely; I'm confused. Having a way to reproduce this would be ideal, but otherwise, if you can provide us with a stack trace, that would already be very helpful.
Here's the stack trace. The input for this is rather short (about 70 characters) and always the same (basically a health check), but I still haven't been able to reproduce it locally.
{
"error.culprit": "transformers.tokenization_utils_fast.set_truncation_and_padding",
"error.exception": {
"stacktrace": [
{
"filename": "transformers/tokenization_utils_base.py",
"line": {
"number": 2217,
"context": " return self.encode_plus("
},
"function": "__call__",
"module": "transformers.tokenization_utils_base",
"context": {
"pre": [" )", " else:"],
"post": [
" text=text,",
" text_pair=text_pair,"
]
},
"vars": {
"padding": false,
"is_split_into_words": true,
"is_batched": false,
"return_attention_mask": true,
"return_length": false,
"stride": 0,
"return_offsets_mapping": false,
"return_special_tokens_mask": "********",
"verbose": true,
"self": "PreTrainedTokenizerFast(name_or_path='/opt/model', vocab_size=250002, model_max_len=512, is_fast=True, ...",
"return_overflowing_tokens": "********",
"truncation": true,
"add_special_tokens": "********",
"max_length": 512
}
},
{
"filename": "transformers/tokenization_utils_base.py",
"line": {
"number": 2287,
"context": " return self._encode_plus("
},
"module": "transformers.tokenization_utils_base",
"function": "encode_plus",
"context": {
"pre": [" )", ""],
"post": [" text=text,", " text_pair=text_pair,"]
},
"vars": {
"padding": false,
"is_split_into_words": true,
"return_attention_mask": true,
"padding_strategy": "<PaddingStrategy.DO_NOT_PAD: 'do_not_pad'>",
"stride": 0,
"return_length": false,
"return_offsets_mapping": false,
"return_special_tokens_mask": "********",
"verbose": true,
"truncation_strategy": "<TruncationStrategy.LONGEST_FIRST: 'longest_first'>",
"self": "PreTrainedTokenizerFast(name_or_path='/opt/model', vocab_size=250002, model_max_len=512, is_fast=True, ...",
"return_overflowing_tokens": "********",
"truncation": true,
"add_special_tokens": "********",
"max_length": 512
}
},
{
"filename": "transformers/tokenization_utils_fast.py",
"line": {
"number": 455,
"context": " batched_output = self._batch_encode_plus("
},
"module": "transformers.tokenization_utils_fast",
"function": "_encode_plus",
"context": {
"pre": [
"",
" batched_input = [(text, text_pair)] if text_pair else [text]"
],
"post": [
" batched_input,",
" is_split_into_words=is_split_into_words,"
]
},
"vars": {
"is_split_into_words": true,
"return_attention_mask": true,
"padding_strategy": "<PaddingStrategy.DO_NOT_PAD: 'do_not_pad'>",
"stride": 0,
"return_length": false,
"return_offsets_mapping": false,
"return_special_tokens_mask": "********",
"verbose": true,
"truncation_strategy": "<TruncationStrategy.LONGEST_FIRST: 'longest_first'>",
"self": "PreTrainedTokenizerFast(name_or_path='/opt/model', vocab_size=250002, model_max_len=512, is_fast=True, ...",
"return_overflowing_tokens": "********",
"add_special_tokens": "********",
"max_length": 512
}
},
{
"filename": "transformers/tokenization_utils_fast.py",
"line": {
"number": 378,
"context": " self.set_truncation_and_padding("
},
"function": "_batch_encode_plus",
"module": "transformers.tokenization_utils_fast",
"context": {
"pre": [
"",
" # Set the truncation and padding strategy and restore the initial configuration"
],
"post": [
" padding_strategy=padding_strategy,",
" truncation_strategy=truncation_strategy,"
]
},
"vars": {
"is_split_into_words": true,
"return_attention_mask": true,
"padding_strategy": "<PaddingStrategy.DO_NOT_PAD: 'do_not_pad'>",
"return_length": false,
"stride": 0,
"return_offsets_mapping": false,
"return_special_tokens_mask": "********",
"verbose": true,
"truncation_strategy": "<TruncationStrategy.LONGEST_FIRST: 'longest_first'>",
"self": "PreTrainedTokenizerFast(name_or_path='/opt/model', vocab_size=250002, model_max_len=512, is_fast=True, ...",
"return_overflowing_tokens": "********",
"max_length": 512,
"add_special_tokens": "********"
}
},
{
"exclude_from_grouping": false,
"library_frame": false,
"filename": "transformers/tokenization_utils_fast.py",
"abs_path": "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py",
"line": {
"number": 323,
"context": " self._tokenizer.enable_truncation(max_length, stride=stride, strategy=truncation_strategy.value)"
},
"module": "transformers.tokenization_utils_fast",
"function": "set_truncation_and_padding",
"context": {
"pre": [
" # Set truncation and padding on the backend tokenizer",
" if truncation_strategy != TruncationStrategy.DO_NOT_TRUNCATE:"
],
"post": [
" else:",
" self._tokenizer.no_truncation()"
]
},
"vars": {
"self": "PreTrainedTokenizerFast(name_or_path='/opt/model', vocab_size=250002, model_max_len=512, is_fast=True, ...",
"padding_strategy": "<PaddingStrategy.DO_NOT_PAD: 'do_not_pad'>",
"stride": 0,
"truncation_strategy": "<TruncationStrategy.LONGEST_FIRST: 'longest_first'>",
"max_length": 512
}
}
],
"handled": false,
"module": "builtins",
"message": "RuntimeError: Already borrowed",
"type": "RuntimeError"
}
}
I've just realized that this happens in transformers and not in tokenizers. Should I move the issue to the other repository? :grin:
Thank you very much @severinsimmler, this is very helpful. We can keep the issue open here since it is mostly related to this project, no worries!
I was not able to reproduce it, but I have an idea of how this could happen. Are you using this tokenizer from multiple Python threads? Can you share a bit more about the kind of production setup you have? (e.g. multiple threads or processes, async, or anything like that)
The application runs in a Docker container with gunicorn like:
$ gunicorn --workers 1 --threads 2 --worker-class gthread
Alright, that's what I feared. This is happening because you have a single tokenizer that is used by two different threads. While the tokenizer is encoding on one thread, if the other thread tries to modify it, this error is raised because the tokenizer cannot be modified while it is in use.
I think the easiest way to fix it, for now, will be to ensure you have an instance of the tokenizer for each thread.
We should be able to fix this in transformers by making sure we update the truncation/padding info only if necessary (cc @LysandreJik @thomwolf).
And we should also be able to improve this error on the tokenizers side to make it clearer.
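For illustration, here is a minimal sketch of the per-thread approach in a threaded gunicorn/Flask worker; the helper name get_tokenizer and the checkpoint path are placeholders, not part of this issue:

import threading

from transformers import AutoTokenizer

_local = threading.local()

def get_tokenizer():
    # Lazily create one tokenizer per thread the first time that thread needs it.
    if not hasattr(_local, "tokenizer"):
        _local.tokenizer = AutoTokenizer.from_pretrained("/opt/model")  # placeholder path
    return _local.tokenizer

def handle_request(text):
    # Each gthread worker thread uses its own instance, so no thread ever
    # mutates a backend tokenizer that another thread is currently using.
    tokenizer = get_tokenizer()
    return tokenizer(text, truncation=True, max_length=512)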
Good discussion. But I don't quite understand why this truncation/padding info has to be global. It could be passed as a parameter so that each tokenize call would be thread-safe.
The error still exists in transformers==4.3.2 and tokenizers==0.10.1. I am using gunicorn (with threads) with Flask, and the error shows up when parallel requests are made.
The problem does not exist in transformers==3.0.2, tokenizers==0.8.1.
Still there
This happens in TokenizerFast for me. My workaround is to not use it.
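For reference, a minimal sketch of that workaround, i.e. falling back to the slow pure-Python tokenizer (the model name here is only an example):

from transformers import AutoTokenizer

# use_fast=False loads the pure-Python tokenizer instead of the Rust-backed
# fast one, so there is no shared Rust object to hit borrow errors on.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)
encoded = tokenizer("some text", truncation=True, max_length=512)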
Did you try not sharing the tokenizer among multiple threads? (The easiest way is to load the tokenizer on each thread instead.)
There are some protections implemented, but there is only so much that the lib can do against that.
How could I avoid that sharing?
Instead of loading the tokenizer before the thread fork, load it afterwards.
If you use a torch Dataset, for instance, that means loading the tokenizer in Dataset.__init__ instead of passing it in.
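As a rough sketch of that suggestion, assuming a plain text dataset (the class and model names are made up for illustration):

from torch.utils.data import Dataset
from transformers import AutoTokenizer

class TextDataset(Dataset):
    def __init__(self, texts, model_name="bert-base-uncased"):
        # Build the tokenizer here instead of receiving a shared, pre-built
        # instance, so the dataset owns its own copy.
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.texts = texts

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.tokenizer(self.texts[idx], truncation=True, max_length=512)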
I am integrating it inside a tf.data.Dataset. I think it's a TF threading vs. TokenizerFast threading issue.
You can also disable threading in tokenizers altogether by setting the env variable
TOKENIZERS_PARALLELISM=0 before launching your program; that might help.
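If it's easier, the same thing can be done from Python, as long as it happens before the tokenizer starts encoding (a small sketch, assuming a standard checkpoint):

import os

# Must be set before the fast tokenizer does any encoding, otherwise the
# library may already have decided whether to use its internal thread pool.
os.environ["TOKENIZERS_PARALLELISM"] = "0"

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")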
Tried that buddy. Same issue :(
Any simple script to reproduce maybe ?
Sure Narsil.
import tensorflow as tf
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

#### Dataset Pipeline
def create_tokenize(text):
    text = text.numpy().decode()
    inputs = tokenizer(text, add_special_tokens=True, padding=True, return_tensors='tf')
    return [tf.squeeze(inputs['input_ids']), tf.squeeze(inputs['attention_mask'])]

def create_data_map_fn_train(item):
    input_ids, input_mask = tf.py_function(create_tokenize, [item['text']], [tf.int32, tf.int32])
    result = {}
    result['input_ids'] = input_ids
    result['input_type_ids'] = tf.zeros_like(input_ids)
    result['input_mask'] = input_mask
    return result

texts = {'text': ['This is sentence 1',
                  'This is entence 2',
                  'This is sentence 3',
                  'This is sentence 4']}

train_ds = tf.data.Dataset.from_tensor_slices(texts)
train_dataset = train_ds.map(create_data_map_fn_train, num_parallel_calls=tf.data.experimental.AUTOTUNE)

for item in train_dataset:
    print(item)
You're sharing the tokenizer across thread boundaries.
Move the tokenizer declaration inside create_tokenize and everything will work fine.
I'm not familiar enough with tensorflow, but there's probably another way to instantiate the tokenizer only once (per thread).
Thanks. It works for small data. The moment we increase the size of the data it fails.
I guess it's because you keep instantiating the tokenizer that way; there really should be a way to have it once per thread. Another option would be to batch-encode your whole dataset first, THEN use it in a dataset, as sketched below (again, I don't use TF enough to know the solution off the top of my head).
It is the right way to go about it nonetheless, and the error you are seeing is desirable in a way, because you don't want contention around a single tokenizer. There should be very little overhead to having one instance per thread.
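A rough sketch of the batch-encode-first idea, reusing the texts from the script above (the model name and max length are just examples):

import tensorflow as tf
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

texts = ['This is sentence 1', 'This is sentence 2',
         'This is sentence 3', 'This is sentence 4']

# Tokenize everything up front, in the main thread, so the tf.data pipeline
# only batches plain tensors and never touches the tokenizer concurrently.
encoded = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors='tf')

train_dataset = tf.data.Dataset.from_tensor_slices({
    'input_ids': encoded['input_ids'],
    'input_type_ids': encoded['token_type_ids'],
    'input_mask': encoded['attention_mask'],
})

for item in train_dataset:
    print(item)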
Could you try this:
from transformers import BertTokenizerFast
import tensorflow as tf

#### Dataset Pipeline
TOKENIZER = None

def get_tokenizer():
    global TOKENIZER
    if TOKENIZER is None:
        TOKENIZER = BertTokenizerFast.from_pretrained("bert-base-uncased")
    return TOKENIZER

def create_tokenize(text):
    tokenizer = get_tokenizer()
    text = text.numpy().decode()
    inputs = tokenizer(text, add_special_tokens=True, padding=True, return_tensors='tf')
    return [tf.squeeze(inputs['input_ids']), tf.squeeze(inputs['attention_mask'])]

def create_data_map_fn_train(item):
    input_ids, input_mask = tf.py_function(create_tokenize, [item['text']], [tf.int32, tf.int32])
    result = {}
    result['input_ids'] = input_ids
    result['input_type_ids'] = tf.zeros_like(input_ids)
    result['input_mask'] = input_mask
    return result

texts = {'text': ['This is sentence 1',
                  'This is entence 2',
                  'This is sentence 3',
                  'This is sentence 4']}

train_ds = tf.data.Dataset.from_tensor_slices(texts)
train_dataset = train_ds.map(create_data_map_fn_train, num_parallel_calls=tf.data.experimental.AUTOTUNE)

for item in train_dataset:
    print(item)
It's a dirty hack but it should work: TOKENIZER will be global but only set after the fork, so it'll end up being a thread-specific variable.
I can understand your effort. It's failing.
I think TF has some crazy stuff going on inside.
It fails when we have larger data. But I kind of solved it using tf.text, and it's so fast.
Do you mind sharing it for other users, maybe?
I will share it in a few days. It's messy and only useful for TF users, who I find are very few these days.
Hi, I have the same problem with gunicorn. For some models it works, but for others it fails. I noticed a difference between the two models:
This fails:
self.token_indexer.encode(x, max_length=350, truncation=True)
This seems to work:
self.token_indexer.encode(x, truncation=True)
The tokenizer is loaded at startup in gunicorn. When I receive a request, I try to tokenize the batch of text (probably in another thread).
Is it because the set_truncation_and_padding function tries to modify the backend tokenizer (self._tokenizer), which is already owned by the first thread? In the second case (which works), the _tokenizer is not modified because max_length is at its default.
Could we pass this as an argument of the backend encoding function instead of modifying the backend tokenizer object?
Is using _tokenizer directly possible on your side? (i.e. don't call tokenizer.encode anymore)
transformers needs to maintain backward compatibility and is unlikely to change any of its API.
tokenizers is a standalone project, so it probably won't make decisions just to accommodate transformers (except in very specific cases).
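For what it's worth, here is a minimal sketch of what calling the Rust backend directly could look like, with truncation configured once at startup instead of on every call; note that _tokenizer is a private attribute, and the checkpoint path and max length are just examples:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/opt/model")  # placeholder checkpoint
backend = tokenizer._tokenizer  # the underlying tokenizers.Tokenizer

# Configure truncation once, before any request thread starts encoding, so
# set_truncation_and_padding never has to mutate the backend concurrently.
backend.enable_truncation(max_length=350)

def encode(text):
    # Encoding should no longer require mutating the tokenizer, which is
    # what triggered the "Already borrowed" error in the first place.
    return backend.encode(text).ids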
It seems like a threading issue.
Side note: tf.text is much faster than the normal (slow) tokenizer.
It's also faster than the fast tokenizer to some extent.
tf.text: 6 seconds for 37,000 texts of length 512; normal tokenizer: 6 minutes; fast tokenizer: 1 minute.
Does it do the same thing?
From the docs, it seems to be a simple whitespace split, not really a BPE or Unigram tokenizer: https://www.tensorflow.org/tutorials/tensorflow_text/intro If this is the case, then it's perfectly normal. Raw Python code might even still be faster than tf.text. Anything I'm missing?
Yeah.
tf.text has a BertTokenizer. It's whitespace + WordPiece. In general tf.text is faster, but the problem is that GPT-2 and RoBERTa need a custom tokenizer.
And tf.text is only needed if we want to make use of tf.data.Dataset to prepare data on the fly.
To be frank, preprocessing on the fly is something everyone is ignoring.