s4sarath
Download the model from TensorFlow Hub. The downloaded model will have an ```assets``` folder containing a ```.vocab``` and a ```.model``` file; the ```.model``` file is the SentencePiece (spm) model. With no SPM Model ------------------...
@np-2019 - It is better not to use XLNet preprocessing. Here things are a bit different. The provided code runs without any error. If you are familiar with BERT preprocessing, it...
@np-2019 - Those are pretty good results. Which ALBERT model (large or xlarge) and version (v1 or v2) did you use?
:-) They are doing that inside the while loop in ```sample.py```. Their code is so efficient that they avoid caching everything at once while calculating logits for each newly predicted word. In sample.py...
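For intuition, here is a toy sketch of that while-loop pattern: each step feeds only the newest token plus a small carried state forward, instead of re-running over the whole sequence. ```toy_step``` is a made-up stand-in, not GPT-2's actual model.

```python
def toy_step(token, past):
    """Stand-in for the model's single decoding step: takes the newest
    token and the cached state, returns logits and the updated state."""
    past = past.copy()
    past[token] += 1                 # fold the new token into the cache
    logits = [-c for c in past]      # toy logits: penalize seen tokens
    return logits, past

past = [0] * 5                       # cached state over a 5-token vocab
token = 0
generated = [token]
for _ in range(3):
    logits, past = toy_step(token, past)   # only the new token is fed in
    token = max(range(5), key=lambda i: logits[i])  # greedy pick
    generated.append(token)
# generated -> [0, 1, 2, 3]
```

The point is the loop shape: the expensive per-sequence work is never redone, only the step for the latest word.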
Still there
This happens with TokenizerFast for me. The workaround is to not use it.
How could I do that sharing ?
I am integrating it inside a tf dataset. I think it's a tf threading vs TokenizerFast threading issue.
Tried that buddy. Same issue :(
Sure Narsil.
```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

#### Dataset Pipeline
def create_tokenize(text):
    text = text.numpy().decode()
    inputs = tokenizer(text, add_special_tokens=True, padding=True, return_tensors='tf')
    return [tf.squeeze(inputs['input_ids']), tf.squeeze(inputs['attention_mask'])]

def create_data_map_fn_train(item):
    input_ids, ...
```
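The usual way to bridge an eager Python tokenize call into a ```tf.data``` map is ```tf.py_function```. Here is a minimal runnable sketch of that pattern with a toy stand-in tokenizer (hypothetical, so it runs without transformers); in the real pipeline the body of ```py_tokenize``` would be the tokenizer call above.

```python
import tensorflow as tf

def py_tokenize(text):
    # Eager-side toy "tokenizer": real code would call the HF tokenizer
    # here on text.numpy().decode() instead of this char hack.
    text = text.numpy().decode()
    ids = [ord(c) % 100 for c in text]
    return tf.constant(ids, dtype=tf.int32)

def tf_tokenize(text):
    # tf.py_function wraps the eager callable so .map() can use it.
    ids = tf.py_function(py_tokenize, [text], tf.int32)
    ids.set_shape([None])  # py_function loses shape info; restore rank
    return ids

ds = tf.data.Dataset.from_tensor_slices(["hi"]).map(tf_tokenize)
ids = next(iter(ds))
```

Note ```tf.py_function``` serializes through the Python interpreter, so it also sidesteps the kind of tf-vs-tokenizer threading clash described above, at some throughput cost.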