PLBART

Support for a fast tokenizer (PLBartTokenizerFast) in Hugging Face

Open · zhipeng-cai opened this issue 1 year ago · 1 comment

Hello, I found that there is no corresponding PLBartTokenizerFast in Hugging Face Transformers. Do you plan to implement a fast version of the tokenizer?

Specifically, I need to call the word_ids() method that fast tokenizers provide on their output, which returns a list mapping each token to the index of the original word it came from:

word_ids = tokenized_inputs.word_ids(batch_index=i)

Alternatively, is there another way to compute the original word index for each token?
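For reference, one common workaround with a slow tokenizer is to tokenize the input one word at a time and repeat each word's index once per resulting token. The following is a minimal sketch of that idea, not a confirmed PLBART solution: `align_tokens_to_words` and `toy_tokenize` are hypothetical names, and `toy_tokenize` is only a stand-in for a real subword tokenizer. It assumes the input is already split into words, and that tokenizing words separately yields the same pieces as tokenizing the full sentence, which may not hold exactly for every tokenizer.

```python
from typing import Callable, List, Tuple


def align_tokens_to_words(
    words: List[str],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[int]]:
    """Tokenize each word separately and record which word produced each token."""
    tokens: List[str] = []
    word_ids: List[int] = []
    for word_index, word in enumerate(words):
        pieces = tokenize(word)
        tokens.extend(pieces)
        # Every piece of this word maps back to the same word index.
        word_ids.extend([word_index] * len(pieces))
    return tokens, word_ids


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a subword tokenizer: chunks of up to 3 characters."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]


tokens, word_ids = align_tokens_to_words(["def", "add_numbers", "(", ")"], toy_tokenize)
print(tokens)
print(word_ids)
```

With a real slow tokenizer, the `tokenize` argument could be the tokenizer's own `tokenize` method (for example, that of `PLBartTokenizer`). Special tokens added during encoding, such as end-of-sequence or language-code tokens, are not covered by this sketch and would need their own placeholder entries (a fast tokenizer's `word_ids()` uses `None` for them).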

Thank you very much!

zhipeng-cai · Apr 20 '23 03:04