Nguyen Nguyen Anh

Results: 36 comments by Nguyen Nguyen Anh

I did it like this, and I am not sure whether it destroys the LLaMA-2 tokenizer or not! Please comment. ``` model_name = "/home/steve/data02/LLaMA/LLaMA-3/models/llama-3-8b-instruct/" from transformers import AutoTokenizer model_name = "/home/steve/data02/LLaMA/LLaMA-3/models/llama-3-8b-instruct/"...

> Hi all! This is not a `hf` bug. For any tokenizer that is in `transformers` and that you load using `AutoTokenizer.from_pretrained` you can add any token using `tokenizer.add_tokens(["token1", "token2",])`...

> > > I did it like this, and I am not sure whether it destroys the LLaMA-2 tokenizer or not! Please comment. > > > ``` > > > model_name...

FYI: to further fine-tune a fine-tuned LLaMA-3 model with this new extended tokenizer in the proper LLaMA-3 format, you have to change the ChatFormat class as follows: ``` class ChatFormat:...

Something is WRONG. Decoding with PreTrainedTokenizerFast (which LLaMA-3 uses) produces weird output once you add a token to the vocab using the .add_tokens(word) function. I use the standard tokenizer from...

> Regarding the new added token, the "issue" is that you need to make sure you add the correct representation of the string: > > ```python > >>> from tokenizers...
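For anyone following along: the quoted snippet is cut off here, so the following is only a minimal sketch of what "the correct representation of the string" could mean, assuming it refers to the GPT-2-style byte-level mapping that byte-level BPE tokenizers apply before looking tokens up in the vocab. The helper names `to_byte_level` and `from_byte_level` are hypothetical and are not part of `tokenizers` or `transformers`; the mapping table itself follows the well-known GPT-2 `bytes_to_unicode` construction.

```python
def bytes_to_unicode():
    """Build the GPT-2-style table mapping each byte (0-255) to a printable character."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte (control chars, space, ...) is shifted to 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_byte_level(text: str) -> str:
    """Hypothetical helper: the string as a byte-level vocab would store it."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def from_byte_level(token: str) -> str:
    """Hypothetical helper: invert to_byte_level back to normal text."""
    return bytes(CHAR_TO_BYTE[c] for c in token).decode("utf-8")


if __name__ == "__main__":
    print(to_byte_level("Bác"))      # BÃ¡c
    print(to_byte_level(" hello"))   # Ġhello
    print(from_byte_level(to_byte_level("Việt Nam")))
```

Under that assumption, a non-ASCII word added in its plain form would not match what the byte-level decoder expects, which could explain the weird decoded output mentioned above. Whether this is exactly what the truncated snippet demonstrated cannot be confirmed from the text shown here.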

> @thusinh1969 can you please explain your use-case a bit, if you are extending just a small number of vocab perhaps no, but if you are adding a language to...

Grruhhhh, I have to redo everything, from the double-build Docker and all, for this... Hopefully it works! > the only way I have been consistently able to run the...

I am about to write an advanced API to handle multiple users (not a UI, though). Will report back.