Nguyen Nguyen Anh comments

Results 36 comments of


Nguyen Nguyen Anh

I can not extend vocab of LLaMA-3 using sentencepiece anymore vs LLaMA-2 ?!?

Any help please...!

I can not extend vocab of LLaMA-3 using sentencepiece anymore vs LLaMA-2 ?!?

I did likes this, and not very sure if this destroy the LLaMA-2 tokenizer or not !!! Please comment. ``` model_name = "/home/steve/data02/LLaMA/LLaMA-3/models/llama-3-8b-instruct/" from transformers import AutoTokenizer model_name = "/home/steve/data02/LLaMA/LLaMA-3/models/llama-3-8b-instruct/"...

I can not extend vocab of LLaMA-3 using sentencepiece anymore vs LLaMA-2 ?!?

> Hi all! This is not a `hf` bug. For any tokenizer that is in `transformers` and that you load using `AutoTokenizer.from_pretrained` you can add any token using `tokenizer.add_tokens(["token1", "token2",])`...

I can not extend vocab of LLaMA-3 using sentencepiece anymore vs LLaMA-2 ?!?

> > > I did likes this, and not very sure if this destroy the LLaMA-2 tokenizer or not !!! Please comment. > > > ``` > > > model_name...

I can not extend vocab of LLaMA-3 using sentencepiece anymore vs LLaMA-2 ?!?

FYI. In order to finetune further LlaMA-3 finetuned model, with this new extended tokenizer with proper LLaMA-3 format, you have to change the ChatFormat function as follows: ``` class ChatFormat:...

I can not extend vocab of LLaMA-3 using sentencepiece anymore vs LLaMA-2 ?!?

Something is WRONG. The decoding of PreTrainedTokenizerFast (which LLaMA-3 are using) decode weird output once you add that token to the vocab using .add_tokens(word) function. I use standard tokenizer from...

I can not extend vocab of LLaMA-3 using sentencepiece anymore vs LLaMA-2 ?!?

> Regarding the new added token, the "issue" is that you need to make sure you add the correct representation of the string: > > ```python > >>> from tokenizers...

A big question: should I re-pretrain after extending vocab with LLaMA-3 pretrained weight or finetuned weight ?

> @thusinh1969 can you please explain your use-case a bit, if you are extending just a small number of vocab perhaps no, but if you are adding a language to...

Error when running through examples: "When passing variant='fp16' upgrade `transformers` to at least 4.27.0.dev0"

Grruhhhh I have to redo everything from doulb-ebuild Docker and all ffor this... Hopefully it should work ! > the only way i have been consistently able to run the...

one server, multiple sessions (users) ((feature request))

I am about to write advanced API to process multi-users (not UI though). Will revert.