Kerem Turgutlu

21 comments by Kerem Turgutlu

In [WordPiece](https://github.com/huggingface/course/blob/main/chapters/en/chapter6/6.mdx#implementing-wordpiece), if you go to the line where we train the tokenizer and print the learned vocab:

```python
print(vocab)
```

the vocab from this print statement is missing the merge `ab`...
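
For context, the WordPiece section of the course scores each candidate pair as freq(pair) / (freq(first) * freq(second)) and merges the best-scoring pair at each step, so a merge such as `ab` should be appended to `vocab` once it wins. A minimal sketch of one training step, assuming a hypothetical toy corpus and hypothetical helper names (not the course notebook's data):

```python
from collections import defaultdict


def compute_pair_scores(word_freqs, splits):
    """Score adjacent pairs as freq(pair) / (freq(first) * freq(second))."""
    token_freqs = defaultdict(int)
    pair_freqs = defaultdict(int)
    for word, freq in word_freqs.items():
        split = splits[word]
        for i, token in enumerate(split):
            token_freqs[token] += freq
            if i + 1 < len(split):
                pair_freqs[(token, split[i + 1])] += freq
    return {
        pair: freq / (token_freqs[pair[0]] * token_freqs[pair[1]])
        for pair, freq in pair_freqs.items()
    }


def merge_pair(a, b, splits):
    """Replace every adjacent (a, b) with the merged token; '##' marks word-internal pieces."""
    merged = a + b[2:] if b.startswith("##") else a + b
    for word, split in splits.items():
        i = 0
        while i < len(split) - 1:
            if split[i] == a and split[i + 1] == b:
                split = split[:i] + [merged] + split[i + 2:]
            else:
                i += 1
        splits[word] = split
    return merged


# Hypothetical toy corpus: word -> frequency, each word pre-split into characters.
word_freqs = {"ab": 5, "ca": 2, "cd": 8}
splits = {"ab": ["a", "##b"], "ca": ["c", "##a"], "cd": ["c", "##d"]}
vocab = ["##a", "##b", "##d", "a", "c"]

scores = compute_pair_scores(word_freqs, splits)
best_pair = max(scores, key=scores.get)  # ("a", "##b"), score 5 / (5 * 5) = 0.2
vocab.append(merge_pair(*best_pair, splits))
print(vocab)  # ['##a', '##b', '##d', 'a', 'c', 'ab']
```

Here `("c", "##a")` and `("c", "##d")` both score 0.1 because `c` is frequent on its own, so `("a", "##b")` wins and `ab` enters the vocabulary.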

The same typo (`Course` -> `course`) is also present in [Unigram](https://github.com/huggingface/course/blob/main/chapters/en/chapter6/7.mdx). The final tokenization assumes a capital `Course` is used and results in:

```
['▁This', '▁is', '▁the', '▁Hugging', '▁Face', '▁', 'c', 'ou', 'r', 's',...
```

@lewtun I created https://github.com/huggingface/course/pull/166

Thanks @leigh-plt! I modified the batch loading, and it seems to work in the Kaggle notebook environment after the `dl_pack` trick:

```python
def get_decord_video_batch(fname, sz, freq=10):
    "get batch tensor for inference, original for...
```

Thanks @leigh-plt, this might be helpful for inference in the kernel!

Thanks for the suggestion! I will update the PR once I have time.

> before each of the two cells that calculate MI, because as far as I can see...

@ArthurZucker I am facing a similar issue with OpenLLaMA:

```python
from transformers import AutoTokenizer

save_dir = "../open_llama_7b_preview_300bt/open_llama_7b_preview_300bt_transformers_weights/"
tokenizer = AutoTokenizer.from_pretrained(save_dir)
tokenizer.bos_token_id
```

Calling `tokenizer.bos_token_id` causes a maximum recursion depth error.

```python
tokenizer
LlamaTokenizerFast(name_or_path='../open_llama_7b_preview_300bt/open_llama_7b_preview_300bt_transformers_weights/', vocab_size=32000,...
```

You can [torrent](https://academictorrents.com/details/7c0645c94321311bb05bd879ddee4d0eba08aaee/tech&filelist=1) it.

Can you share the training command you used, with the full arguments, and also the versions of the following libraries:

```
accelerate
bitsandbytes
datasets
hqq
hqq-aten
huggingface-hub
llama-recipes
peft
safetensors
tokenizers...
```
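
If it helps, the versions can be collected in one go with the standard library's `importlib.metadata`. This is a sketch: `collect_versions` is a hypothetical helper name, and the package list is copied from the request above, which is cut off, so it is likely incomplete.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the (truncated) list in the comment; extend as needed.
PACKAGES = [
    "accelerate", "bitsandbytes", "datasets", "hqq", "hqq-aten",
    "huggingface-hub", "llama-recipes", "peft", "safetensors", "tokenizers",
]


def collect_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        print(f"{name}=={ver}")
```

Running `pip freeze` in the same environment gives a fuller picture if the exact list of relevant packages is unclear.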

Could you provide more context about your training script? A link to the script, the library versions, and the full error message would help with debugging.