Hossam Amer
Hi there, I would like to multiply the sum of log probs by a length_penalty as applied to the most recent version of T. I am using an older version...
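For illustration, here is a minimal sketch of one common way to apply a length penalty to a hypothesis score: sum the token log-probabilities, then multiply by `length ** (-length_penalty)`. The helper name `length_penalized_score` and this exact formula are assumptions chosen for the example; the library version referred to above may define the penalty differently.

```python
import math


def length_penalized_score(token_logprobs, length_penalty=1.0):
    """Hypothetical helper: scale the summed log-probs by length ** (-length_penalty)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    total = sum(token_logprobs)
    # Multiplying by length ** (-penalty) is the same as dividing by length ** penalty.
    return total * (len(token_logprobs) ** -length_penalty)


short_hyp = [math.log(0.5)] * 2
long_hyp = [math.log(0.5)] * 4

# With penalty 0.0 the raw sum is kept, so the longer hypothesis scores lower.
raw_short = length_penalized_score(short_hyp, 0.0)
raw_long = length_penalized_score(long_hyp, 0.0)

# With penalty 1.0 the score becomes the mean log-prob, so both hypotheses tie.
norm_short = length_penalized_score(short_hyp, 1.0)
norm_long = length_penalized_score(long_hyp, 1.0)
```

Because log-probabilities are negative, a larger `length_penalty` in this convention favors longer hypotheses.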
Why does the Fairseq sequence generator have this:
```
# number of candidate hypos per step
cand_size = 2 * beam_size  # 2 x beam size in case half are...
```
I am using this pre-trained model: `ft_cc.en.300_freqprune_50K_5K_pq_100.bin` That's my code:
```
org_model_path = "../models/en/ft_cc.en.300_freqprune_50K_5K_pq_100.bin"
ft_gensim = compress_fasttext.models.CompressedFastTextKeyedVectors.load(org_model_path)
new_vocab = ft_gensim.key_to_index
new_vectors = ft_gensim.vectors
new_ngrams = ft_gensim.vectors_ngrams
print(type(new_vectors))  #
print(type(new_ngrams))  #
new_vectors...
```
I have a default CUDA stream running through multiple kernels. Is there any way to run the index search in C++ on this default stream?
I am trying to build on your knn-transformers [repo](https://github.com/neulab/knn-transformers/tree/master?tab=readme-ov-file). When I run the distill gpt with the given setup in the repo but with the `--knn` flag, I get around 21.xx...
There is a data function called `group_texts`. I understand that this function concatenates the texts and creates blocks of text with specific block size. I wish to understand why you...
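As a reference point for the behavior described above, here is a minimal sketch of a `group_texts`-style function: it concatenates the tokenized texts and cuts them into blocks of `block_size`. It is modeled on the pattern commonly seen in language-modeling example scripts and is an assumption, not the repository's actual code; the `input_ids` column name, the dropped remainder, and the copied `labels` are part of that assumption.

```python
from itertools import chain


def group_texts(examples, block_size):
    """Sketch: concatenate tokenized texts, then split into fixed-size blocks."""
    # Join every column (e.g. "input_ids") into one long token list.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Drop the tail so that every block has exactly block_size tokens.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [tokens[i:i + block_size] for i in range(0, total_length, block_size)]
        for k, tokens in concatenated.items()
    }
    # For causal language modeling the labels are a copy of the inputs.
    result["labels"] = [block[:] for block in result["input_ids"]]
    return result


batch = {"input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]]}
grouped = group_texts(batch, block_size=4)
```

In this sketch the three texts (nine tokens in total) become two blocks of four tokens, and the ninth token is discarded. Blocks can span text boundaries, which is the usual trade-off for avoiding padding.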
TinyLlama checkpoints are available here: https://huggingface.co/TinyLlama/TinyLlama_v1.1_math_code_checkpoints/settings But no information about the training steps, GPU hours, or data volume is given. Can anybody shed light on how we can identify this information given a...
In this [link](https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side), they say that decoder architectures should have left padding. In the code repository, you do right padding at the input side. Can you explain the reason? Is...
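To make the padding-side distinction concrete, here is a small pure-Python sketch that pads a batch on either side and builds the matching attention mask. The helper `pad_batch` and the token ids are hypothetical and not taken from the repository or from the linked tutorial.

```python
def pad_batch(sequences, pad_id, side="left"):
    """Hypothetical helper: pad token-id lists to equal length on the given side."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        if side == "left":
            input_ids.append([pad_id] * n_pad + list(seq))
            attention_mask.append([0] * n_pad + [1] * len(seq))
        else:
            input_ids.append(list(seq) + [pad_id] * n_pad)
            attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


prompts = [[11, 12, 13], [21]]
left_ids, left_mask = pad_batch(prompts, pad_id=0, side="left")
right_ids, right_mask = pad_batch(prompts, pad_id=0, side="right")
```

With left padding, the last position of every row holds a real token, so a decoder-only model generating from the final position continues each prompt directly. With right padding, shorter rows end in pad tokens, which matters for generation but is typically harmless in training when those positions are masked out of the loss.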