Hossam Amer issues

8 issues by Hossam Amer

Adding length penalty in v5.0 of online_softmax_beamsearch_kernels

Hi there, I would like to multiply the sum of log probs by a length_penalty as applied to the most recent version of T. I am using an older version...
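For readers unfamiliar with the idea, here is a minimal sketch of length-penalized beam scoring. It assumes one common formulation, in which the summed log-probability is divided by the hypothesis length raised to a `length_penalty` exponent; the exact form used by the kernel in question is not stated in the issue. The function name and parameters are hypothetical.

```python
def length_normalized_score(sum_logprobs: float, length: int,
                            length_penalty: float = 1.0) -> float:
    """Rescale a hypothesis score by its length (hypothetical helper).

    Assumed formulation: score = sum_logprobs / length ** length_penalty.
    With length_penalty = 0.0 the score is left unchanged.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return sum_logprobs / (length ** length_penalty)


# Two hypotheses with the same per-token quality but different lengths.
short = length_normalized_score(-4.0, length=2)
long_ = length_normalized_score(-8.0, length=4)
```

Under this assumption, both hypotheses above receive the same normalized score, whereas the raw sums would favor the shorter one.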

Beam Size Search is defined to be 2 * beam in Fairseq

Why does the Fairseq sequence generator have this:

```
# number of candidate hypos per step
cand_size = 2 * beam_size  # 2 x beam size in case half are...
```
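The sketch below illustrates one plausible motivation for over-sampling candidates. It assumes that hypotheses ending in an end-of-sentence token are finalized and leave the active beam, so extra candidates are needed to refill it. This is a simplified illustration, not Fairseq's actual code; `split_candidates` and its arguments are hypothetical.

```python
def split_candidates(candidates, beam_size, eos_token):
    """Split scored candidates into finished and active hypotheses.

    candidates: list of (score, last_token) pairs (hypothetical layout).
    Keeps the top 2 * beam_size candidates, then takes up to beam_size
    non-EOS candidates as the active beam for the next step.
    """
    cand_size = 2 * beam_size
    top = sorted(candidates, key=lambda c: c[0], reverse=True)[:cand_size]
    finished = [c for c in top if c[1] == eos_token]
    active = [c for c in top if c[1] != eos_token][:beam_size]
    return finished, active


EOS = 2  # assumed EOS token id, for illustration only
cands = [(-0.1, EOS), (-0.2, EOS), (-0.3, 5), (-0.4, 6), (-0.5, 7)]
finished, active = split_candidates(cands, beam_size=2, eos_token=EOS)
```

In this example the two best candidates both end in EOS, yet the beam still has two active hypotheses because four candidates were considered.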

question

needs triage

Revert the compressed vectors to gensim format

I am using this pre-trained model: `ft_cc.en.300_freqprune_50K_5K_pq_100.bin`. Here is my code:

```
import compress_fasttext

org_model_path = "../models/en/ft_cc.en.300_freqprune_50K_5K_pq_100.bin"
ft_gensim = compress_fasttext.models.CompressedFastTextKeyedVectors.load(org_model_path)
new_vocab = ft_gensim.key_to_index
new_vectors = ft_gensim.vectors
new_ngrams = ft_gensim.vectors_ngrams
print(type(new_vectors))
print(type(new_ngrams))
new_vectors...
```

GPU Index Search on the default stream

I have a default CUDA stream running through multiple kernels. Is there any way to run the index search in C++ on this default stream?

GPU

feature request

Cannot reproduce distilgpt2 LM numbers using --knn

I am trying to build on your knn-transformers [repo](https://github.com/neulab/knn-transformers/tree/master?tab=readme-ov-file). When I run distilgpt2 with the setup given in the repo, but with the --knn flag, I get around 21.xx...

group_texts function: Why?

There is a data function called `group_texts`. I understand that this function concatenates the texts and creates blocks of text with specific block size. I wish to understand why you...
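As a reference for the behavior described above, here is a minimal sketch of a `group_texts`-style function that concatenates tokenized examples and cuts them into fixed-size blocks. It is written from the description in the issue, not copied from the repository; the batch layout (a dict of lists of token lists) and the dropping of the trailing remainder are assumptions.

```python
from itertools import chain


def group_texts(examples, block_size):
    """Concatenate token lists per key, then split into equal blocks.

    examples: dict mapping a column name to a list of token lists
              (assumed layout, e.g. {"input_ids": [[...], [...]]}).
    Tokens that do not fill a final complete block are dropped.
    """
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Round down to a multiple of block_size so every block is full.
    total_length = (total_length // block_size) * block_size
    return {
        k: [tokens[i:i + block_size] for i in range(0, total_length, block_size)]
        for k, tokens in concatenated.items()
    }


batch = {"input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]]}
blocks = group_texts(batch, block_size=4)
```

One consequence of this design is that no padding is needed: every block is full, at the cost of blocks that may span document boundaries.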

Training Tokens of Checkpoints for tinyllama_math_code

TinyLlama checkpoints are available here: https://huggingface.co/TinyLlama/TinyLlama_v1.1_math_code_checkpoints/settings However, no information is given about the training steps, GPU hours, or volume. Can anybody shed light on how we can identify this information given a...

Padding side for knn-transformers

In this [link](https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side), they say that decoder architectures should use left padding. In the code repository, you apply right padding on the input side. Can you explain the reason? Is...
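To make the difference between the two padding sides concrete, here is a small, library-free sketch. The helper `pad_batch` and the pad id `0` are hypothetical and only illustrate where the padding tokens end up; they are not taken from the repository or from the transformers library.

```python
def pad_batch(seqs, pad_id, side="left"):
    """Pad token-id sequences to a common width (hypothetical helper).

    side="left" places padding before the tokens, side="right" after.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    width = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        fill = [pad_id] * (width - len(s))
        padded.append(fill + list(s) if side == "left" else list(s) + fill)
    return padded


batch = [[11, 12, 13], [21]]
left = pad_batch(batch, pad_id=0, side="left")
right = pad_batch(batch, pad_id=0, side="right")
```

With left padding, the last position of every row holds a real token, which is where a decoder continues generating from. With right padding, the last position of the shorter row holds the pad token.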