Hossam Amer
If you can explain what TopKMD is doing in the old code, that'd be greatly appreciated.
Thanks, @byshiue. Just one question: beam_online_softmax_topk_stage1_kernel depends on the vocab-size parameter to index the right memory location in the log probs. Do beam_online_softmax_topk_stage2_kernelLauncher and/or batch_topk_kernel also depend on the...
I believe that should be the solution:

```python
new_vocab = ft_gensim.key_to_index
new_vectors = ft_gensim.vectors.unpack()
new_ngrams = ft_gensim.vectors_ngrams.unpack()
```

That being said, this code increases the size of the original model...
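For reference, one way to persist those unpacked arrays so that later lookup needs only numpy (no gensim at load time). The file name, array keys, and the stand-in arrays below are all hypothetical, just to make the sketch self-contained:

```python
import numpy as np

# Hypothetical stand-ins for the unpacked gensim arrays from the snippet above.
new_vocab = {"hello": 0, "world": 1}                     # word -> row index
new_vectors = np.random.rand(2, 4).astype(np.float32)    # in-vocab vectors
new_ngrams = np.random.rand(10, 4).astype(np.float32)    # n-gram bucket vectors

# Save everything in a form numpy can reload without gensim installed.
np.savez_compressed(
    "ft_arrays.npz",
    words=np.array(list(new_vocab)),  # unicode array, no pickling needed
    vectors=new_vectors,
    ngrams=new_ngrams,
)

# Reload and rebuild the word -> index mapping.
data = np.load("ft_arrays.npz")
vocab = {w: i for i, w in enumerate(data["words"])}
```

Note that `savez_compressed` stores plain dense arrays, so this trades the compressed model's small footprint for dependency-free loading, which matches the size increase mentioned above.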
My original goal is to: (1) take any language model from [here](https://fasttext.cc/docs/en/crawl-vectors.html) and compress it down to 2-3 MB using `prune_ft_freq`; (2) use this model and implement the word/sentence...
> > implement the word/sentence look-up without external dependencies > > What do you mean by "without external dependencies"? You want to do the lookup in pure `numpy`, without `gensim`...
> > Of course, if you have pointers, that'd be great. For example, the hashing function is not clear in the compress-fasttext lookup. > > What kind of pointers do...
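For context, fastText's n-gram hashing is the 32-bit FNV-1a hash over the UTF-8 bytes of each n-gram, with each byte passed through a signed int8 cast (which only matters for non-ASCII bytes). A pure-numpy lookup along the lines discussed here might look like the sketch below; the n-gram extraction is simplified (real fastText has edge cases around the full `<word>` token), so treat this as an illustration rather than the exact compress-fasttext implementation:

```python
import numpy as np

def fasttext_hash(s: str) -> int:
    """32-bit FNV-1a as used by fastText; bytes >= 128 are sign-extended."""
    h = 2166136261
    for b in s.encode("utf-8"):
        if b >= 128:                       # mimic fastText's int8 -> uint32 cast
            b = (b - 256) & 0xFFFFFFFF
        h = ((h ^ b) * 16777619) & 0xFFFFFFFF
    return h

def char_ngrams(word: str, minn: int = 3, maxn: int = 6):
    """Simplified n-gram extraction: substrings of '<word>' of length minn..maxn."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word, vocab, vectors, ngram_vectors):
    """In-vocab: direct row lookup. OOV: mean of the hashed n-gram bucket vectors."""
    if word in vocab:
        return vectors[vocab[word]]
    buckets = [fasttext_hash(g) % len(ngram_vectors) for g in char_ngrams(word)]
    return ngram_vectors[buckets].mean(axis=0)
```

In a pruned model the bucket count is much smaller than the original 2M, so the modulo here assumes the n-gram matrix was re-bucketed accordingly.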
Thanks @urialon for getting back. The model that I was using in the previous comment (sorry, I edited my post above) is the one given in the repo. That said, the...
Just want to update on the issue. Using the following did not result in the size issue:

```shell
MODEL=neulab/distilgpt2-finetuned-wikitext103
CUDA_VISIBLE_DEVICES=0 python -u run_clm.py \
  --model_name_or_path ${MODEL} \
  --dataset_name wikitext --dataset_config_name...
```
Hi Uri, I tried to construct the datastore with the WikiText validation set and the given DistilGPT-2 model, then ran kNN using the same set. The final perplexity scores are...
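For reference, the perplexities being compared come from the kNN-LM interpolation of the datastore distribution with the base LM distribution. A minimal numpy sketch, with a hypothetical interpolation weight (the repo's actual default may differ):

```python
import numpy as np

def knn_lm_probs(p_lm: np.ndarray, p_knn: np.ndarray, lam: float = 0.25) -> np.ndarray:
    """kNN-LM next-token distribution: lam * p_knn + (1 - lam) * p_lm."""
    return lam * p_knn + (1.0 - lam) * p_lm

def perplexity(token_probs: np.ndarray) -> float:
    """Perplexity = exp of the mean negative log-likelihood of the gold tokens."""
    return float(np.exp(-np.mean(np.log(token_probs))))
```

With lam = 0 this reduces to the base LM perplexity, which is one quick sanity check when the kNN and no-kNN scores come out unexpectedly close.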