Shashank Sonkar

Results 11 issues of Shashank Sonkar

The dataset downloaded from the website comprises different filenames, none of which matches this particular filename. Can you please elaborate on how this file is created - like merging...

Running the code with n_head set to 1 leads to a PPL of 6.65 (the other parameters are the same as in the readme). The resulting log is attached below. I'm surprised by...

Hi, I'm sorry if this has been asked! I searched the closed issues but did not find a related question. Can I please get access to...

Hey, I read the suggested paper. Can you please explain why the early stopping algorithm should have access to the test dataset? I understand that the final filtered metrics can have access...

In the [readme](https://github.com/ramsrigouthamg/Questgen.ai/tree/master/NewModels/T5LargeParaphraser), a diversity penalty is used. Can you please explain what it does and how the model is trained?

For MNLI, the blog https://huggingface.co/blog/how_many_data_points/ reports an accuracy of 0.83 for 1000 data samples. In the paper (https://arxiv.org/pdf/2001.07676.pdf, Table 1), the accuracy reported for MNLI is 0.85 for 1000 data...

`CUDA_VISIBLE_DEVICES=6 python train.py --model_name_or_path bert-base-uncased --generator_name distilbert-base-uncased --train_file data/nli_for_simcse.csv --num_train_epochs 2 --per_device_train_batch_size 64 --learning_rate 2e-6 --max_seq_length 32 --evaluation_strategy steps --metric_for_best_model stsb_spearman --load_best_model_at_end --eval_steps 125 --pooler_type cls --overwrite_output_dir --logging_first_step --logging_dir trained...

Hello, the card for https://huggingface.co/CompVis/stable-diffusion-v1-4 says it uses a 10% drop of text-conditioning. What does that mean?
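One common reading, offered here as an assumption rather than a confirmed description of the CompVis training code, is that the caption is replaced by an empty prompt for roughly 10% of training samples, so the model also learns an unconditional prediction for classifier-free guidance. A minimal sketch of that idea, where `maybe_drop_caption` and `drop_prob` are hypothetical names:

```python
import random


def maybe_drop_caption(caption, drop_prob=0.1, rng=random):
    """Return an empty prompt with probability drop_prob, else the caption.

    Hypothetical helper: it illustrates caption dropout in general and is
    not taken from the Stable Diffusion training code.
    """
    return "" if rng.random() < drop_prob else caption


# Simulate many training samples and measure how often the text is dropped.
rng = random.Random(0)  # fixed seed so the simulation is repeatable
captions = ["a photo of a cat"] * 10_000
dropped = sum(maybe_drop_caption(c, 0.1, rng) == "" for c in captions)
drop_fraction = dropped / len(captions)
print(f"fraction of captions dropped: {drop_fraction:.3f}")
```

Under this assumption, about one sample in ten is trained without its text prompt.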

I was going through your paper and was curious how the scores are calculated. Is it Pearson's r * 100?
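To make the question concrete, here is a minimal sketch of what "Pearson's r * 100" would compute. It assumes the score is the plain Pearson correlation between predicted and gold values scaled by 100, which is exactly what is being asked and is not confirmed by the paper. The names `pearson_r` and `scaled_score` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) when either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


def scaled_score(predictions, gold):
    """Assumed scoring rule: Pearson's r multiplied by 100."""
    return 100.0 * pearson_r(predictions, gold)


print(scaled_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

With this rule, perfectly linearly related predictions score 100 and perfectly anti-correlated predictions score -100.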

I was wondering whether it's possible to freeze some layers in Vicuna models, and whether that would reduce the memory footprint of fine-tuning the model.