
What about reducing the batch size?

Open ywen666 opened this issue 2 years ago • 9 comments

Hi,

Thanks for releasing this amazing code repo! The paper mentions that the batch size is 2k, leading to 410 gradient accumulation steps, which is a bit too slow to fine-tune. I wonder if the authors have tried reducing the batch size, e.g. to 128 as suggested in the T5 paper? Does this degrade performance a lot?

Thanks!

ywen666 avatar Mar 31 '22 21:03 ywen666

Hi @ywen666, Thanks! You can try with batch size around 32, and that should work as well.
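For context, the arithmetic linking the effective batch size to gradient accumulation steps can be sketched as follows (the per-device batch sizes below are illustrative assumptions, not the repo's actual settings):

```python
# Hypothetical helper: how many gradient accumulation steps are needed to
# reach a target effective batch size, given per-device batch size and GPUs.
import math

def grad_accum_steps(target_batch, per_device_batch, n_gpus):
    # effective batch = per_device_batch * n_gpus * accumulation steps
    return math.ceil(target_batch / (per_device_batch * n_gpus))

# Paper setting: batch size ~2048 with, say, a per-device batch of 5
# on one GPU gives the ~410 accumulation steps mentioned above.
print(grad_accum_steps(2048, 5, 1))   # -> 410
# Suggested smaller setting: batch size 32 on 4 GPUs needs far fewer steps.
print(grad_accum_steps(32, 4, 4))     # -> 2
```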

tscholak avatar Apr 01 '22 00:04 tscholak

Hi, thanks for the suggestion! I also have the same question as the other issue about slow evaluation.

Running python seq2seq/run_seq2seq.py configs/eval.json takes 9 hours to complete on a 4-GPU (RTX 5000) node, at 200s per iteration, which seems too slow.

If I disable PICARD and run python seq2seq/run_seq2seq.py configs/nopicard_eval.json, it is pretty fast, finishing in 8 minutes at 1.6s per iteration.

What is the best way to check which part bottlenecks the inference speed?

ywen666 avatar Apr 01 '22 15:04 ywen666

There were some changes to the parser recently that may have resulted in a performance regression. I suspect that this is the cause of the slowdown. When I have the time, I'll look into this.

tscholak avatar Apr 01 '22 16:04 tscholak

You could help me out by telling me which input-output pairs take the longest to generate.

tscholak avatar Apr 01 '22 17:04 tscholak
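One way to collect that information is to time each call to generate individually. A minimal sketch, assuming a Hugging Face-style model and tokenizer; the timed_generate wrapper below is hypothetical, not code from the repo:

```python
# Hypothetical wrapper: time model.generate per question so the slowest
# input-output pairs can be reported, sorted slowest-first.
import time

def timed_generate(model, tokenizer, questions, **gen_kwargs):
    timings = []
    for q in questions:
        inputs = tokenizer(q, return_tensors="pt")
        start = time.perf_counter()
        outputs = model.generate(**inputs, **gen_kwargs)
        elapsed = time.perf_counter() - start
        prediction = tokenizer.decode(outputs[0], skip_special_tokens=True)
        timings.append((q, elapsed, prediction))
    # Slowest examples first
    return sorted(timings, key=lambda t: t[1], reverse=True)
```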

I am looking into this but it will take some time for me to figure it out.

ywen666 avatar Apr 02 '22 16:04 ywen666

Hi, I am trying to time the generation of each example. I found the generate method wrapper for the SpiderModel in https://github.com/ElementAI/picard/blob/main/seq2seq/utils/picard_model_wrapper.py

However, I couldn't find the code that makes use of this generate method. I checked Trainer and SpiderTrainer, and it seems that evaluate never uses the generate method, nor does the evaluation_loop inside it, which comes from the Hugging Face Trainer.

Could you please give a pointer to where the generate wrapper is used in the repo?

ywen666 avatar Apr 05 '22 00:04 ywen666

Oh, generate is invoked in the Seq2SeqTrainer's prediction_step method.

ywen666 avatar Apr 07 '22 22:04 ywen666
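A lightweight way to confirm this and measure it is to wrap that method with a timer before running evaluation. The timed helper below is a generic sketch, not part of the repo, and the attribute names are illustrative:

```python
# Generic sketch: wrap any bound method (e.g. a trainer's prediction_step)
# so each call's wall-clock duration is appended to a log list.
import functools
import time

def timed(fn, log):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        log.append(time.perf_counter() - start)
        return result
    return wrapper

# Usage (illustrative): before trainer.evaluate(), do
#   batch_times = []
#   trainer.prediction_step = timed(trainer.prediction_step, batch_times)
# then inspect batch_times afterwards to find the slow batches.
```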

Hi! I evaluated on the spider_realistic dataset and, as @tscholak asked, logged the generation time for each question. Here is the full list: https://docs.google.com/spreadsheets/d/1NGui5DPQU5SChHzXXzfYYNjP6HknbM-VcXXI77dbGNk/edit?usp=sharing And here is the question that was the slowest:

500: What is the msot common country for singer? (0 days 00:21:56.996459)

takacsg84 avatar Apr 27 '22 13:04 takacsg84
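For anyone aggregating the durations in that sheet: the format above looks like a pandas Timedelta string, which can be parsed back into seconds with the standard library alone. parse_duration is a hypothetical helper, not code from the repo:

```python
# Hypothetical helper: parse a "D days HH:MM:SS.ffffff" duration string
# (the format in the shared spreadsheet) into a timedelta.
import re
from datetime import timedelta

def parse_duration(s):
    m = re.match(r"(\d+) days (\d+):(\d+):([\d.]+)", s)
    days, hours, minutes, seconds = m.groups()
    return timedelta(days=int(days), hours=int(hours),
                     minutes=int(minutes), seconds=float(seconds))

# The slowest question above took just under 22 minutes:
print(parse_duration("0 days 00:21:56.996459").total_seconds())  # -> 1316.996459
```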

Thanks so much, this information will help me with the root cause analysis for the speed regression!

tscholak avatar Apr 27 '22 14:04 tscholak