picard
What about reducing the batch size?
Hi,
Thanks for releasing this amazing code repo! The paper mentions that the batch size is 2k, leading to 410 gradient accumulation steps, which makes fine-tuning quite slow. I wonder if the authors have tried reducing the batch size, e.g. to 128 as suggested in the T5 paper? Does this degrade performance a lot?
Thanks!
Hi @ywen666, Thanks! You can try with batch size around 32, and that should work as well.
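For reference, a minimal sketch of the effective-batch-size arithmetic behind this suggestion; the per-device value and GPU count below are illustrative assumptions, not values taken from the repo's configs.

```python
# Hedged sketch: effective batch size = per-device batch size x GPUs x accumulation steps.
# The numbers here are assumptions for illustration only.
per_device_batch_size = 8        # whatever fits into one GPU's memory
num_gpus = 4
gradient_accumulation_steps = 1  # at a target of ~32, accumulation may not be needed at all

effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)  # -> 32, the batch size suggested above
```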
Hi, thanks for the suggestion! I also have the same question as the other issue about slow evaluation.
Running python seq2seq/run_seq2seq.py configs/eval.json takes 9 hours to complete on a 4-GPU (RTX 5000) node, at about 200 s per iteration, which seems too slow.
If I disable PICARD and run python seq2seq/run_seq2seq.py configs/nopicard_eval.json, it is pretty fast, finishing in 8 minutes at about 1.6 s per iteration.
What is the best way to check which part is bottlenecking the inference speed?
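One option for a first pass (not something the repo provides) is to profile a single evaluation run with cProfile and look at cumulative times; the output file name below is arbitrary.

```python
# Hedged sketch: profile one evaluation run and inspect the hottest call paths.
# Run, e.g.:  python -m cProfile -o eval.prof seq2seq/run_seq2seq.py configs/eval.json
# (eval.prof is an arbitrary output file name.)
import pstats

stats = pstats.Stats("eval.prof")
stats.sort_stats("cumulative").print_stats(20)  # top 20 entries by cumulative time
```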
There were some changes recently to the parser that may have resulted in a performance regression. I suspect that this is the cause of the slowdown. When I have the time, I'll look into this.
You could help me out by telling me which input-output pairs take the longest to generate.
I am looking into this but it will take some time for me to figure it out.
Hi, I am trying to time the generation of each example. I found the generate method wrapper for the SpiderModel in https://github.com/ElementAI/picard/blob/main/seq2seq/utils/picard_model_wrapper.py
However, I couldn't find the code that uses this generate method. I checked the Trainer and SpiderTrainer, and it seems that evaluate never calls generate, nor does the evaluation_loop inside it, which comes from Hugging Face.
Could you please give a pointer on where the generate wrapper is used in the repo?
oh, generate is invoked in the Seq2SeqTrainer's prediction_step method.
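For anyone trying the same thing, here is a minimal sketch of one way to time generation at that point, by overriding prediction_step in a subclass. The class name TimedSeq2SeqTrainer is made up for illustration; the method signature is the standard transformers one, and this is not code from the repo.

```python
import time

from transformers import Seq2SeqTrainer


class TimedSeq2SeqTrainer(Seq2SeqTrainer):
    """Illustrative subclass that logs how long each prediction step (and thus generate) takes."""

    def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None):
        start = time.monotonic()
        result = super().prediction_step(
            model, inputs, prediction_loss_only, ignore_keys=ignore_keys
        )
        elapsed = time.monotonic() - start
        batch_size = inputs["input_ids"].shape[0]
        print(f"prediction_step took {elapsed:.1f}s for a batch of {batch_size}")
        return result
```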
Hi! I evaluated on the spider_realistic dataset and, as @tscholak asked, logged the generation time for each question. Here is the full list: https://docs.google.com/spreadsheets/d/1NGui5DPQU5SChHzXXzfYYNjP6HknbM-VcXXI77dbGNk/edit?usp=sharing And here is the question that was the slowest:
500: What is the msot common country for singer? (0 days 00:21:56.996459)
Thanks so much, this information will help me with the root cause analysis for the speed regression!