Yixin Liu
Hi, the 100-epoch setting is just a default number. On CNN/DM the model actually reaches its best performance within one epoch. On XSum, it reaches its best performance within 5...
Hi, thank you for your interest in our project! > Does 'the score of one summary' or 'the probability to generate one summary' mean the similarity between the output and...
Thanks a lot for the suggestion! I was wondering if you have tried this modification and observed the difference in memory usage? It would be really great if you could...
Hi, thank you for your interest in our work! You may finetune the BRIO Huggingface model using MLE. But to train the model using our method you will need to...
Hi, were you able to solve this problem? One thing you could check is whether the two files you provided (`--ref` and `--hyp`) contain the same number of lines. In...
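The line-count check suggested above can be sketched in a few lines of Python. This is only a minimal illustration, not part of the project's evaluation script: the helper names `count_lines` and `same_line_count` are hypothetical, and the file paths are whatever you passed to `--ref` and `--hyp`.

```python
def count_lines(path):
    """Count the lines in a text file without loading it all into memory."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def same_line_count(ref_path, hyp_path):
    """Return (match, n_ref, n_hyp) for the reference and hypothesis files.

    Hypothetical helper: it only compares line counts and does not
    check the content of either file.
    """
    n_ref = count_lines(ref_path)
    n_hyp = count_lines(hyp_path)
    return n_ref == n_hyp, n_ref, n_hyp
```

Calling `same_line_count("ref.txt", "hyp.txt")` with your own paths returns a tuple whose first element is `False` when the two files differ in length, which would be one possible cause of the problem.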
Hi, thank you for your interest in our work. I'd recommend several things: 1. Following the CNN/DM setting may not always be suitable, depending on the dataset you are working on....
Thanks @Hannibal046 for the comment. Hi @HillZhang1999, I'm not very familiar with GEC but I think your observation makes sense. It's very critical to have **diverse** candidates to make sure...
Hi, > What is the GPU size that is required to train this model? We used RTX 3090 GPUs with 24 GB of memory. > I am currently using eight 32 GB...
> Can I train the model with two 24 GB 3090 Ti GPUs? Yes, you should be able to train the model using 2 GPUs. But you will...
Hi, thank you for your interest in our work. I wanted to note that this loss function is adapted from [MatchSum](https://github.com/maszhongming/MatchSum/blob/master/metrics.py#L30). For `TotalLoss`, they have an explanation _here is to...