Yao Fu
Oh, that is for testing the model on the WikiBio data-to-text generation task, and it is not included in the paper. If you need this part, could you send me...
Yep, that's right. With the current code I think you can get it to run. But during my tests, the training was unstable and may collapse in the second epoch (loss becomes...
@TobiasLee Thanks for helping to answer! Training it longer is indeed a quick answer, but the model may still suffer from repetition even after proper convergence. A quick solution would be...
For further discussion of architectures that prevent repetition, and their influence on sentence quality, see: https://www.aclweb.org/anthology/N18-1017/
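One common decoding-time mitigation for repetition (not necessarily the solution the truncated reply had in mind) is n-gram blocking: at each step, ban any next token that would complete an n-gram already present in the generated sequence. A minimal, model-agnostic sketch of the banned-token computation:

```python
def blocked_tokens(generated, n=3):
    """Return the set of next tokens that would repeat an n-gram
    already present in `generated` (a list of token ids or strings).

    At decoding time, the caller would set the logits of these
    tokens to -inf before sampling or taking the argmax.
    """
    if len(generated) < n - 1:
        return set()
    prefix = tuple(generated[-(n - 1):])  # the last n-1 tokens
    banned = set()
    # Scan for earlier occurrences of the same (n-1)-gram prefix;
    # the token that followed each occurrence becomes banned.
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


# Example: with trigram blocking, after "a b c a b" the token "c"
# is banned, since "a b c" has already been generated once.
print(blocked_tokens(["a", "b", "c", "a", "b"], n=3))  # {'c'}
```

Libraries such as Hugging Face Transformers expose the same idea as a `no_repeat_ngram_size` argument to `generate()`, so in practice you rarely need to implement this by hand.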
Oh, I think this model may not currently fit your general paraphrasing task because it is trained on MSCOCO and Quora, both quite domain-specific. The quickest way I think would...
Hi kasra-pak, sorry for the late reply. You could use a pre-trained translation model locally, like the ones in OpenNMT: https://opennmt.net/Models-py/
I'm so sorry that you are encountering these problems. I have received a few issues over the past weeks, but I'm stuck in China with a visa issue while my...
Hi, thank you for pointing this out! This is indeed a very important clarification. It is a bit hard to tell exactly how RLHF influences GPT-4's performance on GSM8K, because the...
Sure, that's on the TODO list. Yet this hope is unlikely to be realized -- generally, a model's reasoning ability is very well correlated with its scale (given other things are done correctly)....
Hi, we have updated a list of models, including Vicuna, FlanT5, InstructCodeT5, and so on; their numbers on a subset of the datasets are shown in the updated table. Will...