Fangkai Jiao

Results: 15 issues by Fangkai Jiao

Hi, have you finished the work of adding BERT? Could you please share the results? Thank you very much!

Many thanks for your work! ![image](https://user-images.githubusercontent.com/16469472/53220243-752b1080-369e-11e9-93dc-ca3ee0018cbb.png) The orange line is the baseline using Adam as the optimizer, and the blue line is the baseline using AdaBound. I think...
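
For context on the comparison in this comment: AdaBound behaves like Adam early in training but clips each parameter's effective step size into a band that shrinks toward a fixed final learning rate, so it gradually turns into SGD. A minimal plain-Python sketch on a scalar quadratic (all hyperparameters here are illustrative, not those used in the experiment above):

```python
import math

def adabound_step(x, grad, state, lr=0.1, final_lr=0.02,
                  betas=(0.9, 0.999), gamma=0.1, eps=1e-8):
    """One AdaBound-style update on a scalar parameter.

    Like Adam, but the per-parameter step size is clipped into a
    band [lower, upper] that narrows toward final_lr as t grows,
    smoothly transitioning from Adam-like to SGD-like behaviour.
    """
    state["t"] += 1
    t = state["t"]
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** t)       # bias-corrected momentum
    v_hat = state["v"] / (1 - b2 ** t)       # bias-corrected variance
    # Adam's effective step size for this parameter.
    step_size = lr / (math.sqrt(v_hat) + eps)
    # Dynamic bounds: both converge to final_lr as t grows.
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    clipped = min(max(step_size, lower), upper)
    return x - clipped * m_hat

# Minimise f(x) = x^2 starting from x = 1.0.
state = {"t": 0, "m": 0.0, "v": 0.0}
x = 1.0
for _ in range(200):
    grad = 2 * x
    x = adabound_step(x, grad, state)
```

Because the bounds tighten, late training here is effectively momentum SGD with learning rate `final_lr`, which is the mechanism AdaBound's authors credit for better end-of-training generalisation than plain Adam.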

Indeed, Google Cloud is not available to developers in China, so I can't run the script with the default storage space. Would you release another way to download the data?...

Hi, many thanks for your contribution! I tried to use `python demo.py --data full` to download the Reddit data. Since I don't want to train the model right now, I didn't...

Hello. It seems that the performance of SLQA is worse than BiDAF (with self-attention?). I want to know whether there are any problems in the SLQA implementation. In my own implementation, it also seems...

There are several tricks still to be implemented to improve performance, and they won't be added until Sep. 19 because of some more important projects. I opened this issue to avoid forgetting...

enhancement

Wonderful work! May I ask about compatibility with the ZeRO mechanism? E.g., PyTorch's ZeroRedundancyOptimizer, DeepSpeed ZeRO-1 to ZeRO-3, and FairScale FSDP. Because I noticed that QLoRA relies on specially implemented...
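
For readers unfamiliar with the ZeRO mechanism this comment asks about: ZeRO stage 1 shards the optimizer state across ranks so each rank stores and updates state only for the parameters it owns, instead of replicating it everywhere. A toy single-process sketch of that idea (rank count, shard layout, and hyperparameters are made up for illustration; a real implementation would all-reduce gradients and all-gather updated parameters):

```python
# Toy ZeRO-1: shard momentum-SGD optimizer state across "ranks".
NUM_RANKS = 2

params = [1.0, 2.0, 3.0, 4.0]   # parameter replica held by every rank
grads  = [0.1, 0.2, 0.3, 0.4]   # pretend gradients are already all-reduced

# Each rank owns optimizer state only for its contiguous shard
# (4 params / 2 ranks = 2 params per rank here).
shards = {r: range(r * 2, r * 2 + 2) for r in range(NUM_RANKS)}
momentum = {r: {i: 0.0 for i in shards[r]} for r in range(NUM_RANKS)}

lr, beta = 0.1, 0.9
for r in range(NUM_RANKS):
    for i in shards[r]:
        # Only the owning rank keeps momentum for parameter i.
        momentum[r][i] = beta * momentum[r][i] + (1 - beta) * grads[i]
        params[i] -= lr * momentum[r][i]
# In real ZeRO-1, an all-gather would now broadcast each shard's
# updated parameters; here all "ranks" share one list already.
```

The memory saving is that each rank holds 1/NUM_RANKS of the optimizer state; stages 2 and 3 extend the same sharding to gradients and then to the parameters themselves, which is why a method that assumes full local parameter tensors (as QLoRA's quantized storage does) needs special handling under ZeRO-3/FSDP.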

I know that QLoRA can help train LLaMA-65B with less than 40GB of memory. But I find that the quantization process must be completed on the GPU, which means that you should...
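
To illustrate the quantization step this comment refers to: QLoRA stores base weights in 4-bit blocks, keeping one scale per block plus a small code per weight. The sketch below uses simplified uniform absmax quantization, not QLoRA's actual non-uniform NF4 levels or double quantization; it only shows why quantizing is a full pass over every weight (which QLoRA's kernels perform on the GPU):

```python
def quantize_block(block, levels=16):
    """Blockwise absmax quantization to `levels` uniform codes.

    Simplified stand-in for 4-bit NF4: each block stores one float
    scale (its absmax) and one small signed integer code per weight.
    """
    scale = max(abs(w) for w in block) or 1.0   # avoid /0 on all-zero blocks
    half = levels // 2
    codes = [max(-half, min(half - 1, round(w / scale * (half - 1))))
             for w in block]
    return scale, codes

def dequantize_block(scale, codes, levels=16):
    """Recover approximate weights from a scale and its codes."""
    half = levels // 2
    return [c / (half - 1) * scale for c in codes]

weights = [0.4, -0.1, 0.05, -0.7, 0.0, 0.21]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
err = max(abs(w - r) for w, r in zip(weights, restored))
```

With 16 levels the worst-case error per weight is about half a quantization step (`scale / (levels/2 - 1) / 2`); NF4's non-uniform levels reduce this further for the roughly Gaussian weight distributions of pretrained models.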

Hi, I appreciate your awesome work! When I tried to introduce the GaLore AdamW optimizer into Gemma training, it seemed incompatible with DeepSpeed with ZeRO stage as...

Hi, I have a question. How could I use this parser if I want to tackle the **Dialog State Tracking** task? I found your code while searching on GitHub. The task definition...