Fangkai Jiao

Results: 15 issues by Fangkai Jiao

Hi, have you finished the work of adding BERT? Could you please share the results? Thank you very much!

Many thanks for your work! ![image](https://user-images.githubusercontent.com/16469472/53220243-752b1080-369e-11e9-93dc-ca3ee0018cbb.png) The orange line is the baseline using Adam as the optimizer, and the blue line is the baseline using AdaBound. I think...
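
For context on the comparison in this comment: AdaBound behaves like Adam early in training but clips each parameter's effective step size into a band that shrinks toward a fixed final learning rate, so it gradually turns into SGD. A minimal plain-Python sketch on a scalar quadratic (all hyperparameters here are illustrative, not those used in the experiment above):

```python
import math

def adabound_step(x, grad, state, lr=0.1, final_lr=0.02,
                  betas=(0.9, 0.999), gamma=0.1, eps=1e-8):
    """One AdaBound-style update on a scalar parameter.

    Like Adam, but the per-parameter step size is clipped into a
    band [lower, upper] that narrows toward final_lr as t grows,
    smoothly transitioning from Adam-like to SGD-like behaviour.
    """
    state["t"] += 1
    t = state["t"]
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** t)       # bias-corrected momentum
    v_hat = state["v"] / (1 - b2 ** t)       # bias-corrected variance
    # Adam's effective step size for this parameter.
    step_size = lr / (math.sqrt(v_hat) + eps)
    # Dynamic bounds: both converge to final_lr as t grows.
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    clipped = min(max(step_size, lower), upper)
    return x - clipped * m_hat

# Minimise f(x) = x^2 starting from x = 1.0.
state = {"t": 0, "m": 0.0, "v": 0.0}
x = 1.0
for _ in range(200):
    grad = 2 * x
    x = adabound_step(x, grad, state)
```

Because the bounds tighten, late training here is effectively momentum SGD with learning rate `final_lr`, which is the mechanism AdaBound's authors credit for better end-of-training generalisation than plain Adam.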

Indeed, Google Cloud is not available to developers in China, so I can't run the script with the default storage space. Would you release another way to download the data?...

Hi, many thanks for your contribution! I tried to use `python demo.py --data full` to download the Reddit data. Since I don't want to train the model right now, I didn't...

Hello. It seems that the performance of SLQA is worse than BiDAF (with self-attention?). I want to know whether there are any problems in the SLQA implementation. In my own implementation, it also seems...

There are several tricks still to be implemented to improve performance, and they won't be added until Sep. 19 because of some more important projects. I opened this issue to avoid forgetting...

enhancement

Wonderful work! May I ask about compatibility with the ZeRO mechanism? E.g., PyTorch's ZeroRedundancyOptimizer, DeepSpeed ZeRO-1 to ZeRO-3, and FairScale FSDP. Because I noticed that QLoRA relies on specially implemented...
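
For readers unfamiliar with the ZeRO mechanism this comment asks about: ZeRO stage 1 shards the optimizer state across ranks so each rank stores and updates state only for the parameters it owns, instead of replicating it everywhere. A toy single-process sketch of that idea (rank count, shard layout, and hyperparameters are made up for illustration; a real implementation would all-reduce gradients and all-gather updated parameters):

```python
# Toy ZeRO-1: shard momentum-SGD optimizer state across "ranks".
NUM_RANKS = 2

params = [1.0, 2.0, 3.0, 4.0]   # parameter replica held by every rank
grads  = [0.1, 0.2, 0.3, 0.4]   # pretend gradients are already all-reduced

# Each rank owns optimizer state only for its contiguous shard
# (4 params / 2 ranks = 2 params per rank here).
shards = {r: range(r * 2, r * 2 + 2) for r in range(NUM_RANKS)}
momentum = {r: {i: 0.0 for i in shards[r]} for r in range(NUM_RANKS)}

lr, beta = 0.1, 0.9
for r in range(NUM_RANKS):
    for i in shards[r]:
        # Only the owning rank keeps momentum for parameter i.
        momentum[r][i] = beta * momentum[r][i] + (1 - beta) * grads[i]
        params[i] -= lr * momentum[r][i]
# In real ZeRO-1, an all-gather would now broadcast each shard's
# updated parameters; here all "ranks" share one list already.
```

The memory saving is that each rank holds 1/NUM_RANKS of the optimizer state; stages 2 and 3 extend the same sharding to gradients and then to the parameters themselves, which is why a method that assumes full local parameter tensors (as QLoRA's quantized storage does) needs special handling under ZeRO-3/FSDP.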

I know that QLoRA can help train LLaMA-65B with less than 40GB of memory. But I find that the quantization process must be completed on the GPU, which means that you should...
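
To illustrate the quantization step this comment refers to: QLoRA stores base weights in 4-bit blocks, keeping one scale per block plus a small code per weight. The sketch below uses simplified uniform absmax quantization, not QLoRA's actual non-uniform NF4 levels or double quantization; it only shows why quantizing is a full pass over every weight (which QLoRA's kernels perform on the GPU):

```python
def quantize_block(block, levels=16):
    """Blockwise absmax quantization to `levels` uniform codes.

    Simplified stand-in for 4-bit NF4: each block stores one float
    scale (its absmax) and one small signed integer code per weight.
    """
    scale = max(abs(w) for w in block) or 1.0   # avoid /0 on all-zero blocks
    half = levels // 2
    codes = [max(-half, min(half - 1, round(w / scale * (half - 1))))
             for w in block]
    return scale, codes

def dequantize_block(scale, codes, levels=16):
    """Recover approximate weights from a scale and its codes."""
    half = levels // 2
    return [c / (half - 1) * scale for c in codes]

weights = [0.4, -0.1, 0.05, -0.7, 0.0, 0.21]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
err = max(abs(w - r) for w, r in zip(weights, restored))
```

With 16 levels the worst-case error per weight is about half a quantization step (`scale / (levels/2 - 1) / 2`); NF4's non-uniform levels reduce this further for the roughly Gaussian weight distributions of pretrained models.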

Hi, I appreciate your awesome work! When I tried to introduce the GaLore AdamW optimizer into Gemma training, it seemed incompatible with DeepSpeed with ZeRO stage as...

Hi, I have a question. How could I use this parser if I want to tackle the **Dialog State Tracking** task? I found your code while searching on GitHub. The task definition...