drxmy
With the same file and the same command line, the F-score is only 0.15 when I run it in bash, but 0.47 when I am not in bash. Is there some reason...
The size of the ChineseReverseDictionary data provided in the Google Drive is 0, and the THU cloud link is empty. Is there any other way to download it? Thank you!
https://github.com/microsoft/DeepSpeedExamples/blob/e7c8cb767acddba8ad5d2c41fe18e30de7870d30/model_compression/bert/huggingface_transformer/modeling_bert.py#L383 In the model compression example, it says the only change is line 383, "where we output attention_scores instead of attention_prob." But this line is the same as the Hugging Face version and...
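For reference, here is a minimal sketch of the change as I understand it from that comment, written as a standalone function rather than the repo's actual BertSelfAttention code; the variable names mirror modeling_bert.py, but the return tuple is my own assumption:

```python
import math
import torch
import torch.nn.functional as F

def self_attention(query_layer, key_layer, value_layer, attention_mask=None,
                   output_attentions=False, head_size=64):
    """Minimal sketch of BERT-style self-attention, mirroring modeling_bert.py naming."""
    # Raw logits before softmax ("attention_scores").
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
    attention_scores = attention_scores / math.sqrt(head_size)
    if attention_mask is not None:
        attention_scores = attention_scores + attention_mask

    # Normalized weights after softmax ("attention_probs").
    attention_probs = F.softmax(attention_scores, dim=-1)
    context_layer = torch.matmul(attention_probs, value_layer)

    # Upstream Hugging Face returns attention_probs when output_attentions=True;
    # my reading of the comment is that the compression example returns
    # attention_scores (the pre-softmax logits) instead.
    return (context_layer, attention_scores) if output_attentions else (context_layer,)
```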
**Description** I did not see anything related in the docs. Since this project uses Gradio, I guess this is possible. Does text-generation-webui have such a feature? Or could you please tell...
First, thank you for open sourcing the data. For example, id=3 in zh_helpfulness and id=6 in zh_honesty contain something like "我的创造者是复旦大学自然语言处理实验室和上海人工智能实验室" ("My creator is the Fudan University NLP Lab and the Shanghai AI Laboratory"). This is not good for training our own model....
The paper says that it only needs 350GB of VRAM to train 175B GPT-3 with rank = 4. Can you elaborate more on how this is done? For example, do you use Megatron-DeepSpeed?...
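For context, my own back-of-the-envelope arithmetic (an assumption on my part, not something stated in the paper) is that 350GB roughly matches the frozen fp16 weights alone, before activations, LoRA parameters, and optimizer states:

```python
# My rough arithmetic (an assumption, not from the paper):
# frozen fp16 weights alone for a 175B-parameter model.
num_params = 175e9
bytes_per_param_fp16 = 2
weights_gb = num_params * bytes_per_param_fp16 / 1e9
print(f"fp16 weights: ~{weights_gb:.0f} GB")  # ~350 GB before activations and optimizer states
```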
Specifically, I am looking for a script with DeepSpeed PP and ZeRO-DP like this: https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/bitfit#deepspeed-pp-and-zero-dp In my understanding, this script should be able to load BLOOM with some changes, for example...
I am not familiar with Triton or CUDA, but it feels like some of the code (fused_attm) could also be used in fp16 to gain an inference speedup compared with Hugging Face?
Hello, I would like to ask: I see that the official open-source code says the data needs to be tokenized first. Does this refer only to word segmentation, or do the tokens also need to be converted into numeric IDs afterwards? I ask because this issue https://github.com/google-research/lasertagger/issues/11 also mentions detokenize.
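To make the question concrete, this is the distinction I am asking about, illustrated with the Hugging Face BERT tokenizer (the tokenizer choice here is just an example, not necessarily what lasertagger uses):

```python
from transformers import BertTokenizer

# Illustration of the two steps I am asking about.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

text = "今天天气很好"
tokens = tokenizer.tokenize(text)                     # step 1: split text into (sub)word tokens
input_ids = tokenizer.convert_tokens_to_ids(tokens)   # step 2: map tokens to integer IDs

print(tokens)     # e.g. ['今', '天', '天', '气', '很', '好']
print(input_ids)  # integer IDs from the vocabulary
```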
### 📚 The doc issue TrainConfig has some general explanation of some of the parameters, but after running [ppo_hh.py](https://github.com/CarperAI/trlx/blob/main/examples/hh/ppo_hh.py), I got confused. 1. ppo_hh.py sets total_steps and epochs while...