drxmy
With the same file and the same command line, the F-score is only 0.15 when I run it in bash, but 0.47 when I am not in bash. Is there some reason...
The size of the ChineseReverseDictionary data provided in the Google Drive is 0, and the THU cloud link is empty. Is there any other way to download it? Thank you!
https://github.com/microsoft/DeepSpeedExamples/blob/e7c8cb767acddba8ad5d2c41fe18e30de7870d30/model_compression/bert/huggingface_transformer/modeling_bert.py#L383 In the model compression example, it says the only change is line 383, "where we output attention_scores instead of attention_prob." But this line is the same as the Hugging Face version and...
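For reference, here is a minimal sketch of the change as I understand it from that comment, written as a standalone function rather than the repo's actual BertSelfAttention code; the variable names mirror modeling_bert.py, but the return tuple is my own assumption:

```python
import math
import torch
import torch.nn.functional as F

def self_attention(query_layer, key_layer, value_layer, attention_mask=None,
                   output_attentions=False, head_size=64):
    """Minimal sketch of BERT-style self-attention, mirroring modeling_bert.py naming."""
    # Raw logits before softmax ("attention_scores").
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
    attention_scores = attention_scores / math.sqrt(head_size)
    if attention_mask is not None:
        attention_scores = attention_scores + attention_mask

    # Normalized weights after softmax ("attention_probs").
    attention_probs = F.softmax(attention_scores, dim=-1)
    context_layer = torch.matmul(attention_probs, value_layer)

    # Upstream Hugging Face returns attention_probs when output_attentions=True;
    # my reading of the comment is that the compression example returns
    # attention_scores (the pre-softmax logits) instead.
    return (context_layer, attention_scores) if output_attentions else (context_layer,)
```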
**Description** I did not see anything related in the docs. Since this project uses Gradio, I guess this is possible. Does text-generation-webui have such a feature? Or could you please tell...
First, thank you for open sourcing the data. For example, id=3 in zh_helpfulness and id=6 in zh_honesty contain something like "我的创造者是复旦大学自然语言处理实验室和上海人工智能实验室" ("My creator is the Fudan University NLP Lab and the Shanghai AI Laboratory"). This is not good for training our own model....
The paper says that it only needs 350GB of VRAM to train 175B GPT-3 with rank = 4. Can you elaborate more on how this is done? For example, do you use Megatron-DeepSpeed?...
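For context, my own back-of-the-envelope arithmetic (an assumption on my part, not something stated in the paper) is that 350GB roughly matches the frozen fp16 weights alone, before activations, LoRA parameters, and optimizer states:

```python
# My rough arithmetic (an assumption, not from the paper):
# frozen fp16 weights alone for a 175B-parameter model.
num_params = 175e9
bytes_per_param_fp16 = 2
weights_gb = num_params * bytes_per_param_fp16 / 1e9
print(f"fp16 weights: ~{weights_gb:.0f} GB")  # ~350 GB before activations and optimizer states
```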
Specifically, I am looking for a script with DeepSpeed PP and ZeRO-DP like this: https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/bitfit#deepspeed-pp-and-zero-dp In my understanding, this script should be able to load BLOOM with some changes, for example...
I am not familiar with Triton or CUDA, but it feels like some of the code (fused_attm) could also be used in fp16 to gain an inference speedup compared with Hugging Face?
Hello, I would like to ask: I see that the official open-source code says the data needs to be tokenized first. Does this refer only to word segmentation, or do the tokens also need to be converted into numeric IDs afterwards? I ask because this issue https://github.com/google-research/lasertagger/issues/11 also mentions detokenize.
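To make the question concrete, this is the distinction I am asking about, illustrated with the Hugging Face BERT tokenizer (the tokenizer choice here is just an example, not necessarily what lasertagger uses):

```python
from transformers import BertTokenizer

# Illustration of the two steps I am asking about.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

text = "今天天气很好"
tokens = tokenizer.tokenize(text)                     # step 1: split text into (sub)word tokens
input_ids = tokenizer.convert_tokens_to_ids(tokens)   # step 2: map tokens to integer IDs

print(tokens)     # e.g. ['今', '天', '天', '气', '很', '好']
print(input_ids)  # integer IDs from the vocabulary
```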
### 📚 The doc issue TrainConfig has some general explanation of some of the parameters, but after running [ppo_hh.py](https://github.com/CarperAI/trlx/blob/main/examples/hh/ppo_hh.py), I got confused. 1. ppo_hh.py sets total_steps and epochs while...