Zhiwei He

Results 17 comments of Zhiwei He

Can't agree more.

> I have been using this snippet myself; it just calls nDCG from sklearn.metrics:
>
> ```
> def get_ndcg(surprise_predictions, k_highest_scores=None):
>     """
>     Calculates the ndcg (normalized discounted cumulative gain) from surprise predictions, using sklearn.metrics.ndcg_score and scipy.sparse
>
>     Parameters:
>     ...
> ```
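
The quoted snippet is cut off, so the original implementation is not recoverable here. As a rough illustration of the metric it refers to, the following is a minimal standard-library sketch of per-user nDCG@k with linear gain and a log2 discount. It is not the quoted `get_ndcg`: the function names `dcg` and `ndcg_per_user` and the `(uid, iid, true_rating, estimated_rating)` tuple layout are assumptions made for this example, and ties in estimated ratings are not treated specially.

```python
import math
from collections import defaultdict


def dcg(relevances):
    """Discounted cumulative gain of relevances given in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_per_user(predictions, k=None):
    """Mean nDCG@k over users.

    `predictions` is an iterable of (uid, iid, true_rating, estimated_rating)
    tuples. This layout is an assumption made for the sketch.
    """
    # Group (estimated, true) rating pairs by user.
    by_user = defaultdict(list)
    for uid, _iid, true_r, est in predictions:
        by_user[uid].append((est, true_r))

    scores = []
    for rated in by_user.values():
        # Rank items by estimated rating, then read off the true ratings.
        ranked = [t for _e, t in sorted(rated, key=lambda p: p[0], reverse=True)]
        # The ideal ranking orders items by their true ratings.
        ideal = sorted((t for _e, t in rated), reverse=True)
        if k is not None:
            ranked, ideal = ranked[:k], ideal[:k]
        ideal_dcg = dcg(ideal)
        # A user with no positive true rating contributes 0 in this sketch.
        scores.append(dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0)

    return sum(scores) / len(scores) if scores else 0.0
```

For example, `ndcg_per_user(preds, k=10)` averages nDCG@10 over all users that appear in `preds`; passing `k=None` scores each user's full list.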

Hi @cliang1453, I found that you define different param groups with different 'params_type' and 'weight_decay' [here](https://github.com/cliang1453/SAGE/blob/f4c6dc07cf66588dc1a8c0e84bd42311825cbfd9/mt_dnn/model.py#L97-L118). Did you do the same in the fairseq version?

Two months have passed... 😭

> gradient clipping

@ZeyuTeng96 Hi. Gradient accumulation was used, and max_grad_norm defaults to 1. The following is the full configuration:

```
torchrun \
    --nnodes=$HOST_NUM \
    --nproc_per_node=$HOST_GPU_NUM \
    --rdzv_id=$TJ_INSTANCE_ID \
    --rdzv_backend=c10d...
```

@ZeyuTeng96 @jyshee @zixiliuUSC Hi everyone, sorry for the late reply. Following @jyshee's suggestion, I have successfully trained the 13B model. The following are all my...

Thanks, and what is your RAW usage rate?