Zhiwei He comments

Results: 17 comments of Zhiwei He

'TranslationFromPretrainedBARTTask' object has no attribute 'args'

same issue here

Thorough documentation

Can't agree more.

Discounted Cumulative Gain and other discounting metrics are missing from Surprise

> I have been using this snippet myself; it simply calls nDCG from sklearn.metrics:
>
> ```
> def get_ndcg(surprise_predictions, k_highest_scores=None):
>     """
>     Calculates the ndcg (normalized discounted cumulative gain) from surprise predictions, using sklearn.metrics.ndcg_score and scipy.sparse
>
>     Parameters:
>     ...
> ```
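
For readers without the full snippet, here is a minimal, dependency-free sketch of the quantity such a helper computes. It is not the quoted `get_ndcg` and does not call `sklearn.metrics.ndcg_score`; it computes a per-user nDCG with linear gain and a log2 position discount directly. The names `dcg` and `mean_ndcg` are hypothetical, and the input layout (tuples whose first four fields are user id, item id, true rating, estimated rating) is an assumption modeled on Surprise predictions.

```python
import math
from collections import defaultdict


def dcg(relevances):
    """Discounted cumulative gain: linear gain, log2 position discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def mean_ndcg(predictions, k=None):
    """Average per-user nDCG over an iterable of prediction tuples.

    Assumption: each prediction starts with (uid, iid, true_rating, estimate);
    any extra trailing fields are ignored.
    """
    by_user = defaultdict(list)
    for pred in predictions:
        uid, _iid, true_r, est = pred[:4]
        by_user[uid].append((est, true_r))

    scores = []
    for rated in by_user.values():
        # Order items by estimated rating, then read off the true ratings.
        by_estimate = sorted(rated, key=lambda pair: pair[0], reverse=True)
        ranked = [true_r for _est, true_r in by_estimate][:k]
        # The ideal ordering sorts by the true ratings themselves.
        ideal = sorted((true_r for _est, true_r in rated), reverse=True)[:k]
        ideal_dcg = dcg(ideal)
        if ideal_dcg > 0:  # skip users whose true ratings are all zero
            scores.append(dcg(ranked) / ideal_dcg)
    return sum(scores) / len(scores) if scores else 0.0
```

Passing `k=None` ranks every item a user has in the prediction set; an integer `k` truncates both the predicted and the ideal ranking to the top `k` items.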

Reproduction of machine translation results

Hi @cliang1453, I found that you define different param groups with different 'params_type' and 'weight_decay' [here](https://github.com/cliang1453/SAGE/blob/f4c6dc07cf66588dc1a8c0e84bd42311825cbfd9/mt_dnn/model.py#L97-L118). Did you do the same in the fairseq version?

Unable to use BLEURT in offline mode

Same here.

Any timelines about Kosmos-1?

Two months have passed... 😭

High training loss of LLaMA 13B

> gradient clipping

@ZeyuTeng96 Hi. Gradient accumulation was used, and max_grad_norm defaults to 1. The following is the full configuration:

```
torchrun \
    --nnodes=$HOST_NUM \
    --nproc_per_node=$HOST_GPU_NUM \
    --rdzv_id=$TJ_INSTANCE_ID \
    --rdzv_backend=c10d...
```

Freezes whenever I go to use the S or D key

Can I run this pipeline on A100-40GB?