xqun3 issues

Results: 8 issues of xqun3

How to do validation when training an NMT model using srun with multiple GPUs

I fine-tuned deltam using srun with multiple GPUs. The training script, modified from the demo, is shown below: ``` python train.py $data_bin \ --distributed-port 12345 \ --no-save --disable-validation...

question

needs triage

multi-gpus with max_batch_size > 1 with ddp

@lucidrains Hi, when I train Imagen with multiple GPUs, a warning occurs: “UserWarning: Grad strides do not match bucket view strides.” # warning ``` /home/tqd/anaconda3/envs/imagen_pyenv3/lib/python3.9/site-packages/torch/autograd/__init__.py:173: UserWarning: Grad strides do...

Difference between labels and golden_tgt in the data format; training data links are broken

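Hello, while reading your code I could not quite understand the labels field in the data. My understanding is that golden_tgt is the actual summary, so what data does the labels field hold? Also, the data links provided on the web page are all broken; could you provide them again?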

Add evaluation of Claude3

Great work. Anthropic has recently released the Claude3 series of models; could you add evaluation results for the Claude3 series?

enhancement

OS std test set results

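In the OS test result files, there are almost no results in the “commit” category. If a bash command running to completion normally is used as the criterion for a correct answer, it is hard to guarantee that the original question was actually answered correctly, as in the case below: ![image](https://github.com/THUDM/AgentBench/assets/9492425/2e43ca02-6dc8-4a71-b5ce-aa5b61e057dc) ### Original question As a student, you are given a directory named `log_files` containing log files from multiple servers. The log files are named as "server1.log", "server2.log",...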

bug

help wanted

Support SageMaker Endpoint Message API

## Title Support SageMaker Endpoint Message API ## Type 🆕 New Feature

[REQUEST] Could not install deepspeed successfully in NVIDIA L40S instance

Could not install deepspeed successfully in NVIDIA L40S instance ## environment CUDA Version: 12.4 python version: 3.12.7 torch version: 2.4.0+cu124 ## install method pip install deepspeed ## Screenshot ![image](https://github.com/user-attachments/assets/d41ffb94-30bd-420a-9682-3d8df54b812e)

build

Whisper concurrent inference issue

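Hi @yuekaizhang, thanks for sharing the code, great work! However, I ran into a problem in actual deployment: after deploying the model and sending concurrent requests, I see no batching effect; instead, inference time grows in proportion to the number of concurrent requests. Is this because the implementation itself does not support Triton batching? My batch-related configuration is as follows: ``` dynamic_batching { preferred_batch_size: [ 4, 8] max_queue_delay_microseconds: 100 } ```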