xqun3

8 issues by xqun3

I fine-tuned DeltaLM using srun with multiple GPUs. The training script, shown below, is modified from the demo:
```
python train.py $data_bin \
    --distributed-port 12345 \
    --no-save --disable-validation...
```

question
needs triage
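When a job is launched with `srun`, each task learns its place in the job from SLURM environment variables: `SLURM_PROCID` (global rank), `SLURM_LOCALID` (rank within the node) and `SLURM_NTASKS` (total number of tasks). Trainers launched this way typically derive the distributed rank and the GPU index from these, so one thing worth checking is that `--ntasks-per-node` equals the number of GPUs per node. The sketch below shows that mapping under this assumption; `DistInfo` and `dist_info_from_slurm` are hypothetical helpers, not part of the training script above.

```python
import os
from typing import Mapping, NamedTuple


class DistInfo(NamedTuple):
    """Hypothetical container for one process's place in the job."""
    rank: int        # global rank across all nodes
    local_rank: int  # rank within this node; usually used as the GPU index
    world_size: int  # total number of processes (one per GPU)


def dist_info_from_slurm(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Read the rank information that srun exports to every task it starts."""
    return DistInfo(
        rank=int(env["SLURM_PROCID"]),
        local_rank=int(env["SLURM_LOCALID"]),
        world_size=int(env["SLURM_NTASKS"]),
    )


if __name__ == "__main__":
    # Example: 2 nodes x 4 GPUs; this is the second task on the second node.
    fake_env = {"SLURM_PROCID": "5", "SLURM_LOCALID": "1", "SLURM_NTASKS": "8"}
    print(dist_info_from_slurm(fake_env))
```

Inside a real `srun` task, `dist_info_from_slurm()` can be called with no argument, and `local_rank` is the value one would typically pass to `torch.cuda.set_device`.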

Hi @lucidrains, when I train Imagen with multiple GPUs, the following warning occurs: “UserWarning: Grad strides do not match bucket view strides.”
# warning
```
/home/tqd/anaconda3/envs/imagen_pyenv3/lib/python3.9/site-packages/torch/autograd/__init__.py:173: UserWarning: Grad strides do...
```
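This warning comes from PyTorch's `DistributedDataParallel`: it fires when a parameter's gradient does not have the same strides as the bucket view DDP allocated for it, and PyTorch describes it as a possible performance issue rather than an error. A common trigger is a parameter that is a non-contiguous view (for example, a transposed weight) or whose memory layout changes after DDP wraps the model. The framework-free sketch below flags parameters whose strides are not row-major; it is a diagnostic aid under that assumption, not a confirmed cause of this Imagen warning, and all helper names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (C-contiguous) strides, in elements, for a tensor of `shape`."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= max(dim, 1)
    return tuple(reversed(strides))


def is_row_major(shape: Sequence[int], stride: Sequence[int]) -> bool:
    """True if `stride` is contiguous for `shape`; size-1 dims are ignored."""
    if 0 in shape:  # empty tensors count as contiguous
        return True
    expected = contiguous_strides(shape)
    return all(s == e for d, s, e in zip(shape, stride, expected) if d > 1)


def non_contiguous_params(
    params: Iterable[Tuple[str, Sequence[int], Sequence[int]]],
) -> List[str]:
    """Names of (name, shape, stride) entries whose layout is not row-major."""
    return [name for name, shape, stride in params if not is_row_major(shape, stride)]


if __name__ == "__main__":
    # A (4, 3) weight stored as the transpose of a contiguous (3, 4) tensor
    # has strides (1, 4) instead of the contiguous (3, 1).
    params = [("proj.weight", (4, 3), (1, 4)), ("proj.bias", (4,), (1,))]
    print(non_contiguous_params(params))  # ['proj.weight']
```

With a real model, the input could be `[(n, tuple(p.shape), p.stride()) for n, p in model.named_parameters()]` (or simply `p.is_contiguous()` per parameter). A commonly suggested remedy for any flagged parameter is to make it contiguous before wrapping the model in DDP.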

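Hello, while reading your code I didn't quite understand the `labels` field in the data. My understanding is that `golden_tgt` is the actual (reference) summary, so what data does the `labels` field hold? Also, all of the data links provided on the web page are broken; could you share them again?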

Great work! Anthropic has also recently released the Claude3 model series; could you add evaluation results for the Claude3 models?

enhancement

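In the OS test result files, almost none of the results fall into the “commit” category. If a bash command finishing without error is used as the criterion for a correct answer, there is no guarantee that the original question was actually answered correctly, as in the following case:
![image](https://github.com/THUDM/AgentBench/assets/9492425/2e43ca02-6dc8-4a71-b5ce-aa5b61e057dc)
### Original question
As a student, you are given a directory named `log_files` containing log files from multiple servers. The log files are named as "server1.log", "server2.log",...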

bug
help wanted

## Title
Support SageMaker Endpoint Message API
## Type
🆕 New Feature

Could not install DeepSpeed on an NVIDIA L40S instance
## Environment
CUDA version: 12.4
Python version: 3.12.7
torch version: 2.4.0+cu124
## Install method
`pip install deepspeed`
## Screenshot
![image](https://github.com/user-attachments/assets/d41ffb94-30bd-420a-9682-3d8df54b812e)

build
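The `CUDA Version: 12.4` line in the report looks like the header printed by `nvidia-smi`, which shows the newest CUDA version the driver supports rather than the installed toolkit. Compiling DeepSpeed's CUDA ops requires a local toolkit (`nvcc`), and DeepSpeed checks that its version is compatible with the CUDA version torch was built with (here `cu124`). The sketch below, with hypothetical helper names, collects both versions so they can be added to the report; it is a diagnostic aid, not a known cause of this failure.

```python
import platform
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_nvcc_release(output: str) -> Optional[Version]:
    """Extract (major, minor) from `nvcc --version` output, e.g. 'release 12.4'."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_cuda_version() -> Optional[Version]:
    """CUDA toolkit version of the nvcc on PATH, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def torch_cuda_version() -> Optional[Version]:
    """CUDA version torch was built with (e.g. (12, 4) for +cu124), if available."""
    try:
        import torch
    except ImportError:
        return None
    if torch.version.cuda is None:  # CPU-only build of torch
        return None
    major, minor = torch.version.cuda.split(".")[:2]
    return int(major), int(minor)


if __name__ == "__main__":
    print("python:", platform.python_version())
    print("nvcc  :", toolkit_cuda_version())
    print("torch :", torch_cuda_version())
```

If `nvcc` is missing or its major version differs from torch's, installing a matching CUDA toolkit (or pointing `CUDA_HOME` at one) is a reasonable first step. Once DeepSpeed is importable, its own `ds_report` command prints a similar summary.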

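Hi @yuekaizhang, thanks for sharing the code; great work! However, when deploying it I ran into a problem: after the model is deployed and I send concurrent requests, I see no batching effect. Instead, inference time grows in proportion to the concurrency. Is this because the implementation itself does not support Triton batching? My batching-related configuration is as follows:
```
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```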
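A few points about Triton's dynamic batcher are relevant here. It only has an effect when the model config sets `max_batch_size` greater than 0, and `max_queue_delay_microseconds: 100` lets the server wait at most 0.1 ms for more requests before dispatching what it has, so concurrent requests that arrive further apart than that are likely to run one at a time. Also, if the model runs in the Python backend, `execute()` receives a list of requests, and batching only helps if that list is processed as one batch rather than in a loop. The toy simulation below (hypothetical names; a simplified model, not Triton's actual scheduler) groups request arrival times using a delay window and preferred sizes, to show how the delay setting limits batch formation.

```python
from typing import List, Sequence


def simulate_dynamic_batching(
    arrivals_us: Sequence[int],
    max_batch_size: int,
    preferred_sizes: Sequence[int],
    max_queue_delay_us: int,
) -> List[List[int]]:
    """Group sorted request arrival times (in microseconds) into batches.

    Simplified model: the oldest queued request waits at most
    `max_queue_delay_us`; every request arriving within that window
    (up to `max_batch_size`) is a candidate, and the largest preferred
    size that fits is used, falling back to all candidates otherwise.
    """
    if max_batch_size < 1:
        raise ValueError("batching requires max_batch_size >= 1")
    batches: List[List[int]] = []
    i = 0
    while i < len(arrivals_us):
        deadline = arrivals_us[i] + max_queue_delay_us
        j = i
        while (j < len(arrivals_us) and arrivals_us[j] <= deadline
               and j - i < max_batch_size):
            j += 1
        available = j - i  # always >= 1: the oldest request is in its own window
        fitting = [s for s in preferred_sizes if s <= available]
        size = max(fitting) if fitting else available
        batches.append(list(arrivals_us[i:i + size]))
        i += size
    return batches


if __name__ == "__main__":
    preferred = [4, 8]
    # 8 requests 10 us apart: they all fall inside the 100 us window.
    burst = [t * 10 for t in range(8)]
    print([len(b) for b in simulate_dynamic_batching(burst, 8, preferred, 100)])
    # 8 requests 500 us apart: each request waits alone, so nothing is batched.
    spread = [t * 500 for t in range(8)]
    print([len(b) for b in simulate_dynamic_batching(spread, 8, preferred, 100)])
```

Under this simplified model, the same eight requests form one batch of 8 or eight batches of 1 depending only on how their spacing compares with the queue delay, which is why raising `max_queue_delay_microseconds` is a cheap first experiment.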