coderchem

9 issues by coderchem

Hello, I ran the 7B LLaMA model with TP=2 (multi-GPU) and found the results lose about 5% accuracy compared to the reference run. As far as I know, TP=2 should...
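One way to quantify a gap like this is to compare logits from the two runs directly. A minimal sketch, assuming you can dump logits from a TP=1 reference run and the TP=2 run over the same prompts (the arrays below are synthetic stand-ins, not real model output):

```python
import numpy as np

# Hypothetical logit dumps: `ref` from a TP=1 run, `tp2` from a TP=2 run
# over the same 4 prompts with a 32000-token vocabulary.
ref = np.random.default_rng(0).normal(size=(4, 32000)).astype(np.float32)
tp2 = ref + np.random.default_rng(1).normal(scale=1e-3, size=ref.shape).astype(np.float32)

# Worst-case relative error of the logits: small values (~1e-3) suggest
# ordinary fp16 reduction-order noise, large values suggest a real bug.
rel_err = np.abs(tp2 - ref).max() / np.abs(ref).max()

# Greedy decoding only changes if the argmax token flips.
agree = (ref.argmax(-1) == tp2.argmax(-1)).mean()
print(rel_err, agree)
```

If `rel_err` stays near fp16 epsilon but downstream accuracy still drops several points, the next place to look is sampling settings rather than the tensor-parallel math.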

Hi, when I deployed the 7B LLaMA model, I found that CUDA memory keeps growing without limit on the A40. I wonder if FasterTransformer has any means to limit...
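A sketch of capping GPU memory from the Python side, assuming the unbounded growth comes from PyTorch's caching allocator rather than FasterTransformer's own buffers (FasterTransformer itself does not expose a memory-cap option that I'm aware of):

```python
# Cap this process's CUDA allocations; allocations past the cap raise OOM
# instead of growing without bound. Guarded so it also runs without a GPU.
try:
    import torch
    if torch.cuda.is_available():
        # At most 80% of the device's memory (e.g. ~38 GB of an A40's 48 GB).
        torch.cuda.set_per_process_memory_fraction(0.8, device=0)
        # Return cached-but-unused blocks to the driver; useful to call
        # periodically when diagnosing apparent "leaks" that are really caching.
        torch.cuda.empty_cache()
        status = "cap set"
    else:
        status = "no cuda device"
except ImportError:
    status = "torch not installed"
print(status)
```

If the growth persists even with the cap, the allocations are likely happening in native code (e.g. per-request workspace buffers), which a Python-side limit cannot control.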

I pruned 25% of all the layers, but the pruned shape is not what I wanted: I hoped for [N,N] but got [N,M], where M=N*0.25. It's difficult to load.
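The shape mismatch comes from *structured* pruning, which physically removes rows or columns. A sketch of the alternative, mask-based (unstructured) pruning, which zeros weights in place so the tensor stays [N,N] and standard loaders still accept it (the layer below is a hypothetical stand-in):

```python
import numpy as np

# Hypothetical [N, N] weight matrix from one layer.
N = 8
rng = np.random.default_rng(0)
w = rng.normal(size=(N, N))

# Zero the 25% smallest-magnitude weights instead of removing them.
k = int(w.size * 0.25)
thresh = np.sort(np.abs(w), axis=None)[k - 1]
mask = np.abs(w) > thresh
w_pruned = w * mask

print(w_pruned.shape)          # shape is unchanged: (8, 8)
print((w_pruned == 0).mean())  # fraction of zeroed entries: 0.25
```

The trade-off is that masked weights save no memory or compute without sparse kernels; structured pruning does shrink the tensors, but then the checkpoint's shapes no longer match the original architecture config, which is exactly the loading problem described above.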

After pruning some of the layers, the model cannot be loaded directly through TGI, which makes deployment difficult. Any good ideas?

Hello, I am using the sample test set and trying to run through the README, but training stalls and then times out: [batch=23/3200]: Train time/batch: 22 Train time/sample: 198 Train time/batch_in_epoch: 6 Train time/sample_in_epoch: 54 Train time/token: 811008 Train time/token_in_epoch: 221184 Train metrics/train/cc_weight: 0.6700 Train metrics/train/github_weight: 0.0450 Train metrics/train/book_weight: 0.0450...

While building the Docker image, the command `curl -o /tmp/cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/$arch/cuda-keyring_1.0-1_all.deb` fails: opening the URL returns a 404. Branch: master. Dockerfile: dockerfile/Dockerfile.triton.trt_llm_backend
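One possible workaround, assuming the 404 is because NVIDIA re-versioned the keyring package (the `1.0-1` filename was removed from the repo and superseded by `1.1-1`):

```shell
# Build the updated keyring URL; substitute sbsa or arm64 for non-x86 hosts.
arch=x86_64
url="https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/${arch}/cuda-keyring_1.1-1_all.deb"
echo "$url"
# curl -fo /tmp/cuda-keyring.deb "$url"   # -f makes curl fail on a 404 instead of saving the error page as the .deb
```

Using `curl -f` in the Dockerfile is worth keeping regardless, so a future 404 fails the build immediately instead of producing a corrupt package.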

triaged

Hello, I want to run seamlessM4T on GPU with ggml. My driver version is 470.103.01 and the GPU is an A100. My steps are as follows: docker run -it nvcr.io/nvidia/pytorch:23.10-py3 /bin/bash # enter the container; git clone https://github.com/facebookresearch/seamless_communication.git # clone the project; apt-get update && apt-get install libsndfile1-dev # avoid the error below; mkdir -p build; cd build; cmake -DGGML_CUBLAS=ON -DBUILD_SHARED_LIBS=On...

### System Info
tgi 2.0.2
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported command
- [ ] My own modifications
...