coderchem

9 issues by coderchem

Hello, I ran the 7B LLaMA model with TP=2 (multi-GPU) and found the results lose about 5% accuracy compared to the reference run. As far as I know, TP=2 should...
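One way to quantify a gap like this is to compare logits from the two runs directly. A minimal sketch, assuming you can dump logits from a TP=1 reference run and the TP=2 run over the same prompts (the arrays below are synthetic stand-ins, not real model output):

```python
import numpy as np

# Hypothetical logit dumps: `ref` from a TP=1 run, `tp2` from a TP=2 run
# over the same 4 prompts with a 32000-token vocabulary.
ref = np.random.default_rng(0).normal(size=(4, 32000)).astype(np.float32)
tp2 = ref + np.random.default_rng(1).normal(scale=1e-3, size=ref.shape).astype(np.float32)

# Worst-case relative error of the logits: small values (~1e-3) suggest
# ordinary fp16 reduction-order noise, large values suggest a real bug.
rel_err = np.abs(tp2 - ref).max() / np.abs(ref).max()

# Greedy decoding only changes if the argmax token flips.
agree = (ref.argmax(-1) == tp2.argmax(-1)).mean()
print(rel_err, agree)
```

If `rel_err` stays near fp16 epsilon but downstream accuracy still drops several points, the next place to look is sampling settings rather than the tensor-parallel math.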

Hi, when I deployed the 7B LLaMA model, I found that CUDA memory keeps growing without limit on the A40. I wonder if FasterTransformer has any means to limit...
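A sketch of capping GPU memory from the Python side, assuming the unbounded growth comes from PyTorch's caching allocator rather than FasterTransformer's own buffers (FasterTransformer itself does not expose a memory-cap option that I'm aware of):

```python
# Cap this process's CUDA allocations; allocations past the cap raise OOM
# instead of growing without bound. Guarded so it also runs without a GPU.
try:
    import torch
    if torch.cuda.is_available():
        # At most 80% of the device's memory (e.g. ~38 GB of an A40's 48 GB).
        torch.cuda.set_per_process_memory_fraction(0.8, device=0)
        # Return cached-but-unused blocks to the driver; useful to call
        # periodically when diagnosing apparent "leaks" that are really caching.
        torch.cuda.empty_cache()
        status = "cap set"
    else:
        status = "no cuda device"
except ImportError:
    status = "torch not installed"
print(status)
```

If the growth persists even with the cap, the allocations are likely happening in native code (e.g. per-request workspace buffers), which a Python-side limit cannot control.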

I pruned 25% of all the layers, but the pruned shape is not what I wanted: I hoped for [N,N] but got [N,M], where M=N*0.25. It's difficult to load.
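The shape mismatch comes from *structured* pruning, which physically removes rows or columns. A sketch of the alternative, mask-based (unstructured) pruning, which zeros weights in place so the tensor stays [N,N] and standard loaders still accept it (the layer below is a hypothetical stand-in):

```python
import numpy as np

# Hypothetical [N, N] weight matrix from one layer.
N = 8
rng = np.random.default_rng(0)
w = rng.normal(size=(N, N))

# Zero the 25% smallest-magnitude weights instead of removing them.
k = int(w.size * 0.25)
thresh = np.sort(np.abs(w), axis=None)[k - 1]
mask = np.abs(w) > thresh
w_pruned = w * mask

print(w_pruned.shape)          # shape is unchanged: (8, 8)
print((w_pruned == 0).mean())  # fraction of zeroed entries: 0.25
```

The trade-off is that masked weights save no memory or compute without sparse kernels; structured pruning does shrink the tensors, but then the checkpoint's shapes no longer match the original architecture config, which is exactly the loading problem described above.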

After pruning some of the layers, the model cannot be loaded directly through TGI, which makes deployment difficult. Any good ideas?

Hello, I am using the sample test set and trying to run through the README, but training stalls and then times out: [batch=23/3200]: Train time/batch: 22 Train time/sample: 198 Train time/batch_in_epoch: 6 Train time/sample_in_epoch: 54 Train time/token: 811008 Train time/token_in_epoch: 221184 Train metrics/train/cc_weight: 0.6700 Train metrics/train/github_weight: 0.0450 Train metrics/train/book_weight: 0.0450...

While building the Docker image, the command `curl -o /tmp/cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/$arch/cuda-keyring_1.0-1_all.deb` fails: opening the URL returns a 404. Branch: master. Dockerfile: dockerfile/Dockerfile.triton.trt_llm_backend
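One possible workaround, assuming the 404 is because NVIDIA re-versioned the keyring package (the `1.0-1` filename was removed from the repo and superseded by `1.1-1`):

```shell
# Build the updated keyring URL; substitute sbsa or arm64 for non-x86 hosts.
arch=x86_64
url="https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/${arch}/cuda-keyring_1.1-1_all.deb"
echo "$url"
# curl -fo /tmp/cuda-keyring.deb "$url"   # -f makes curl fail on a 404 instead of saving the error page as the .deb
```

Using `curl -f` in the Dockerfile is worth keeping regardless, so a future 404 fails the build immediately instead of producing a corrupt package.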

triaged

Hello, I want to run seamlessM4T on GPU with ggml. My driver version is 470.103.01 and the GPU is an A100. My steps are as follows: docker run -it nvcr.io/nvidia/pytorch:23.10-py3 /bin/bash # enter the container; git clone https://github.com/facebookresearch/seamless_communication.git # clone the project; apt-get update && apt-get install libsndfile1-dev # avoid the error below; mkdir -p build; cd build; cmake -DGGML_CUBLAS=ON -DBUILD_SHARED_LIBS=On...

### System Info
tgi 2.0.2
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported command
- [ ] My own modifications
...