Jay S
I'm getting the same problem, and it doesn't work even with TF 0.12.1, as the code apparently uses the TF 1.0 API
@Narsil I'm using the official LLaMa checkpoints that I converted using the official script. The docker image I'm using is `https://ghcr.io/huggingface/text-generation-inference:latest` and here's the environment I used for converting the...
Interesting development: when I use a custom kernel (one I built myself instead of using the `ghcr` images), I get the same sanity-check error (which is...
Any updates on this issue? I'm encountering the same problem with 8 x A100 (80GB) for LoRA 70B
So can we run flash-attention models like Llama 2 on a V100?
How did you change the dtype to float32?
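For context on why forcing float32 can matter here, below is a minimal, self-contained sketch (pure Python, `struct` only; the value `0.1` is illustrative) showing the precision that is lost when a number round-trips through IEEE 754 half precision, the dtype many of these checkpoints ship in:

```python
import struct

def to_fp16_and_back(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision ('e' format)."""
    return struct.unpack("e", struct.pack("e", x))[0]

value = 0.1
fp16_value = to_fp16_and_back(value)
print(f"original:        {value!r}")
print(f"fp16 round-trip: {fp16_value!r}")
print(f"absolute error:  {abs(value - fp16_value):.2e}")
```

This is only an illustration of half-precision rounding, not the actual mechanism of the sanity check; how you actually select float32 depends on the serving stack you use.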
I know this works well when run with the official Docker image. However, I'm in an environment where running an unapproved Docker image is difficult (almost impossible).
I'm using a MiG system and my CUDA version is 11.7, so that may be related. However, I found someone with a similar issue at #306
I tried on a slightly different system (not MiG) and encountered the same issue.

```
2023-06-08T06:08:52.395479Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 19c41824cb11ba1a3b60a2a65274d8c074383de3
Docker label: N/A
...
```
Here are the results of running some test code. Is it possible that I didn't build the project properly?

> `make python-server-tests`

```
HF_HUB_ENABLE_HF_TRANSFER=1 pytest -s -vv -m "not private" server/tests
...
```