Jay S

Results: 14 comments of Jay S

I'm getting the same problem, and it doesn't even work with TF 0.12.1, since the code apparently uses the TF 1.0 API.

@Narsil I'm using the official LLaMA checkpoints, which I converted using the official script. The Docker image I'm using is `https://ghcr.io/huggingface/text-generation-inference:latest`, and here's the environment I used for converting the...

Interesting news. When I use a custom kernel (one I built myself instead of using the `ghcr` ones), I get the same sanity-check error (which is...

Any updates on this issue? I'm encountering the same problem with 8 x A100 (80 GB) for LoRA 70B.

So we can run flash-attention models like Llama 2 on a V100?

How did you change the dtype to float32?
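(For reference, here's a minimal sketch of one way to force float32 with plain PyTorch. The `nn.Linear` stand-in and the cast-after-load approach are my assumptions, not necessarily what was actually done here.)

```python
import torch
from torch import nn

# Stand-in for a model that was loaded in half precision (float16),
# e.g. a checkpoint loaded with torch_dtype=torch.float16.
model = nn.Linear(4, 4).half()

# Cast every parameter and buffer up to float32.
model = model.float()

print(model.weight.dtype)  # torch.float32
```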

I know this works well when run with the official Docker image. However, I'm in an environment where running an unauthorized Docker image is difficult (almost impossible).

I'm using a MIG system and CUDA 11.7, so that may be related. However, I found someone with a similar issue at #306.

I tried with a slightly different system (not MIG) and encountered the same issue.

```
2023-06-08T06:08:52.395479Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 19c41824cb11ba1a3b60a2a65274d8c074383de3
Docker label: N/A
...
```

Here are the results of running some test code. Is it possible that I didn't build the project properly?

> `make python-server-tests`

```
HF_HUB_ENABLE_HF_TRANSFER=1 pytest -s -vv -m "not private" server/tests
...
```