shao-shuai

Results: 20 comments by shao-shuai

> So this means no layers were put on gpu, but at least it recognized the gpu now.
>
> ```shell
> llama_model_load_internal: offloading 0 repeating layers to GPU
> ...
> ```

> Yeah, it's not going to affect localGPT. But at least we know the underlying library works! You can try opening the text file and adding more layers, as long...
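The "adding more layers" suggestion above refers to raising the number of transformer layers llama-cpp-python offloads to the GPU. A minimal sketch of what such model-loading kwargs typically look like; the file path and values here are hypothetical placeholders, not ones confirmed in the thread:

```python
# Hypothetical model-loading kwargs for llama-cpp-python's Llama class.
# The path and layer count are placeholders; tune n_gpu_layers to your VRAM.
model_kwargs = {
    "model_path": "models/your-model.ggml.q4_0.bin",  # placeholder path
    "n_gpu_layers": 32,  # 0 = no offload; higher offloads more layers to GPU
    "n_ctx": 2048,       # context window size
}
```

With `n_gpu_layers` above 0, the load log should report offloading that many repeating layers to the GPU instead of 0.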

> If you haven't, can you try running this again? I believe the webui script runs on a separate conda env.
>
> ```shell
> CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install -U...
> ```

> Perhaps llama-cpp requires cuda 11 or 12, but I couldn't find that in their documentation and I wonder if it can be replicated.
>
> But, the out of...

Thanks, sorry to have a new error :sweat:

```python
if model_basename is not None:
    if ".ggml" in model_basename:
        logging.info("Using Llamacpp for GGML quantized models")
...
```

> Hi @shao-shuai, were you able to resolve?

Sorry, caught the flu; will let you know.

> pytorch nightly 12.1

I installed pytorch nightly 12.1:

```shell
pip list | grep torch
pytorch-triton  2.1.0+e6216047b8
torch           2.1.0.dev20230830+cu121
torchaudio      2.1.0.dev20230830+cu121
torchvision     0.16.0.dev20230830+cu121
```

Still got the mismatch error:

```shell
...
```

> Hi, hope you are feeling better! Can you try to install cuda 12.1?

Sure, thanks, much better now. I tried cuda 12.1, still the same error:

```shell
nvcc -V
nvcc:...
```
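Since the error points at a CUDA version mismatch, one quick sanity check is to compare the CUDA build tag baked into the torch wheel (the `+cu121` suffix) against the toolkit version `nvcc -V` reports. This is a generic sketch; the helper name and sample strings are illustrative, not taken from the thread:

```python
import re

def cuda_tag(version_string):
    """Extract a CUDA version like '12.1' from either a torch wheel
    version ('2.1.0.dev20230830+cu121') or `nvcc -V` output
    ('... release 12.1, V12.1.105')."""
    m = re.search(r"\+cu(\d+)(\d)", version_string)  # torch wheel suffix
    if m:
        return f"{m.group(1)}.{m.group(2)}"
    m = re.search(r"release (\d+\.\d+)", version_string)  # nvcc -V output
    if m:
        return m.group(1)
    return None

# In a live environment you would feed in torch.__version__ and the output
# of `nvcc -V`; these literals just demonstrate the comparison.
torch_build = cuda_tag("2.1.0.dev20230830+cu121")
nvcc_build = cuda_tag("Cuda compilation tools, release 12.1, V12.1.105")
print(torch_build, nvcc_build, torch_build == nvcc_build)
```

If the two tags disagree (e.g. a `+cu118` wheel against a 12.1 toolkit), reinstalling torch from the matching wheel index is the usual fix.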

> Did you reinstall llama-cpp?

Should I run this?

```shell
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
```

> yes!

Updated llama-cpp, but can't load the model this time :disappointed_relieved:

```shell
python run_localGPT.py
2023-09-01 08:45:14,148 - INFO - run_localGPT.py:180 - Running on: cuda
2023-09-01 08:45:14,148 - INFO - run_localGPT.py:181 -...
```