Badis issues

Results 14 issues of


                                            Badis

Different outputs for differents numbers of threads (same seed)

Hello, I simply wanted to bring up the point that the output can vary based on the number of threads selected, even if the seed stays constant. I have an...

enhancement

The initial token is always empty.

Hello, I noticed something when trying the chat with Bob is that I always get the first token as empty. 1 -> '' 4103 -> ' Trans' 924 -> 'cript'...

need more info

generation quality

Converting alpaca-native-GPTQ models into ggml models

# Expected Behavior Hello, I wanted to convert the alpaca-native 7b GPTQ file (pt file) into a ggml file with the convert-gptq-to-ggml.py script https://github.com/ggerganov/llama.cpp/blob/master/convert-gptq-to-ggml.py # Current Behavior The problem is...

enhancement

model

Comparaison Windows Build VS Unix Build (through WSL2)

# Environment and Context Hello, Before jumping to the subject, here's the environnement I'm working with: - Windows 10 - Llama-13b-4bit-(GPTQ quantized) model - Intel® Core™ i7-10700K [AVX | AVX2...

Build your windows binaries with Clang and not MSVC.

Hello, Your [windows binaries releases](https://github.com/ggerganov/llama.cpp/releases) have probably been built with MSVC and I think there's a better way to do it. # Expected Behavior I have a Intel® Core™ i7-10700K...

How to activate BLAS?

Hello, I've heard that I could get BLAS activated through my intel i7 10700k by installing this [library](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html). Unfortunatly, nothing happened, after compiling again with Clung I still have no...

need more info

Can't choose groupsize 128 on the Model Tab

### Describe the bug Hello, I can't choose 128 in the groupsize parameter even though the model I try to load was quantized with groupsize 128 ![NEDtwuv](https://user-images.githubusercontent.com/110173477/232348551-1a99fc80-009f-41de-9aff-a4b76c23445b.png) That means I...

bug

The 4-bit monkey patch is causing a significant slowdown in the loading of the models.

### Describe the bug Hello, When I'm loading a model in the regular way it takes usually 50-60 seconds ![image](https://user-images.githubusercontent.com/110173477/232711867-75d3440b-0b51-4cc9-acdf-25e7acda922b.png) But when I load a model with the monkey patch...

bug

Can't run the triton models with the monkey patch

### Describe the bug Hello, I can't run my Triton models with the monkey patch for Loras ### Is there an existing issue for this? - [X] I have searched...

bug

The <eos> token randomly pops out during the inference, making the text generation stops early.

Hello, Several people have reported having this issue for the v1.1 model, which wasn't a bug for the v1.0 one. Do you plan on fixing that in the future?