blog
Public repo for HF blog posts
Fixed two broken links that were leading to 404 error pages.
history: [['You are a world-renowned expert on quantum mechanics and the Bell inequality. Do you understand?', '']]

```
Exception in thread Thread-10 (generate_and_signal_complete):
Traceback (most recent call last):
  File "/home/developer/mambaforge/envs/Guanaco/lib/python3.10/threading.py", ...
```
Just curious: when will QLoRA support quantization of new models?
Following the instructions in the blog post on assisted generation, I run into some issues. (FYI, both the longform_model and the assistant_model are fine-tuned versions of OPT, which is the exact...
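For reference, a minimal sketch of how assisted generation is typically invoked in transformers (>= 4.29), assuming both models share the same tokenizer; the OPT checkpoint names below are placeholders, not the fine-tuned models from the question:

```
# Sketch of assisted generation; checkpoint names are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
longform_model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
assistant_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("The Bell inequality states that", return_tensors="pt")

# The small assistant drafts candidate tokens; the main model verifies them
# in a single forward pass, so (with greedy decoding) the output matches what
# the main model would have produced on its own.
outputs = longform_model.generate(
    **inputs,
    assistant_model=assistant_model,
    max_new_tokens=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Assisted generation requires the assistant to use the same tokenizer as the main model, so mismatched fine-tunes are a common source of errors here.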
How can I reduce the generation time for more than 100 tokens? The model takes 1 minute to generate 100 tokens in 4-bit quantization.
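A hedged sketch of one way to measure this, assuming a CUDA GPU, bitsandbytes, and a recent transformers; the checkpoint name is a placeholder and throughput depends heavily on hardware:

```
# Sketch: time 100 new tokens under 4-bit quantization; model id is a placeholder.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-1.3b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",  # place layers on the GPU rather than the CPU
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)

start = time.time()
model.generate(**inputs, max_new_tokens=100, use_cache=True)
print(f"{100 / (time.time() - start):.1f} tokens/s")
```

If generation is this slow, a usual first check is whether any layers were offloaded to CPU for lack of GPU memory, since quantized inference on offloaded layers is drastically slower.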
```
accelerate 0.19.0
gym 0.21.0
huggingface-hub 0.14.1
numpy 1.24.3
packaging 23.1
pandas 2.0.1
transformers 4.29.2
```

Platform: I have tried on both Linux and Windows. Python version: 3.8.10. I am trying to execute...
Hi, in the `bitsandbytes` [integration blog](https://github.com/huggingface/blog/blob/main/hf-bitsandbytes-integration.md), it says one could retrieve the FP16 weights via

```
(int8_model[0].weight.CB * int8_model[0].weight.SCB) / 127
```

However, this is incorrect. In the case of...
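For context, that formula corresponds to the per-row absmax scheme the blog post describes, where SCB holds each row's absolute maximum. Below is a standalone sketch of that round trip; the names `cb`/`scb` mirror `weight.CB`/`weight.SCB`, but this illustrates only the math, not bitsandbytes' internal weight layout, which is what the issue is about:

```
# Standalone sketch of per-row absmax int8 (de)quantization; cb/scb mirror
# weight.CB / weight.SCB but do not reproduce bitsandbytes internals.
import torch

weight = torch.randn(4, 8)

scb = weight.abs().max(dim=1).values                             # per-row absmax, shape [4]
cb = torch.round(weight * 127.0 / scb[:, None]).to(torch.int8)   # int8 codes, shape [4, 8]

dequant = (cb.float() * scb[:, None]) / 127.0                    # inverse scaling
print((weight - dequant).abs().max())                            # small round-off error
```

Note the explicit `[:, None]`: here `cb` is a matrix while `scb` is a per-row vector, so the scale must be broadcast along rows for the elementwise product to be well defined.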
Hello! While trying to run the model on my own dataset, I tried to run the source code on Google Colab as well as my local machine and in both...
Can we use OpenLLaMA weights as a base for StackLLaMA? Are they directly compatible, do they require conversion, or are they incompatible?
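A quick way to probe this is to load the published OpenLLaMA weights with the standard LLaMA classes that StackLLaMA builds on; a sketch, assuming the openlm-research/open_llama_7b checkpoint on the Hub:

```
# Sketch: check whether OpenLLaMA weights load as a standard LLaMA model.
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "openlm-research/open_llama_7b"
model = LlamaForCausalLM.from_pretrained(model_id)
tokenizer = LlamaTokenizer.from_pretrained(model_id)

# Loading without shape mismatches means the architecture is compatible;
# the OpenLLaMA tokenizer was trained separately from Meta's LLaMA one,
# so tokenization behavior is the main thing to verify downstream.
print(model.config)
```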