Daniel Han
Hey MistralAI hackathon participants! If it's helpful, I uploaded 4bit pre-quantized bitsandbytes versions of the new 32K base model from Mistral to https://huggingface.co/unsloth/mistral-7b-v0.2-bnb-4bit - you get 1GB less VRAM usage...
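If you want to try it, loading the pre-quantized checkpoint is a one-liner; a minimal sketch assuming the standard Unsloth FastLanguageModel loader (the max_seq_length here just mirrors the model's 32K context):

```python
from unsloth import FastLanguageModel

# Load the pre-quantized 4bit checkpoint directly - no on-the-fly quantization pass needed.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-v0.2-bnb-4bit",
    max_seq_length = 32768,   # the new base model's 32K context window
    load_in_4bit = True,
)
```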
https://twitter.com/hf_status/status/1782389421196644540 Will update if the issue is resolved
Colab for Llama-3 8b: https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing
```python
FSSPEC_VERSION = version.parse(importlib.metadata.version("fsspec"))

File /opt/conda/lib/python3.10/site-packages/packaging/version.py:264, in Version.__init__(self, version)
    261 def __init__(self, version: str) -> None:
    262
    263     # Validate the version and parse it into pieces
--> 264     match...
```
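The traceback is cut off, but line 264 is where packaging validates the version string it was handed; a minimal illustration of that failure mode, assuming the underlying error is an InvalidVersion raised on a non-PEP 440 fsspec version string:

```python
import importlib.metadata
from packaging import version

# packaging's Version.__init__ regex-validates the string and raises
# InvalidVersion when it is not PEP 440 compliant.
try:
    FSSPEC_VERSION = version.parse(importlib.metadata.version("fsspec"))
except version.InvalidVersion as err:
    print(f"fsspec reports a non-PEP 440 version: {err}")
```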
```python
/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py in to_dict(self)
    910 if hasattr(self, "quantization_config"):
    911     output["quantization_config"] = (
--> 912         self.quantization_config.to_dict()
    913         if not isinstance(self.quantization_config, dict)
    914         else self.quantization_config

AttributeError: 'NoneType' object has no attribute 'to_dict'...
```
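The crash is the `hasattr` check passing while `quantization_config` is actually `None`; a minimal sketch of the kind of guard (in the context of the `to_dict` body shown above) that avoids it, not necessarily the fix that shipped:

```python
# Guard on the value, not just the attribute, before serializing.
quantization_config = getattr(self, "quantization_config", None)
if quantization_config is not None:
    output["quantization_config"] = (
        quantization_config.to_dict()
        if not isinstance(quantization_config, dict)
        else quantization_config
    )
```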
"found a major bug: if you save the model both locally and to the hub after training it, then the second model that is saved will have the LoRA applied...
Reported to Hugging Face - they're working on a fix. EDIT: they fixed it!!
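For reference, the save pattern that triggered it was roughly the following; a hypothetical repro sketch using standard PEFT calls, with illustrative model and repo names:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # illustrative base model
model = get_peft_model(base, LoraConfig(r=16, lora_alpha=16, target_modules=["q_proj", "v_proj"]))

# ... fine-tune the LoRA adapter here ...

model.save_pretrained("outputs/lora_model")      # first save: local copy
model.push_to_hub("your-username/lora_model")    # second save: this copy had the LoRA applied
```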
Findings from https://github.com/ggerganov/llama.cpp/issues/7062 and Discord chats:
Notebook for repro: https://colab.research.google.com/drive/1djwQGbEJtUEZo_OuqzN_JF6xSOUKhm4q?usp=sharing
1. Unsloth + float16 + QLoRA = WORKS
2. Unsloth + bfloat16 + QLoRA = WORKS
3. Unsloth + bfloat16...
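For context, the float16 vs bfloat16 distinction in those runs is the compute dtype passed to the loader; a sketch of how that toggle looks, assuming the Unsloth QLoRA setup and an illustrative checkpoint name:

```python
import torch
from unsloth import FastLanguageModel

# QLoRA: 4bit base weights plus LoRA adapters; swap the dtype between runs.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # illustrative checkpoint
    max_seq_length = 2048,
    dtype = torch.float16,   # case 1; use torch.bfloat16 for the bfloat16 cases
    load_in_4bit = True,
)
```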