Daniel Han

Results: 781 comments of Daniel Han

Oh, for inference on CPU only, please use transformers directly - sadly we don't support CPU inference

Yes, use llama.cpp / GGUF for CPU inference
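
For reference, a rough sketch of exporting a fine-tuned Unsloth model to GGUF for CPU use - the `lora_model` path, output folder, and `q4_k_m` quant choice are just illustrative:

```python
from unsloth import FastLanguageModel

# Reload the fine-tuned adapter (path is illustrative)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Export GGUF weights that llama.cpp can then run on CPU
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
```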

Another option is to run inference on the CPU with native transformers:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "lora_model", # YOUR MODEL YOU...
```
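
The snippet above is cut off; a fuller sketch of the same approach, assuming the adapter was saved to `lora_model` (the prompt and `device_map` setting are illustrative):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the LoRA adapter on CPU
model = AutoPeftModelForCausalLM.from_pretrained(
    "lora_model",  # path to your saved LoRA adapter
    device_map = "cpu",
)
tokenizer = AutoTokenizer.from_pretrained("lora_model")

inputs = tokenizer("Continue the story: once upon a time", return_tensors = "pt")
outputs = model.generate(**inputs, max_new_tokens = 64)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))
```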

Oh interesting, I'll check this and get back to you - sorry!

Apologies, I'll escalate this to a higher priority - will try to get a fix for this

Hmmm weird - I'll check this, sorry about the issue

Oh, you cannot use 4-bit models - you must use `model.save_pretrained_merged` to merge to 16-bit, then use vLLM
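
In case it helps, a rough sketch of that flow - the folder names, prompt, and sampling settings are illustrative, and the merge assumes `model` / `tokenizer` come from Unsloth:

```python
from unsloth import FastLanguageModel
from vllm import LLM, SamplingParams

# Reload the fine-tuned 4-bit adapter, then merge it into 16-bit weights
model, tokenizer = FastLanguageModel.from_pretrained("lora_model", load_in_4bit = True)
model.save_pretrained_merged("merged_16bit", tokenizer, save_method = "merged_16bit")

# vLLM loads the merged 16-bit folder, not the 4-bit checkpoint
llm = LLM(model = "merged_16bit")
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens = 32))
print(outputs[0].outputs[0].text)
```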

A good idea to use llama.cpp's Python module (llama-cpp-python) - I'll make an example
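
Until then, a minimal sketch with llama-cpp-python, assuming a GGUF was already exported (the file name and prompt are illustrative):

```python
from llama_cpp import Llama

# Load the exported GGUF on CPU
llm = Llama(model_path = "model/unsloth.Q4_K_M.gguf", n_ctx = 2048)

output = llm("Q: What is the capital of France? A:", max_tokens = 32, stop = ["\n"])
print(output["choices"][0]["text"])
```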

Is this via Colab, Kaggle, or a local machine?