ykallan
> This is not a bug; unsloth runs on only 1 GPU. It won't run on multi-GPU systems.
>
> Force it to use only 1 GPU in the new version...
> [@shimmyshimmer](https://github.com/shimmyshimmer) Aren't "balanced" and "auto" the same thing? Did it work for you, [@ykallan](https://github.com/ykallan)?

@shimmyshimmer Yes, I set `device_map` to both "auto" and "balanced", and it raises the same error:

```text
File...
```
I defined a loss function that moves the tensors to the same device, like:

```python
def my_cross_entropy_loss(model_output, labels, logit_softcapping=0, logit_scaling=0, n_items=None, *args, **kwargs):
    # move to same device
    batch, seq_len, d =...
```
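For reference, here is a minimal self-contained sketch of the device-alignment idea (the function name and defaults are my own, not unsloth's API): move the labels and the divisor onto the logits' device before computing the loss, so multi-GPU dispatch never mixes devices.

```python
import torch
import torch.nn.functional as F

def device_aligned_ce(logits, labels, n_items=None):
    """Cross-entropy that first moves labels (and the divisor) onto
    the logits' device, avoiding cross-device tensor errors."""
    labels = labels.to(logits.device)
    # flatten (batch, seq_len, vocab) -> (batch*seq_len, vocab)
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="sum",
        ignore_index=-100,
    )
    if n_items is None:
        # count only the non-ignored labels
        n_items = (labels != -100).sum()
    n_items = torch.as_tensor(n_items, device=loss.device, dtype=loss.dtype)
    return loss / n_items
```

On a single device this reduces to ordinary mean cross-entropy over non-ignored tokens; the `.to(logits.device)` calls are no-ops there and only matter when layers are sharded across GPUs.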
I used the same script to fine-tune `Qwen/Qwen2.5-3B-Instruct`; it works, and the logs look correct:

```text
{'loss': 97.2936, 'grad_norm': 210.89822387695312, 'learning_rate': 2.666666666666667e-06, 'epoch': 0.02}
{'loss': 90.0748, 'grad_norm': 190.67738342285156,...
```
> I referred to [#2882 (comment)](https://github.com/unslothai/unsloth/issues/2882#issuecomment-3193784551) and edited this file: /data/miniconda3/envs/unsloth_mgpu/lib/python3.12/site-packages/unsloth/kernels/cross_entropy_loss.py
>
> Around line 410, I found:
>
> return loss.sum() / n_items
>
> Modified it to:
>
>...
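The actual modification is cut off above. For context, a common pattern for this kind of fix (my own sketch, not the patch from that comment) is to move the divisor onto the loss tensor's device before the final division:

```python
import torch

def divide_on_loss_device(loss, n_items):
    """Hypothetical sketch: align the divisor's device with the
    loss before dividing, so the op stays on a single device."""
    if torch.is_tensor(n_items):
        n_items = n_items.to(loss.device)
    return loss.sum() / n_items
```

Whether this matches the edit referenced in #2882 is unverified; it only illustrates the device-alignment technique being discussed.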
Here is the current environment:

```text
(base) [root@localhost log]# pip list
Package                 Version
----------------------- ----------
anaconda-anon-usage     0.4.4
archspec                0.2.3
boltons                 23.0.0
Brotli                  1.0.9
cachetools              5.5.0
certifi                 2025.4.26
cffi                    1.16.0
charset-normalizer      3.3.2
conda                   24.11.3
conda-content-trust...
```
> GGUF models have not been specifically adapted for vLLM yet, and the error message points the same way. Per the vLLM docs, GGUF support is currently fairly limited: https://docs.vllm.com.cn/en/latest/features/quantization/gguf.html. You can use `--quantization fp8` for 8-bit online quantization, which has been verified to work.
>
> If you are not tied to a particular inference framework, you can deploy with FastDeploy, which provides wint4 online quantization. Use the model weights with the PT suffix; FastDeploy already supports torch-style weights. Reference: https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/zh/get_started/ernie-4.5.md

Two 2080 Ti cards with 44 GB of VRAM in total; FastDeploy errors out while loading. The launch command is:

```text
export CUDA_VISIBLE_DEVICES=0,1
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-21B-A3B-WINT4-Paddle \
  --port 8180 \
  --metrics-port 8181 \
  --engine-worker-queue-port 8182 \...
```
When I add the parameter `--tensor-parallel-size 2`, it fails with:

```text
[2025-12-04 17:44:39,233] [ WARNING] - PretrainedTokenizer will be deprecated and removed in the next major release. Please migrate to Hugging Face's transformers.PreTrainedTokenizer. Checkout paddleformers/transformers/qwen/tokenizer.py for an...
```
Is there a 4-bit quantized weight release for the 21B model at the moment? With a single 22 GB card, running anything much larger quantized is a struggle. The current environment:

```text
fastdeploy-gpu    2.3.0
paddlepaddle-gpu  3.2.2
transformers      4.55.4
```