ykallan
> This is not a bug; unsloth runs on only 1 GPU. It won't run on multi-GPU systems.
>
> Force it to use only 1 GPU in the new version...
> [@shimmyshimmer](https://github.com/shimmyshimmer) Aren't "balanced" and "auto" the same thing? Did it work for you, [@ykallan](https://github.com/ykallan)?

@shimmyshimmer Yes, I set `device_map` to both "auto" and "balanced", and it raises the same error:

```text
File...
```
I defined a loss function that moves the tensors to the same device, like:

```python
def my_cross_entropy_loss(model_output, labels, logit_softcapping=0, logit_scaling=0, n_items=None, *args, **kwargs):
    # move to same device
    batch, seq_len, d =...
```
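For reference, here is a minimal self-contained sketch of the device-alignment idea (the function name and defaults are my own, not unsloth's API): move the labels and the divisor onto the logits' device before computing the loss, so multi-GPU dispatch never mixes devices.

```python
import torch
import torch.nn.functional as F

def device_aligned_ce(logits, labels, n_items=None):
    """Cross-entropy that first moves labels (and the divisor) onto
    the logits' device, avoiding cross-device tensor errors."""
    labels = labels.to(logits.device)
    # flatten (batch, seq_len, vocab) -> (batch*seq_len, vocab)
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="sum",
        ignore_index=-100,
    )
    if n_items is None:
        # count only the non-ignored labels
        n_items = (labels != -100).sum()
    n_items = torch.as_tensor(n_items, device=loss.device, dtype=loss.dtype)
    return loss / n_items
```

On a single device this reduces to ordinary mean cross-entropy over non-ignored tokens; the `.to(logits.device)` calls are no-ops there and only matter when layers are sharded across GPUs.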
I used the same script to fine-tune `Qwen/Qwen2.5-3B-Instruct`; it works, and the logs look correct:

```text
{'loss': 97.2936, 'grad_norm': 210.89822387695312, 'learning_rate': 2.666666666666667e-06, 'epoch': 0.02}
{'loss': 90.0748, 'grad_norm': 190.67738342285156,...
```
> I referred to [#2882 (comment)](https://github.com/unslothai/unsloth/issues/2882#issuecomment-3193784551) and edited this file: /data/miniconda3/envs/unsloth_mgpu/lib/python3.12/site-packages/unsloth/kernels/cross_entropy_loss.py
>
> Around line 410, I found:
>
> return loss.sum() / n_items
>
> Modified it to:
>
>...
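The actual modification is cut off above. For context, a common pattern for this kind of fix (my own sketch, not the patch from that comment) is to move the divisor onto the loss tensor's device before the final division:

```python
import torch

def divide_on_loss_device(loss, n_items):
    """Hypothetical sketch: align the divisor's device with the
    loss before dividing, so the op stays on a single device."""
    if torch.is_tensor(n_items):
        n_items = n_items.to(loss.device)
    return loss.sum() / n_items
```

Whether this matches the edit referenced in #2882 is unverified; it only illustrates the device-alignment technique being discussed.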
Here is the current environment:

```text
(base) [root@localhost log]# pip list
Package                 Version
----------------------- ----------
anaconda-anon-usage     0.4.4
archspec                0.2.3
boltons                 23.0.0
Brotli                  1.0.9
cachetools              5.5.0
certifi                 2025.4.26
cffi                    1.16.0
charset-normalizer      3.3.2
conda                   24.11.3
conda-content-trust...
```
> GGUF models have not been specifically adapted for vLLM yet, and the error message points the same way. Per the vLLM docs, GGUF support is currently fairly limited: https://docs.vllm.com.cn/en/latest/features/quantization/gguf.html. You can use `--quantization fp8` for 8-bit online quantization, which has been verified to work.
>
> If you are not tied to a particular inference framework, you can deploy with FastDeploy, which provides wint4 online quantization. Use the model weights with the PT suffix; FastDeploy already supports torch-style weights. Reference: https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/zh/get_started/ernie-4.5.md

Two 2080 Ti cards with 44 GB of VRAM in total; FastDeploy errors out while loading. The launch command is:

```text
export CUDA_VISIBLE_DEVICES=0,1
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-21B-A3B-WINT4-Paddle \
  --port 8180 \
  --metrics-port 8181 \
  --engine-worker-queue-port 8182 \...
```
When I add the parameter `--tensor-parallel-size 2`, it fails with:

```text
[2025-12-04 17:44:39,233] [ WARNING] - PretrainedTokenizer will be deprecated and removed in the next major release. Please migrate to Hugging Face's transformers.PreTrainedTokenizer. Checkout paddleformers/transformers/qwen/tokenizer.py for an...
```
Is there a 4-bit quantized weight release for the 21B model at the moment? With a single 22 GB card, running anything much larger quantized is a struggle. The current environment:

```text
fastdeploy-gpu    2.3.0
paddlepaddle-gpu  3.2.2
transformers      4.55.4
```