swift issues

Results 206 swift issues


Training fails on Windows: dataset character-encoding problem

**Describe the bug**
[WARNING:modelscope] Reusing dataset dataset_builder (C:\Users\DELL\.cache\modelscope\hub\datasets\iic\ms_bench\master\data_files)
[INFO:modelscope] Generating dataset dataset_builder (C:\Users\DELL\.cache\modelscope\hub\datasets\iic\ms_bench\master\data_files)
[INFO:modelscope] Reusing cached meta-data file: C:\Users\DELL\.cache\modelscope\hub\datasets\iic\ms_bench\master\data_files\2b408f043079b23300a89e65c7a2d027
Traceback (most recent call last):
File "C:\Python311\Lib\site-packages\swift\cli\sft.py", line 5, in sft_main()...
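The traceback above is cut off, so the actual cause is not visible. If, as the title suggests, the failure comes from Python decoding a UTF-8 dataset file with the Windows locale code page, one common workaround is to pass the encoding explicitly. A minimal sketch, assuming a UTF-8 JSON Lines file; `load_jsonl` and the file name are hypothetical and not part of swift or modelscope:

```python
import json
import os
import tempfile


def load_jsonl(path, encoding="utf-8"):
    """Read a JSON Lines file with an explicit encoding.

    Without the `encoding` argument, open() falls back to the locale's
    preferred encoding, which on many Windows setups is a legacy code page
    (for example cp936 or cp1252) rather than UTF-8.
    """
    records = []
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Round-trip demo with non-ASCII text in a temporary file.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.jsonl")  # hypothetical file name
with open(demo_path, "w", encoding="utf-8") as f:
    f.write(json.dumps({"query": "你好", "response": "世界"}, ensure_ascii=False) + "\n")

rows = load_jsonl(demo_path)
print(len(rows))
```

When the read happens inside library code that cannot be edited, setting the environment variable `PYTHONUTF8=1` (Python 3.7+) switches the interpreter to UTF-8 mode, which may be worth trying before patching anything.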

yjc980121

Deploying multimodal models

Are there plans to provide deployment scripts for multimodal models later on?

wuweinero

enhancement

Error loading the model after merging LoRA fine-tuned weights

I load the model with: model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'},model_id_or_path=ckpt_dir), and get the following error:
```bash
Qwen2ForCausalLM.__init__() got an unexpected keyword argument 'eos_token_id'
```
How can I fix this? The command used to merge the weights: CUDA_VISIBLE_DEVICES=1 swift export --ckpt_dir 'output/qwen1half-7b-chat/---/checkpoint...' --merge_lora true --dtype fp16

qianliyx

Fine-tuning llama3 for a multi-class classification task

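Hello, I would like to fine-tune llama3 into a multi-class text classification model, but I have a few points of confusion and would appreciate some guidance.
1. For long-text classification, is the llama3-base model or the llama3-instruct model the better choice? I only know that the latter has been fine-tuned for dialogue and the former is a pure text-completion model, and I am unsure which to choose.
2. If I use llama3-base, can I use the scripts/llama3-instruct scripts directly (changing only the dataset and model path), or are larger changes needed?
3. My custom dataset is a JSON file in which each dict has the fields Instruction, Input, Output. For my classification problem every Instruction is identical (it states that the text should be classified) and the Input is the sentence. Is that a bit odd? Does fine-tuning this way differ much in effect from using the conversation/user/assistant format?
4. If I use the Instruction, Input, Output format, then to evaluate the fine-tuned model I need it to predict every item of the test set with no history and store the predictions, so the infer script does not seem convenient. How can I write a minimal Python script that restores the LoRA parameters into the model without saving a merged file and then runs prediction? (I know the code for ordinary prediction, but if the model was fine-tuned on Instruction/Input/Output data, how should the prediction input be formatted?)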
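For point 4, one way to keep inference consistent with training is to render each test record through the same template used for fine-tuning and collect the predictions in a list. A minimal sketch, assuming an Alpaca-style layout; `build_prompt`, `predict_all`, and the `generate` callable are hypothetical names, and the template text is an assumption that should be replaced by whatever template the training run actually used:

```python
import json


def build_prompt(record):
    """Render one Instruction/Input record as a single-turn prompt (no history).

    The layout below is an assumed Alpaca-style template, not swift's own.
    """
    return (
        f"### Instruction:\n{record['Instruction']}\n\n"
        f"### Input:\n{record['Input']}\n\n"
        "### Response:\n"
    )


def predict_all(records, generate):
    """Call `generate(prompt) -> str` on every record and keep the results."""
    results = []
    for record in records:
        prediction = generate(build_prompt(record)).strip()
        results.append({
            "input": record["Input"],
            "label": record.get("Output"),
            "prediction": prediction,
        })
    return results


def dummy_generate(prompt):
    """Stand-in for the fine-tuned model; always answers 'sports'."""
    return " sports\n"


test_set = [
    {"Instruction": "Classify the topic of the sentence.",
     "Input": "The team won the final.", "Output": "sports"},
    {"Instruction": "Classify the topic of the sentence.",
     "Input": "Shares fell sharply today.", "Output": "finance"},
]

predictions = predict_all(test_set, dummy_generate)
accuracy = sum(p["label"] == p["prediction"] for p in predictions) / len(predictions)
print(json.dumps(predictions, indent=2))
print(accuracy)
```

In a real run, `dummy_generate` would be replaced by a call into the loaded model. With the peft library, `PeftModel.from_pretrained(base_model, adapter_dir)` followed by `merge_and_unload()` merges LoRA weights in memory without writing a merged checkpoint; whether a checkpoint produced by swift loads this way depends on the tuner backend and is not confirmed here.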

Yuxin715d

question

In the swift framework, how can I extract the model's last-layer output as an embedding?

As in the title, taking Qwen1.5-7B-chat as an example.
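With Hugging Face transformers, calling a model with `output_hidden_states=True` returns a `hidden_states` tuple whose last entry is the final layer's output, one vector per token. Turning that into a single sentence embedding needs a pooling step. The sketch below shows masked mean pooling on plain Python lists so that it runs without a GPU or a model download; `mean_pool` is a hypothetical helper, and whether mean pooling or last-token pooling suits Qwen1.5-7B-chat better is not something this issue settles:

```python
def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    last_hidden_state: [batch][seq_len][hidden] nested lists of floats
    attention_mask:    [batch][seq_len] with 1 for real tokens, 0 for padding
    """
    pooled = []
    for token_vectors, mask in zip(last_hidden_state, attention_mask):
        hidden_size = len(token_vectors[0])
        total = [0.0] * hidden_size
        count = 0
        for vector, keep in zip(token_vectors, mask):
            if keep:
                count += 1
                for i, value in enumerate(vector):
                    total[i] += value
        count = max(count, 1)  # avoid division by zero on all-padding rows
        pooled.append([t / count for t in total])
    return pooled


# Toy batch: 2 sequences, 3 token positions, hidden size 2.
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # third position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
mask = [
    [1, 1, 0],
    [1, 1, 1],
]

embeddings = mean_pool(hidden, mask)
print(embeddings)
```

In a torch-based pipeline the same computation is usually written with tensor operations on `outputs.hidden_states[-1]` and the tokenizer's `attention_mask`.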

sunyrain

How do I run inference with a fully fine-tuned model?

Running the command RAY_memory_monitor_refresh_ms=0 CUDA_VISIBLE_DEVICES=2 swift infer \ --model_type chatglm2-6b \ --model_id_or_path /data/LLM_checkpoint/chatglm2-6b/chatglm2-6b \ --infer_backend vllm --tensor_parallel_size 1 fails. Here model_id_or_path points to a fully fine-tuned model that did not go through LoRA. [INFO:swift] Due to `ckpt_dir` being `None`, `load_args_from_ckpt_dir` is set to `False`. Traceback (most...

HJT9328

question

Training qwen1.5-moe-A2.7B-chat-gptq-int4 on Win10 is slow

**Describe the bug** Running the following command in PowerShell: ``` D:\github\ENV\qwen\Scripts\python.exe d:\github\swift\swift\cli\sft.py ` --model_type qwen1half-moe-a2_7b-chat-int4 ` --model_id_or_path "D:\models\Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4" ` --sft_type lora ` --dtype AUTO ` --output_dir "D:\github\swift\output" ` --train_dataset_sample -1 ` --num_train_epochs 3 ` --max_length...

catundchat

With tuner_backend set to swift, rsLoRA training loss decreases normally, but the trained model outputs empty values at inference

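**Describe the bug** With tuner_backend set to swift, the loss decreases normally during rsLoRA training, but the trained model produces empty output at inference. Switching to peft makes it work normally. The erroneous inference output looks like ``` "v_proj", > "up_proj", > "o_proj", > "gate_proj" ``` This seems to be a bug introduced recently (after April 2024, quite possibly within the last week); before that, training worked without specifying tuner_backend, and according to the command-line arguments the default is swift. **Your hardware and system info** Testing shows the problem is independent of hardware, of the number of machines, and of the dataset. I am using the latest swift code; the model is yi-6B-chat.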

WSC741606

Multimodal model & neftune_noise_alpha is not compatible. Dimension out of range.

CUDA_VISIBLE_DEVICES=0,1,2,3 \ swift sft \ --neftune_noise_alpha "5" \ --model_id_or_path "AI-ModelScope/llava-v1.6-mistral-7b" \ --template_type "llava-mistral-instruct" \ --custom_train_dataset_path xxx.json \ --custom_val_dataset_path xxx.json \ --dataset_test_ratio "0.2" \ --save_steps "50" \ --lora_target_modules q_proj k_proj v_proj...

AlexJJJChen

question

Can API calls to /v1/embeddings be supported?

chuanSir123
