
make-dataset fails: the process gets killed

shidanwangya opened this issue 7 months ago · 4 comments

Environment: Windows 11, WSL2, Ubuntu 22.04

(screenshot attached)

Setup: 2080 Ti, with 16 GB of RAM allocated to WSL2.

1. Added 4-bit quantization: "quantization_bit": 4, "quantization_type": "nf4", "double_quantization": true, "quantization_method": "bitsandbytes"
2. Changed the dtype to "dtype": torch.float16, because the 2080 Ti does not support bfloat16.
3. Also enabled quantization and the VRAM limit: "gpu_memory_utilization": 0.85, "quantization": "bitsandbytes", "load_format": "bitsandbytes"

(These options are collected into a config sketch after the log output below.)

After running the command, the output was:

(.venv) root@D-NO1:/home/ace/WeClone# weclone-cli make-dataset
INFO 05-18 00:55:37 [init.py:239] Automatically detected platform cuda.
[WeClone] I | 00:55:38 | Loading configuration from: ./settings.jsonc
[WeClone] I | 00:55:38 | 聊天记录禁用词: ['例如 姓名', '//.....', '例如 密码']
[WeClone] W | 00:55:38 | 组合后消息长度超过256将截断: (a large chunk of output omitted here)...
Then: [WeClone] I | 00:55:40 | 开始使用llm对数据打分
[INFO|configuration_utils.py:771] 2025-05-18 00:55:41,902 >> Model config Qwen2Config { "_name_or_path": "./Qwen2.5-7B-Instruct", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.49.0", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }

[INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:41,904 >> loading file vocab.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:41,904 >> loading file merges.txt [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:41,904 >> loading file tokenizer.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:41,904 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:41,904 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:41,904 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:41,904 >> loading file chat_template.jinja [INFO|tokenization_utils_base.py:2313] 2025-05-18 00:55:42,157 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|configuration_utils.py:697] 2025-05-18 00:55:42,157 >> loading configuration file ./Qwen2.5-7B-Instruct/config.json [INFO|configuration_utils.py:771] 2025-05-18 00:55:42,158 >> Model config Qwen2Config { "_name_or_path": "./Qwen2.5-7B-Instruct", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.49.0", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }

[INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:42,158 >> loading file vocab.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:42,158 >> loading file merges.txt [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:42,158 >> loading file tokenizer.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:42,158 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:42,159 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:42,159 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:42,159 >> loading file chat_template.jinja [INFO|tokenization_utils_base.py:2313] 2025-05-18 00:55:42,380 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|2025-05-18 00:55:42] llamafactory.data.template:157 >> Add <|im_end|> to stop words. [INFO|configuration_utils.py:697] 2025-05-18 00:55:42,412 >> loading configuration file ./Qwen2.5-7B-Instruct/config.json [INFO|configuration_utils.py:697] 2025-05-18 00:55:42,413 >> loading configuration file ./Qwen2.5-7B-Instruct/config.json [INFO|configuration_utils.py:771] 2025-05-18 00:55:42,413 >> Model config Qwen2Config { "_name_or_path": "./Qwen2.5-7B-Instruct", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 1000000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.49.0", "use_cache": true, "use_sliding_window": false, "vocab_size": 152064 }

[INFO|image_processing_auto.py:301] 2025-05-18 00:55:42,423 >> Could not locate the image processor configuration file, will try to use the model config instead. WARNING 05-18 00:55:42 [config.py:2614] Casting torch.bfloat16 to torch.float16. INFO 05-18 00:55:48 [config.py:585] This model supports multiple tasks: {'score', 'classify', 'generate', 'reward', 'embed'}. Defaulting to 'generate'. WARNING 05-18 00:55:48 [config.py:664] bitsandbytes quantization is not fully optimized yet. The speed can be slower than non-quantized models. WARNING 05-18 00:55:48 [arg_utils.py:1854] Compute Capability < 8.0 is not supported by the V1 Engine. Falling back to V0. INFO 05-18 00:55:48 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2) with config: model='./Qwen2.5-7B-Instruct', speculative_config=None, tokenizer='./Qwen2.5-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=3072, download_dir=None, load_format=bitsandbytes, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=bitsandbytes, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=./Qwen2.5-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False, [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:48,960 >> loading file vocab.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:48,960 >> loading file merges.txt [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:48,960 >> loading file tokenizer.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:48,960 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:48,960 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:48,960 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2048] 2025-05-18 00:55:48,960 >> loading file chat_template.jinja [INFO|tokenization_utils_base.py:2313] 2025-05-18 00:55:49,182 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|configuration_utils.py:1093] 2025-05-18 00:55:49,261 >> loading configuration file ./Qwen2.5-7B-Instruct/generation_config.json [INFO|configuration_utils.py:1140] 2025-05-18 00:55:49,261 >> Generate config GenerationConfig { "bos_token_id": 151643, "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "repetition_penalty": 1.05, "temperature": 0.7, "top_k": 20, "top_p": 0.8 }

WARNING 05-18 00:55:49 [interface.py:303] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
INFO 05-18 00:55:49 [cuda.py:239] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 05-18 00:55:49 [cuda.py:288] Using XFormers backend.
INFO 05-18 00:55:49 [parallel_state.py:954] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 05-18 00:55:49 [model_runner.py:1110] Starting to load model ./Qwen2.5-7B-Instruct...
INFO 05-18 00:55:50 [loader.py:1155] Loading weights with BitsAndBytes quantization. May take a while ...
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:04<00:14, 4.75s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:07<00:07, 3.65s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:10<00:03, 3.40s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:13<00:00, 3.07s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:13<00:00, 3.32s/it]

INFO 05-18 00:56:03 [model_runner.py:1146] Model loading took 5.2045 GB and 13.805371 seconds
INFO 05-18 00:56:09 [worker.py:267] Memory profiling takes 5.34 seconds
INFO 05-18 00:56:09 [worker.py:267] the current vLLM instance can use total_gpu_memory (11.00GiB) x gpu_memory_utilization (0.85) = 9.35GiB
INFO 05-18 00:56:09 [worker.py:267] model weights take 5.20GiB; non_torch_memory takes 0.05GiB; PyTorch activation peak memory takes 1.41GiB; the rest of the memory reserved for KV Cache is 2.69GiB.
INFO 05-18 00:56:09 [executor_base.py:111] # cuda blocks: 3146, # CPU blocks: 4681
INFO 05-18 00:56:09 [executor_base.py:116] Maximum concurrency for 3072 tokens per request: 16.39x
INFO 05-18 00:56:09 [model_runner.py:1442] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|███████████████████████████████████████████████████████████████████| 35/35 [00:35<00:00, 1.02s/it]
INFO 05-18 00:56:45 [model_runner.py:1570] Graph capturing finished in 36 secs, took 0.64 GiB
INFO 05-18 00:56:45 [llm_engine.py:447] init engine (profile, create kv cache, warmup model) took 41.86 seconds
INFO 05-18 00:56:45 [xgrammar_decoding.py:191] Qwen model detected, consider set guided_backend=xgrammar:disable-any-whitespace to prevent runaway generation of whitespaces.
Killed
(.venv) root@D-NO1:/home/ace/WeClone#

And that's it. After 00:56 the GPU VRAM usage sat at around 10 GB, with no 3D utilization. The D drive (where the WSL virtual disk lives) showed heavy reads in the last minute before the kill, and then the process was killed. Any help would be appreciated!
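For anyone trying to reproduce the setup above, the options the poster lists would look roughly like the fragment below. This is only a sketch of the individual key-value pairs mentioned in the post; which section of settings.jsonc they belong in depends on the WeClone version, so treat the surrounding structure as illustrative rather than authoritative.

```jsonc
// Sketch only: these key-value pairs are taken from the post above.
// Where exactly they live inside settings.jsonc varies with the WeClone version.
{
  // 4-bit quantization via bitsandbytes (NF4, double quantization)
  "quantization_bit": 4,
  "quantization_type": "nf4",
  "double_quantization": true,
  "quantization_method": "bitsandbytes",

  // the 2080 Ti (Turing) has no bfloat16 support, so fall back to float16
  "dtype": "float16",

  // vLLM settings used while the dataset is scored
  "gpu_memory_utilization": 0.85,
  "quantization": "bitsandbytes",
  "load_format": "bitsandbytes"
}
```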

shidanwangya · May 17 '25 17:05

Are there no logs from the data-cleaning step? It just got killed directly, without any error being reported?

xming521 · May 18 '25 02:05

Are there no logs from the data-cleaning step? It just got killed directly, without any error being reported?

That's the entire log. It just flashed past and was gone; I'm completely baffled.

shidanwangya · May 18 '25 02:05

I'm seeing a similar error. From what I can tell, the process ran out of memory, though I'm not sure whether it's because there is too much data. I've reduced the amount of data and am running weclone-cli make-dataset again, hoping it succeeds this time.

Floral · May 18 '25 03:05

A process getting killed usually means it ran out of RAM. It depends on how much data you have, but 16 GB is generally not enough. I hit the same thing with 20 GB; even 32 GB wasn't enough, and it only worked after I also added 20 GB of swap.
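If you want to confirm it is the same problem and apply the same fix on WSL2, a rough outline is below. The dmesg check looks for the Linux OOM killer, and the .wslconfig keys raise the RAM and swap that Windows lets WSL2 use; the sizes shown are only examples, not values taken from this thread.

```bash
# Inside WSL2: check whether the kernel OOM killer terminated the process
dmesg | grep -i -E "out of memory|killed process"

# On the Windows side, %UserProfile%\.wslconfig controls WSL2 memory and swap.
# Example contents (sizes are illustrative, adjust to your machine):
#
#   [wsl2]
#   memory=32GB
#   swap=20GB
#
# Then restart WSL from PowerShell or CMD so the new limits take effect:
#   wsl --shutdown
```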

JackWang27 · May 18 '25 09:05