Error when using a T4 GPU: half-precision support required
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the dtype flag in CLI, for example: --dtype=half.
Appending --dtype=half to the CLI command directly does not get the argument passed through.
Which library is raising this error?
[WeClone] I | 19:45:41 | Starting to score the data with the LLM
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:46,398 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:46,398 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:46,398 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:46,398 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:46,398 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:46,398 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:46,398 >> loading file chat_template.jinja
[INFO|tokenization_utils_base.py:2313] 2025-05-16 19:45:46,922 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:697] 2025-05-16 19:45:46,923 >> loading configuration file ./Qwen2.5-7B-Instruct/config.json
[INFO|configuration_utils.py:771] 2025-05-16 19:45:46,924 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen2.5-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO|image_processing_auto.py:301] 2025-05-16 19:45:47,596 >> Could not locate the image processor configuration file, will try to use the model config instead.
INFO 05-16 19:45:59 [config.py:585] This model supports multiple tasks: {'classify', 'reward', 'generate', 'embed', 'score'}. Defaulting to 'generate'.
WARNING 05-16 19:45:59 [arg_utils.py:1854] Compute Capability < 8.0 is not supported by the V1 Engine. Falling back to V0.
INFO 05-16 19:45:59 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2) with config: model='./Qwen2.5-7B-Instruct', speculative_config=None, tokenizer='./Qwen2.5-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=3072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=./Qwen2.5-7B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:59,661 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:59,661 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:59,661 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:59,661 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:59,661 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:59,661 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2048] 2025-05-16 19:45:59,661 >> loading file chat_template.jinja
[INFO|tokenization_utils_base.py:2313] 2025-05-16 19:46:00,034 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:1093] 2025-05-16 19:46:00,147 >> loading configuration file ./Qwen2.5-7B-Instruct/generation_config.json
[INFO|configuration_utils.py:1140] 2025-05-16 19:46:00,147 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
INFO 05-16 19:46:01 [cuda.py:239] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 05-16 19:46:01 [cuda.py:288] Using XFormers backend.
Traceback (most recent call last):
  File "/usr/local/bin/weclone-cli", line 10, in <module>
    ...
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
The same error occurs on a free-tier T4 GPU on Tencent Cloud.
Try setting enable_clean to false in the config so the dataset is not cleaned, as sketched below.
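A minimal sketch of that change, assuming enable_clean lives under a make_dataset_args block in settings.jsonc (that surrounding key name is an assumption on my part; keep the flag wherever your settings.jsonc already defines it):

"make_dataset_args": {
  // "make_dataset_args" is an assumed location, not verified against your version
  "enable_clean": false // skip the LLM-based cleaning/scoring of the dataset
}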
The cause is that bfloat16 (BF16) is a relatively new floating-point format that requires GPU compute capability >= 8.0 for native support; the T4, and as far as I know the V100 as well, don't qualify. Add the parameter below to settings.jsonc and then rerun weclone-cli make-dataset:
"infer_args": { "repetition_penalty": 1.2, "temperature": 0.5, "max_length": 50, "top_p": 0.65, "infer_dtype": "float16" // 添加这一行 }
If that still doesn't work, edit weclone/data/clean/strategies.py directly: find the code below and apply the change. This approach is confirmed to work in my testing. If it's convenient, please also help test whether the settings.jsonc parameter above works.
# requires `import json` at the top of strategies.py
outputs = vllm_infer(
    inputs,
    self.make_dataset_config["model_name_or_path"],
    template=self.make_dataset_config["template"],
    temperature=0,
    guided_decoding_class=QaPairScore,
    repetition_penalty=1.2,
    bad_words=[r"\n"],
    vllm_config=json.dumps({"dtype": "float16"})  # pass the result of the expression directly here
)
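For context on what the override does: the JSON string is presumably merged into vLLM's engine arguments, so the engine is built with float16 instead of the bfloat16 declared in the model's config.json. A minimal standalone sketch of the same idea using vLLM's public API (this is not WeClone's actual call path, just an illustration):

from vllm import LLM, SamplingParams

# On compute capability < 8.0 GPUs such as the T4, request float16 explicitly
# instead of inheriting torch_dtype=bfloat16 from the model's config.json.
llm = LLM(model="./Qwen2.5-7B-Instruct", dtype="float16")
outputs = llm.generate(["Hello"], SamplingParams(temperature=0, max_tokens=16))
print(outputs[0].outputs[0].text)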
I tested on Colab: after modifying settings.jsonc I still get the same error, but editing strategies.py directly works fine.