
Using the v1.3.0 image, launching SD3.5-medium fails with: expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16

Open tong1311 opened this issue 9 months ago • 6 comments

The model files were downloaded automatically in the background. I tested the FLUX models (dev and schnell) and they work, but the Stable Diffusion model does not. The container launch command:

```
sudo docker run -it --rm -e XINFERENCE_MODEL_SRC=modelscope -v /home/th/storage/models:/data/models -e XINFERENCE_HOME=/data/models -p 9997:9997 --gpus all xprobe/xinference:v1.3.0 xinference-local -H 0.0.0.0 --log-level debug
```

Inside the container, sd3.5 loads fine, but generation fails with:

```
2025-03-11 01:19:55,909 transformers.modeling_utils 421 INFO loading weights file /data/models/cache/sd3.5-medium/text_encoder/model.safetensors
2025-03-11 01:19:55,931 transformers.modeling_utils 421 INFO Instantiating CLIPTextModelWithProjection model under default dtype torch.bfloat16.
2025-03-11 01:19:55,933 transformers.modeling_utils 421 INFO Instantiating CLIPTextModel model under default dtype torch.float16.
2025-03-11 01:19:56,357 transformers.modeling_utils 421 INFO All model checkpoint weights were used when initializing CLIPTextModelWithProjection.
2025-03-11 01:19:56,357 transformers.modeling_utils 421 INFO All the weights of CLIPTextModelWithProjection were initialized from the model checkpoint at /data/models/cache/sd3.5-medium/text_encoder. If your task is similar to the task the model of the checkpoint was trained on, you can already use CLIPTextModelWithProjection for predictions without further training.
Loading pipeline components...:  78%|█████████████████████████▊       | 7/9 [00:05<00:01,  1.27it/s]
2025-03-11 01:19:56,400 transformers.tokenization_utils_base 421 INFO loading file spiece.model
2025-03-11 01:19:56,401 transformers.tokenization_utils_base 421 INFO loading file tokenizer.json
2025-03-11 01:19:56,401 transformers.tokenization_utils_base 421 INFO loading file added_tokens.json
2025-03-11 01:19:56,401 transformers.tokenization_utils_base 421 INFO loading file special_tokens_map.json
2025-03-11 01:19:56,401 transformers.tokenization_utils_base 421 INFO loading file tokenizer_config.json
2025-03-11 01:19:56,401 transformers.tokenization_utils_base 421 INFO loading file chat_template.jinja
2025-03-11 01:19:56,403 transformers.models.t5.tokenization_t5_fast 421 WARNING You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|█████████████████████████████████| 9/9 [00:06<00:00,  1.49it/s]
2025-03-11 01:19:56,618 xinference.core.model 421 INFO ModelActor(sd3.5-medium-0) loaded
2025-03-11 01:19:56,620 xinference.core.worker 45 INFO [request 915e944e-fe51-11ef-9c7a-7a928cad8adb] Leave launch_builtin_model, elapsed time: 20 s
2025-03-11 01:19:56,643 xinference.core.supervisor 45 DEBUG [request 9de25836-fe51-11ef-9c7a-7a928cad8adb] Enter list_models, args: <xinference.core.supervisor.SupervisorActor object at 0x799d9b7ded40>, kwargs:
2025-03-11 01:19:56,644 xinference.core.worker 45 DEBUG [request 9de26f06-fe51-11ef-9c7a-7a928cad8adb] Enter list_models, args: <xinference.core.worker.WorkerActor object at 0x799c89abdf30>, kwargs:
2025-03-11 01:19:56,644 xinference.core.worker 45 DEBUG [request 9de26f06-fe51-11ef-9c7a-7a928cad8adb] Leave list_models, elapsed time: 0 s
2025-03-11 01:19:56,644 xinference.core.supervisor 45 DEBUG [request 9de25836-fe51-11ef-9c7a-7a928cad8adb] Leave list_models, elapsed time: 0 s
2025-03-11 01:20:02,176 xinference.core.supervisor 45 DEBUG [request a12e80d2-fe51-11ef-9c7a-7a928cad8adb] Enter list_models, args: <xinference.core.supervisor.SupervisorActor object at 0x799d9b7ded40>, kwargs:
2025-03-11 01:20:02,176 xinference.core.worker 45 DEBUG [request a12e9630-fe51-11ef-9c7a-7a928cad8adb] Enter list_models, args: <xinference.core.worker.WorkerActor object at 0x799c89abdf30>, kwargs:
2025-03-11 01:20:02,176 xinference.core.worker 45 DEBUG [request a12e9630-fe51-11ef-9c7a-7a928cad8adb] Leave list_models, elapsed time: 0 s
2025-03-11 01:20:02,176 xinference.core.supervisor 45 DEBUG [request a12e80d2-fe51-11ef-9c7a-7a928cad8adb] Leave list_models, elapsed time: 0 s
IMPORTANT: You are using gradio version 4.26.0, however version 4.44.1 is available, please upgrade.
```

```
2025-03-11 01:20:14,952 xinference.core.supervisor 45 DEBUG [request a8cbf7d4-fe51-11ef-9c7a-7a928cad8adb] Enter describe_model, args: <xinference.core.supervisor.SupervisorActor object at 0x799d9b7ded40>,sd3.5-medium, kwargs:
2025-03-11 01:20:14,952 xinference.core.worker 45 DEBUG Enter describe_model, args: <xinference.core.worker.WorkerActor object at 0x799c89abdf30>, kwargs: model_uid=sd3.5-medium-0
2025-03-11 01:20:14,952 xinference.core.worker 45 DEBUG Leave describe_model, elapsed time: 0 s
2025-03-11 01:20:14,953 xinference.core.supervisor 45 DEBUG [request a8cbf7d4-fe51-11ef-9c7a-7a928cad8adb] Leave describe_model, elapsed time: 0 s
2025-03-11 01:20:14,959 xinference.core.supervisor 45 DEBUG [request a8cd1c54-fe51-11ef-9c7a-7a928cad8adb] Enter get_model, args: <xinference.core.supervisor.SupervisorActor object at 0x799d9b7ded40>,sd3.5-medium, kwargs:
2025-03-11 01:20:14,960 xinference.core.worker 45 DEBUG Enter get_model, args: <xinference.core.worker.WorkerActor object at 0x799c89abdf30>, kwargs: model_uid=sd3.5-medium-0
2025-03-11 01:20:14,960 xinference.core.worker 45 DEBUG Leave get_model, elapsed time: 0 s
2025-03-11 01:20:14,960 xinference.core.supervisor 45 DEBUG [request a8cd1c54-fe51-11ef-9c7a-7a928cad8adb] Leave get_model, elapsed time: 0 s
2025-03-11 01:20:14,964 xinference.core.model 421 DEBUG Request text_to_image, current serve request count: 0, request limit: inf for the model sd3.5-medium-0
2025-03-11 01:20:14,964 xinference.core.model 421 DEBUG [request 29fb56a9-9ed6-46dc-b54f-9c212a12338a] Enter text_to_image, args: ModelActor(sd3.5-medium-0), kwargs: prompt=一个男孩,蓝色眼睛,n=1,size=1024*1024,response_format=b64_json,request_id=29fb56a9-9ed6-46dc-b54f-9c212a12338a,num_inference_steps=8,guidance_scale=None,negative_prompt=,sampler_name=None
2025-03-11 01:20:14,968 xinference.model.image.stable_diffusion.core 421 DEBUG stable diffusion args: {'prompt': '一个男孩,蓝色眼睛', 'num_images_per_prompt': 1, 'num_inference_steps': 8, 'negative_prompt': '', 'width': 1024, 'height': 1024, 'callback_on_step_end': <function DiffusionModel._process_progressor.<locals>.report_status_callback at 0x7e03e64f5c60>}, model: StableDiffusion3Pipeline {
  "_class_name": "StableDiffusion3Pipeline",
  "_diffusers_version": "0.32.2",
  "_name_or_path": "/data/models/cache/sd3.5-medium",
  "feature_extractor": [null, null],
  "image_encoder": [null, null],
  "scheduler": ["diffusers", "FlowMatchEulerDiscreteScheduler"],
  "text_encoder": ["transformers", "CLIPTextModelWithProjection"],
  "text_encoder_2": ["transformers", "CLIPTextModelWithProjection"],
  "text_encoder_3": ["transformers", "T5EncoderModel"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "tokenizer_2": ["transformers", "CLIPTokenizer"],
  "tokenizer_3": ["transformers", "T5TokenizerFast"],
  "transformer": ["diffusers", "SD3Transformer2DModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
```

```
2025-03-11 01:20:15,654 xinference.core.model 421 ERROR [request 29fb56a9-9ed6-46dc-b54f-9c212a12338a] Leave text_to_image, error: expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16, elapsed time: 0 s
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 1046, in text_to_image
    return await self._call_wrapper_json(
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 639, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 141, in _async_wrapper
    return await fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 664, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/image/stable_diffusion/core.py", line 545, in text_to_image
    return self._call_model(
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/image/stable_diffusion/core.py", line 505, in _call_model
    images = model(**kwargs).images
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 969, in __call__
    ) = self.encode_prompt(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 437, in encode_prompt
    prompt_embed, pooled_prompt_embed = self._get_clip_prompt_embeds(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 325, in _get_clip_prompt_embeds
    prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py", line 1490, in forward
    text_embeds = self.text_projection(pooled_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16
2025-03-11 01:20:15,665 xinference.core.model 421 DEBUG After request text_to_image, current serve request count: 0 for the model sd3.5-medium-0
2025-03-11 01:20:15,667 xinference.core.progress_tracker 45 DEBUG Setting progress, request id: 29fb56a9-9ed6-46dc-b54f-9c212a12338a, progress: 1.0
2025-03-11 01:20:15,669 xinference.api.restful_api 1 ERROR [address=0.0.0.0:39805, pid=421] expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1600, in create_images
    image_list = await model.text_to_image(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 667, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 106, in wrapped_func
    ret = await fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 1046, in text_to_image
    return await self._call_wrapper_json(
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 639, in _call_wrapper_json
    return await self._call_wrapper("json", fn, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 141, in _async_wrapper
    return await fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 664, in _call_wrapper
    ret = await asyncio.to_thread(fn, *args, **kwargs)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/image/stable_diffusion/core.py", line 545, in text_to_image
    return self._call_model(
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/image/stable_diffusion/core.py", line 505, in _call_model
    images = model(**kwargs).images
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 969, in __call__
    ) = self.encode_prompt(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 437, in encode_prompt
    prompt_embed, pooled_prompt_embed = self._get_clip_prompt_embeds(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 325, in _get_clip_prompt_embeds
    prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py", line 1490, in forward
    text_embeds = self.text_projection(pooled_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: [address=0.0.0.0:39805, pid=421] expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1786, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1338, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 759, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/image_interface.py", line 143, in text_generate_image
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/image_interface.py", line 117, in run_in_thread
    response = model.text_to_image(
  File "/usr/local/lib/python3.10/dist-packages/xinference/client/restful/restful_client.py", line 264, in text_to_image
    raise RuntimeError(
RuntimeError: Failed to create the images, detail: [address=0.0.0.0:39805, pid=421] expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16
```
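
For what it's worth, the load log above shows the two CLIP text encoders being instantiated under different default dtypes (`torch.bfloat16` vs `torch.float16`), which matches the `F.linear` mismatch in the traceback. Outside xinference, pinning a single dtype for the whole pipeline should avoid this; below is a minimal, untested sketch with plain diffusers, reusing the cached model directory and request parameters from the logs:

```python
# Untested sketch: load SD3.5-medium with one explicit dtype for every
# component, so no text encoder falls back to a different default dtype.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "/data/models/cache/sd3.5-medium",  # cached model dir from the logs
    torch_dtype=torch.bfloat16,         # fp16 would also work, as long as it is uniform
).to("cuda")

image = pipe(
    prompt="一个男孩,蓝色眼睛",          # same prompt as the failing request
    num_inference_steps=8,
    width=1024,
    height=1024,
).images[0]
image.save("out.png")
```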

What could be causing this?

tong1311 avatar Mar 11 '25 08:03 tong1311

What GPU are you using?

qinxuye avatar Mar 12 '25 10:03 qinxuye

> What GPU are you using?

Hi, it's an RTX 4090.

tong1311 avatar Mar 13 '25 00:03 tong1311

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Mar 20 '25 19:03 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

github-actions[bot] avatar Mar 25 '25 19:03 github-actions[bot]

I hit the same error with the v1.4.0 image on a 4090 when launching the SD3.5 model: Failed to create the images, detail: [address=0.0.0.0:43367, pid=2240] expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16

GinKry avatar Apr 18 '25 01:04 GinKry

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Apr 25 '25 19:04 github-actions[bot]

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar May 03 '25 19:05 github-actions[bot]

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar May 11 '25 19:05 github-actions[bot]

Was CPU offloading enabled when the model was loaded?
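
(For reference, with accelerate-style offload hooks in play, any component loaded under a different dtype surfaces exactly this kind of `F.linear` mismatch. A short illustration of offloading in plain diffusers, reusing the model path from the logs above; this is a sketch, not the xinference code path:)

```python
# Illustration only: model CPU offload in plain diffusers. The key point is
# that the whole pipeline is loaded under one torch_dtype, so the accelerate
# hooks move modules to the GPU on demand without mixing fp16 and bf16.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "/data/models/cache/sd3.5-medium",  # path from the logs above
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # components are moved to GPU only when used
```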

qinxuye avatar May 12 '25 02:05 qinxuye

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar May 19 '25 19:05 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

github-actions[bot] avatar May 25 '25 19:05 github-actions[bot]