
Chroma-HD model fails to load from HF repo or local clone: OSError "No such device (os error 19)"

Open rshp opened this issue 3 months ago • 3 comments

This is for bugs only

Did you already ask in the discord?

No

You verified that this is a bug and not a feature request or question by asking in the discord?

No

Describe the bug

The Chroma-HD model doesn't work: it doesn't clone from the HF repo. Cloning it manually to a folder and pointing the config at that folder brings up this error:

Running 1 process
Using local model: /workspace/chroma_hd/
Loading transformer
Error running job: No such device (os error 19)

======================================== Result:

  • 0 completed jobs
  • 1 failure
========================================

Traceback (most recent call last):
  File "/workspace/ai-toolkit/run.py", line 120, in <module>
    main()
  File "/workspace/ai-toolkit/run.py", line 108, in main
    raise e
  File "/workspace/ai-toolkit/run.py", line 96, in main
    job.run()
  File "/workspace/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
    process.run()
  File "/workspace/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 1518, in run
    self.sd.load_model()
  File "/workspace/ai-toolkit/extensions_built_in/diffusion_models/chroma/chroma_model.py", line 145, in load_model
    chroma_state_dict = load_file(model_path, 'cpu')
  File "/venv/main/lib/python3.12/site-packages/safetensors/torch.py", line 381, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
OSError: No such device (os error 19)
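For reference, this looks reproducible outside ai-toolkit with plain safetensors when the path handed to load_file is the cloned model folder rather than a single .safetensors file; a minimal sketch, assuming a Linux machine and a local clone at ./chroma_hd:

from safetensors.torch import load_file

# Directory path, as in the config above; safetensors memory-maps whatever path
# it is given, and mapping a directory fails on Linux with errno 19 (ENODEV).
model_path = "./chroma_hd"

try:
    state_dict = load_file(model_path, device="cpu")
except OSError as err:
    print(err)  # -> "No such device (os error 19)"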

rshp avatar Sep 04 '25 16:09 rshp

Experiencing the Same Issue: Device Error During Model Loading

I'm encountering the same problem when running 1 process. Here's the detailed error log:

Execution Status

Using local model: /chenyudata/Other/Chroma1-HD

Loading transformer

Error Message

Error running job: No such device (os error 19)

Execution Results

======================================== Result:

0 completed jobs
1 failure

Complete Error Traceback

Traceback (most recent call last):
  File "/root/ai-toolkit/run.py", line 120, in <module>
    main()
  File "/root/ai-toolkit/run.py", line 108, in main
    raise e
  File "/root/ai-toolkit/run.py", line 96, in main
    job.run()
  File "/root/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
    process.run()
  File "/root/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 1560, in run
    self.sd.load_model()
  File "/root/ai-toolkit/extensions_built_in/diffusion_models/chroma/chroma_model.py", line 151, in load_model
    chroma_state_dict = load_file(model_path, 'cpu')
  File "/usr/local/lib/python3.12/dist-packages/safetensors/torch.py", line 313, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
OSError: No such device (os error 19)

Issue Analysis

Based on the error traceback, the problem occurs during model loading, when the safetensors library attempts to open the model file. The error "No such device (os error 19)" is errno 19 (ENODEV), which generally means the kernel cannot memory-map the given path, for example because it is a directory or sits on a filesystem without mmap support.
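A small sanity check of that reading (Linux-only; the directory used below is just a stand-in): errno 19 is ENODEV, and memory-mapping a directory descriptor raises exactly this error, which would fit the traceback if load_file() was handed the model folder instead of a .safetensors file.

import errno
import mmap
import os

print(errno.errorcode[19])        # 'ENODEV' -> "No such device"

# mmap-ing a directory file descriptor is one way to get ENODEV on Linux.
fd = os.open(".", os.O_RDONLY)    # any directory works as a stand-in
try:
    mmap.mmap(fd, 4096, prot=mmap.PROT_READ)
except OSError as err:
    print(err)                    # OSError: [Errno 19] No such device
finally:
    os.close(fd)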

Additional Context

Model Path: /chenyudata/Other/Chroma1-HD
Environment: Python 3.12
Failed Operation: Loading model state dict with load_file(model_path, 'cpu')

This appears to be the same underlying issue affecting both HuggingFace repo downloads and local model clones. Has anyone found a workaround or identified the root cause of this device access error?

wochenlong avatar Sep 19 '25 02:09 wochenlong

Having the same issue for both Chroma-HD and Chroma-Base.

Wayfa avatar Oct 21 '25 19:10 Wayfa

Having the same issue for both Chroma-HD and Chroma-Base.

Using the path to the model.safetensors file is a temporary fix, e.g. User/Chroma1-Base/Chroma1-Base.safetensors instead of User/Chroma1-Base/; see the sketch below.
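A minimal sketch of the same idea in code, assuming a single-file checkpoint inside the folder; resolve_safetensors_path is a hypothetical helper, not something ai-toolkit ships:

from pathlib import Path
from safetensors.torch import load_file

def resolve_safetensors_path(model_path: str) -> str:
    # If the configured path is a folder, pick the .safetensors file inside it,
    # which is what the manual workaround above does by hand.
    path = Path(model_path)
    if path.is_dir():
        candidates = sorted(path.glob("*.safetensors"))
        if not candidates:
            raise FileNotFoundError(f"no .safetensors file found in {path}")
        return str(candidates[0])   # e.g. Chroma1-Base/Chroma1-Base.safetensors
    return str(path)

state_dict = load_file(resolve_safetensors_path("User/Chroma1-Base"), device="cpu")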

qngv avatar Oct 22 '25 01:10 qngv