
Update utils.py: fix very slow loading speed of .safetensors files

Open Yaruze66 opened this issue 1 year ago • 2 comments

I'm not sure if this could have any downsides, but in my case .safetensors files now load twice as fast. This will be especially useful for those who store hundreds of gigabytes of models on hard drives.

Yaruze66 avatar Jul 07 '23 19:07 Yaruze66

Isn't this going to take 2x the memory to load them?

comfyanonymous avatar Jul 08 '23 07:07 comfyanonymous

@comfyanonymous I don't know, I didn't notice an increase in memory load 🤔

Yaruze66 avatar Jul 08 '23 14:07 Yaruze66

This does not seem like a great idea, at least not in general for all systems. Testing only the `load_torch_file` function with

    from comfy.utils import load_torch_file  # ComfyUI's loader under test

    import resource
    import sys
    import time

    memory_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # peak resident set size so far (kilobytes on Linux)
    time_before = time.perf_counter()
    load_torch_file(sys.argv[1],  # e.g. sd_xl_base_1.0.safetensors
                    safe_load=False, device=None)

    time_after = time.perf_counter()
    memory_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print('memory after-before {:.0f}-{:.0f}MB = {:.0f}MB'.format(memory_before/1e3,
                                                                 memory_after/1e3,
                                                                 (memory_after-memory_before)/1e3))
    print('time {:.3g}'.format(time_after-time_before))

with the original code (`sd = safetensors.torch.load_file(ckpt, device=device.type)`) I get:

memory after-before 363-444MB = 81MB
time 0.773

and far worse with the modified code (`sd = safetensors.torch.load(open(ckpt, 'rb').read())`):

memory after-before 363-13923MB = 13560MB
time 4.44

on an AMD Ryzen 7 3700X with an RTX 3060, Ubuntu 22.04.

This would be more appropriate to discuss upstream with the safetensors library developers. What this change does is read the file fully into memory and then load directly from that buffer; I'm assuming `load_file` is more memory-efficient and parses the file as it is being read from disk.

That should usually be about as fast as, or even faster than, reading into memory first, but depending on what `load_file` does internally, it might have a performance issue on some platforms.

asagi4 avatar Aug 04 '23 12:08 asagi4