ComfyUI model reloads on every API request, adding roughly 15 seconds to each inference request. Possible indexing issue.

This is a critical bug, but probably a quick fix for the devs.

To be more objective: the same request takes ~13-20 seconds per call in API mode, versus ~3.0 seconds through the UI. As far as I can tell, the difference is entirely due to model reloading.

In case it helps, a sample of logs:

Requested to load FluxClipModel_
got prompt
loaded partially 7709.3625 7709.36181640625 0
Requested to load Flux
loaded partially 7709.3625 7708.8067626953125 239
100%|█████| 6/6 [00:02<00:00,  1.25s/it]
Requested to load AutoencodingEngine
loaded completely 392.293359375 159.87335777282715 True
Prompt executed in 20.23 seconds
Requested to load FluxClipModel_
loaded partially 7709.3625 7709.36181640625 0
Requested to load Flux
loaded partially 7709.3625 7708.8067626953125 239
100%|█████| 6/6 [00:07<00:00,  1.26s/it]
Requested to load AutoencodingEngine
loaded completely 392.293359375 159.87335777282715 True
Prompt executed in 13.32 seconds
got prompt
Requested to load FluxClipModel_
loaded partially 7709.3625 7709.36181640625 0
Requested to load Flux
loaded partially 7709.3625 7708.8067626953125 239
...
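
To make the reload pattern easier to quantify, here is a minimal sketch that groups the "Requested to load" lines by prompt. It assumes the log format matches the sample above; the function name `loads_per_prompt` is mine and not part of ComfyUI.

```python
import re

LOAD_PREFIX = "Requested to load "
EXECUTED_RE = re.compile(r"Prompt executed in ([\d.]+) seconds")

def loads_per_prompt(log_text):
    """Return one (seconds, [model names]) tuple per completed prompt.

    Hypothetical helper: it only understands the two line shapes
    seen in the log sample above and ignores everything else.
    """
    results = []
    pending = []  # model loads seen since the last "Prompt executed" line
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith(LOAD_PREFIX):
            pending.append(line[len(LOAD_PREFIX):])
            continue
        match = EXECUTED_RE.match(line)
        if match:
            results.append((float(match.group(1)), pending))
            pending = []
    return results

sample = """\
Requested to load FluxClipModel_
got prompt
loaded partially 7709.3625 7709.36181640625 0
Requested to load Flux
loaded partially 7709.3625 7708.8067626953125 239
Requested to load AutoencodingEngine
loaded completely 392.293359375 159.87335777282715 True
Prompt executed in 20.23 seconds
Requested to load FluxClipModel_
Requested to load Flux
Requested to load AutoencodingEngine
Prompt executed in 13.32 seconds
"""

for seconds, models in loads_per_prompt(sample):
    print(f"{seconds:6.2f}s  reloaded: {', '.join(models)}")
```

If the models stayed resident, I would expect the list for every prompt after the first to be empty, which is what the UI path shows.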

On a whim, I loaded the API workflow in the UI to see if that could somehow force the API model to remain loaded, but alas it did not.

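It appears that comfy.model_management.load_models_gpu is called on every request, and when the request comes through the API, the lookup in current_loaded_models fails to find the model that is already loaded. If it really is just an indexing mismatch, this is probably a quick fix. The relevant loop, with the logging line I added to the except branch marked by the arrows: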

    for x in models:
        loaded_model = LoadedModel(x)
        try:
            loaded_model_index = current_loaded_models.index(loaded_model)
        except Exception as e:
            loaded_model_index = None
            logging.info(f'current_loaded_models.index fail: {e}')   # <<<<<<<<<<<<<<<<<<<<<<

        if loaded_model_index is not None:
            loaded = current_loaded_models[loaded_model_index]
            loaded.currently_used = True
            models_to_load.append(loaded)
        else:
            if hasattr(x, "model"):
                logging.info(f"Requested to load {x.model.__class__.__name__}")
            models_to_load.append(loaded_model)

Indeed, logging the except branch shows it's failing each time to find the API-loaded model:

current_loaded_models.index fail: <comfy.model_management.LoadedModel object at 0x75c8e718b700> is not in list
current_loaded_models.index fail: <comfy.model_management.LoadedModel object at 0x75c8e718a233> is not in list
current_loaded_models.index fail: <comfy.model_management.LoadedModel object at 0x75c8e718a212> is not in list
current_loaded_models.index fail: <comfy.model_management.LoadedModel object at 0x75c8e718a368> is not in list
...

Note that in UI-only mode this failure happens only the first time a model is loaded; subsequent requests find the loaded models successfully.
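
To illustrate how a lookup like this can miss, here is a minimal, self-contained sketch. It is not ComfyUI code: `FakePatcher`, `FakeLoadedModel`, and `find_loaded` are hypothetical stand-ins, and the sketch assumes the wrapper's equality check compares the identity of the wrapped model object. Whether the API path actually hands `load_models_gpu` a fresh object on each request is my guess, not something I have confirmed.

```python
class FakePatcher:
    """Hypothetical stand-in for the model object passed in `models`."""
    def __init__(self, name):
        self.name = name

class FakeLoadedModel:
    """Hypothetical stand-in for LoadedModel.

    Assumption: two wrappers are equal only if they wrap the very
    same object (identity), not merely an equivalent model.
    """
    def __init__(self, model):
        self.model = model

    def __eq__(self, other):
        return self.model is other.model

def find_loaded(current_loaded_models, model):
    """Mirror the try/except lookup from the loop above."""
    try:
        return current_loaded_models.index(FakeLoadedModel(model))
    except ValueError:  # list.index raises ValueError on a miss
        return None

current_loaded_models = []
flux = FakePatcher("Flux")
current_loaded_models.append(FakeLoadedModel(flux))

# Same object reused across requests: the lookup hits.
same_object = find_loaded(current_loaded_models, flux)

# Equivalent model rebuilt as a new object: the lookup misses,
# so the caller would treat it as not loaded and load it again.
fresh_object = find_loaded(current_loaded_models, FakePatcher("Flux"))

print(same_object, fresh_object)
```

If this is what is happening, the thing to track down is where the API path constructs a new model object per request instead of reusing the cached one.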

Originally posted by @freckletonj in #2503

Feb 20 '25 21:02 freckletonj

Yes, this is a big bug in the API.

Feb 24 '25 04:02 klausHou

I'm happy to jump in and help. Could someone from the team point me to relevant portions of the code I'll need to touch?

Feb 28 '25 23:02 freckletonj

I'm still willing to help if I get some direction <3

@mcmonkey4eva I saw you were responding on this related issue, are you the right person to ask, or do you know who is?

Mar 07 '25 21:03 freckletonj

+1

I'm facing the same issue; any help would be appreciated.

Apr 24 '25 14:04 derhuebiii