Cache initialization between runs is very slow when `INPUT_TYPES` triggers many calls to `folder_paths.get_filename_list`
Expected Behavior
Load time should be as fast as it was before #2666. The problematic lines are these:
https://github.com/comfyanonymous/ComfyUI/blob/9f4daca9d9994c600ef713856e0182d0177b9a1e/execution.py#L474-L476
Actual Behavior
Load time is very slow; as the log below shows, it took a whole minute (I added debug code). I have not tested what kind of big-O growth this is; it may be exponential.
with torch.inference_mode():
    dynamic_prompt = DynamicPrompt(prompt)
    is_changed_cache = IsChangedCache(dynamic_prompt, self.caches.outputs)
    logging.info("Setting prompt for cache")  # debug instrumentation added for this report
    prompt_keys = prompt.keys()
    for cache in self.caches.all:
        cache.set_prompt(dynamic_prompt, prompt_keys, is_changed_cache)
        cache.clean_unused()
    logging.info("Setted prompt for cache")  # debug instrumentation added for this report
Steps to Reproduce
Have a decently large workflow with more than 50 nodes. You can test with https://perilli.com/ai/comfyui/
Debug Logs
2024-09-11 03:30:28,124- root:522- INFO- got prompt
2024-09-11 03:30:28,133- root:540- INFO- Validating prompt
2024-09-11 03:30:28,134- root:542- INFO- Validated prompt
2024-09-11 03:30:28,171- root:127- INFO- Executing prompt e81c5045-d2e6-49dd-abc2-2a22406aca33
2024-09-11 03:30:28,172- root:468- INFO- Setting prompt for cache
2024-09-11 03:31:42,298- root:473- INFO- Setted prompt for cache
2024-09-11 03:32:48,690- root:106- INFO- model weight dtype torch.float16, manual cast: None
2024-09-11 03:32:48,926- root:115- INFO- model_type EPS
2024-09-11 03:32:49,855- root:271- INFO- Using xformers attention in VAE
Other
14136356 function calls (9707273 primitive calls) in 65.405 seconds
Ordered by: cumulative time
List reduced from 221 to 50 due to restriction <50>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 65.405 65.405 {built-in method builtins.exec}
1 0.000 0.000 65.404 65.404 <string>:1(<module>)
2 0.000 0.000 65.404 32.702 caching.py:266(set_prompt)
3 0.000 0.000 65.403 21.801 caching.py:143(set_prompt)
2 0.000 0.000 65.402 32.701 caching.py:66(__init__)
2 0.008 0.004 65.402 32.701 caching.py:75(add_keys)
574 0.021 0.000 65.394 0.114 caching.py:85(get_node_signature)
18416 0.206 0.000 59.277 0.003 caching.py:93(get_immediate_node_signature)
7610 0.037 0.000 58.709 0.008 folder_paths.py:238(get_filename_list)
7610 0.247 0.000 58.608 0.008 folder_paths.py:216(cached_filename_list_)
116987 58.195 0.000 58.195 0.000 {built-in method nt.stat}
94282 0.116 0.000 50.780 0.001 <frozen genericpath>:53(getmtime)
4170 0.012 0.000 33.303 0.008 nodes.py:521(INPUT_TYPES)
3304 0.016 0.000 25.226 0.008 nodes.py:604(INPUT_TYPES)
22705 0.045 0.000 7.584 0.000 <frozen genericpath>:39(isdir)
1793074/574 1.903 0.000 5.950 0.010 caching.py:36(to_hashable)
435026/574 0.304 0.000 5.937 0.010 caching.py:44(<listcomp>)
137880/5716 0.287 0.000 4.651 0.001 caching.py:42(<listcomp>)
5125254/3109390 0.874 0.000 3.414 0.000 {built-in method builtins.isinstance}
1007932 0.453 0.000 2.746 0.000 typing.py:1304(__instancecheck__)
1007932 0.612 0.000 2.293 0.000 typing.py:1579(__subclasscheck__)
1007932 0.960 0.000 1.475 0.000 {built-in method builtins.issubclass}
1008346/1007932 0.267 0.000 0.516 0.000 <frozen abc>:121(__subclasscheck__)
1008346/1007932 0.227 0.000 0.248 0.000 {built-in method _abc._abc_subclasscheck}
574 0.001 0.000 0.143 0.000 caching.py:115(get_ordered_ancestry)
18416/574 0.062 0.000 0.143 0.000 caching.py:121(get_ordered_ancestry_internal)
18416 0.009 0.000 0.138 0.000 execution.py:37(get)
150866 0.092 0.000 0.132 0.000 graph_utils.py:1(is_link)
330/80 0.001 0.000 0.129 0.002 utils.py:346(new_func)
60 0.000 0.000 0.119 0.002 IPAdapterPlus.py:647(INPUT_TYPES)
40 0.000 0.000 0.113 0.003 execution.py:135(_map_node_over_list)
40 0.000 0.000 0.112 0.003 execution.py:149(process_inputs)
174788 0.090 0.000 0.090 0.000 {built-in method builtins.sorted}
21 0.000 0.000 0.081 0.004 nodes.py:2962(IS_CHANGED)
28032/1 0.044 0.000 0.080 0.080 copy.py:128(deepcopy)
2868/1 0.009 0.000 0.080 0.080 copy.py:227(_deepcopy_dict)
8 0.000 0.000 0.063 0.008 load.py:102(INPUT_TYPES)
5438/1182 0.008 0.000 0.061 0.000 copy.py:201(_deepcopy_list)
3 0.000 0.000 0.058 0.019 folder_paths.py:203(get_filename_list_)
7 0.001 0.000 0.056 0.008 folder_paths.py:145(recursive_search)
1 0.000 0.000 0.032 0.032 nodes.py:1374(IS_CHANGED)
53/24 0.001 0.000 0.029 0.001 <frozen os>:345(_walk)
18 0.023 0.001 0.023 0.001 {built-in method nt.scandir}
112054 0.022 0.000 0.022 0.000 <frozen _collections_abc>:315(__subclasshook__)
18804 0.021 0.000 0.021 0.000 {built-in method builtins.hasattr}
1 0.019 0.019 0.019 0.019 {method 'read' of '_io.BufferedReader' objects}
136222 0.018 0.000 0.018 0.000 {method 'append' of 'list' objects}
287 0.003 0.000 0.018 0.000 <frozen ntpath>:741(relpath)
8 0.000 0.000 0.017 0.002 nodes_upscale_model.py:18(INPUT_TYPES)
37693 0.016 0.000 0.016 0.000 graph.py:30(has_node)
Recorded with cProfile wrapped around the problematic lines.
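For reference, a minimal sketch of how such a profile can be captured (the commented-out section stands in for the cache loop in execution.py; this is illustrative, not the exact instrumentation I used):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
# ... the problematic lines, i.e. the cache.set_prompt()/clean_unused() loop ...
profiler.disable()

# Print the 50 most expensive calls by cumulative time,
# matching the "Ordered by: cumulative time" table above.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(50)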
Maybe @guill would be interested in this.
After looking around a bit, it looks like LoraLoader and CheckpointLoaderSimple are the culprits, since they call folder_paths.get_filename_list repeatedly during cache key initialization. One simple fix could be caching the output and dumping the cache only when the user explicitly clicks Refresh in the web UI. But it looks like the cache is there already; I guess the calls to isdir and getmtime are not cached.
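A minimal sketch of that idea, assuming a hypothetical module-level wrapper (get_filename_list_cached and invalidate_filename_list_cache are illustrative names, not existing ComfyUI APIs):

import folder_paths

# Hypothetical process-wide cache keyed by folder name ("checkpoints", "loras", ...).
_filename_list_cache = {}

def get_filename_list_cached(folder_name):
    # Populate on first use; subsequent calls skip the stat()-heavy directory scan.
    if folder_name not in _filename_list_cache:
        _filename_list_cache[folder_name] = folder_paths.get_filename_list(folder_name)
    return _filename_list_cache[folder_name]

def invalidate_filename_list_cache():
    # Drop the cache only when the user explicitly clicks Refresh in the web UI.
    _filename_list_cache.clear()

With folder_paths.get_filename_list called 7610 times in the profile above but only a handful of distinct folder names in play, a wrapper like this would collapse nearly all of the nt.stat traffic.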
Nice find and thanks for the ping! I'll try to get this fixed by this weekend.
This pull solved this issue for me: https://github.com/comfyanonymous/ComfyUI/pull/4969
@Trung0246 I have updated to this commit id, but the problem still exists: clean_unused takes 4+ minutes. I called the PromptExecutor.execute method directly and did not go through an HTTP request.