Cache initialization between runs is very slow when `INPUT_TYPES` triggers many calls to `folder_paths.get_filename_list`
Expected Behavior
Load time should be as fast as it was before #2666. The problematic lines are these:
https://github.com/comfyanonymous/ComfyUI/blob/9f4daca9d9994c600ef713856e0182d0177b9a1e/execution.py#L474-L476
Actual Behavior
Load time is very slow; as the log below shows, it took a whole minute (I added debug code). I have not tested what kind of big-O growth this is; it may be exponential.
with torch.inference_mode():
    dynamic_prompt = DynamicPrompt(prompt)
    is_changed_cache = IsChangedCache(dynamic_prompt, self.caches.outputs)
    logging.info("Setting prompt for cache")  # debug instrumentation added for this report
    prompt_keys = prompt.keys()
    for cache in self.caches.all:
        cache.set_prompt(dynamic_prompt, prompt_keys, is_changed_cache)
        cache.clean_unused()
    logging.info("Setted prompt for cache")  # debug instrumentation added for this report
Steps to Reproduce
Have a decently large workflow with more than 50 nodes. You can test with https://perilli.com/ai/comfyui/
Debug Logs
2024-09-11 03:30:28,124- root:522- INFO- got prompt
2024-09-11 03:30:28,133- root:540- INFO- Validating prompt
2024-09-11 03:30:28,134- root:542- INFO- Validated prompt
2024-09-11 03:30:28,171- root:127- INFO- Executing prompt e81c5045-d2e6-49dd-abc2-2a22406aca33
2024-09-11 03:30:28,172- root:468- INFO- Setting prompt for cache
2024-09-11 03:31:42,298- root:473- INFO- Setted prompt for cache
2024-09-11 03:32:48,690- root:106- INFO- model weight dtype torch.float16, manual cast: None
2024-09-11 03:32:48,926- root:115- INFO- model_type EPS
2024-09-11 03:32:49,855- root:271- INFO- Using xformers attention in VAE
Other
14136356 function calls (9707273 primitive calls) in 65.405 seconds
Ordered by: cumulative time
List reduced from 221 to 50 due to restriction <50>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 65.405 65.405 {built-in method builtins.exec}
1 0.000 0.000 65.404 65.404 <string>:1(<module>)
2 0.000 0.000 65.404 32.702 caching.py:266(set_prompt)
3 0.000 0.000 65.403 21.801 caching.py:143(set_prompt)
2 0.000 0.000 65.402 32.701 caching.py:66(__init__)
2 0.008 0.004 65.402 32.701 caching.py:75(add_keys)
574 0.021 0.000 65.394 0.114 caching.py:85(get_node_signature)
18416 0.206 0.000 59.277 0.003 caching.py:93(get_immediate_node_signature)
7610 0.037 0.000 58.709 0.008 folder_paths.py:238(get_filename_list)
7610 0.247 0.000 58.608 0.008 folder_paths.py:216(cached_filename_list_)
116987 58.195 0.000 58.195 0.000 {built-in method nt.stat}
94282 0.116 0.000 50.780 0.001 <frozen genericpath>:53(getmtime)
4170 0.012 0.000 33.303 0.008 nodes.py:521(INPUT_TYPES)
3304 0.016 0.000 25.226 0.008 nodes.py:604(INPUT_TYPES)
22705 0.045 0.000 7.584 0.000 <frozen genericpath>:39(isdir)
1793074/574 1.903 0.000 5.950 0.010 caching.py:36(to_hashable)
435026/574 0.304 0.000 5.937 0.010 caching.py:44(<listcomp>)
137880/5716 0.287 0.000 4.651 0.001 caching.py:42(<listcomp>)
5125254/3109390 0.874 0.000 3.414 0.000 {built-in method builtins.isinstance}
1007932 0.453 0.000 2.746 0.000 typing.py:1304(__instancecheck__)
1007932 0.612 0.000 2.293 0.000 typing.py:1579(__subclasscheck__)
1007932 0.960 0.000 1.475 0.000 {built-in method builtins.issubclass}
1008346/1007932 0.267 0.000 0.516 0.000 <frozen abc>:121(__subclasscheck__)
1008346/1007932 0.227 0.000 0.248 0.000 {built-in method _abc._abc_subclasscheck}
574 0.001 0.000 0.143 0.000 caching.py:115(get_ordered_ancestry)
18416/574 0.062 0.000 0.143 0.000 caching.py:121(get_ordered_ancestry_internal)
18416 0.009 0.000 0.138 0.000 execution.py:37(get)
150866 0.092 0.000 0.132 0.000 graph_utils.py:1(is_link)
330/80 0.001 0.000 0.129 0.002 utils.py:346(new_func)
60 0.000 0.000 0.119 0.002 IPAdapterPlus.py:647(INPUT_TYPES)
40 0.000 0.000 0.113 0.003 execution.py:135(_map_node_over_list)
40 0.000 0.000 0.112 0.003 execution.py:149(process_inputs)
174788 0.090 0.000 0.090 0.000 {built-in method builtins.sorted}
21 0.000 0.000 0.081 0.004 nodes.py:2962(IS_CHANGED)
28032/1 0.044 0.000 0.080 0.080 copy.py:128(deepcopy)
2868/1 0.009 0.000 0.080 0.080 copy.py:227(_deepcopy_dict)
8 0.000 0.000 0.063 0.008 load.py:102(INPUT_TYPES)
5438/1182 0.008 0.000 0.061 0.000 copy.py:201(_deepcopy_list)
3 0.000 0.000 0.058 0.019 folder_paths.py:203(get_filename_list_)
7 0.001 0.000 0.056 0.008 folder_paths.py:145(recursive_search)
1 0.000 0.000 0.032 0.032 nodes.py:1374(IS_CHANGED)
53/24 0.001 0.000 0.029 0.001 <frozen os>:345(_walk)
18 0.023 0.001 0.023 0.001 {built-in method nt.scandir}
112054 0.022 0.000 0.022 0.000 <frozen _collections_abc>:315(__subclasshook__)
18804 0.021 0.000 0.021 0.000 {built-in method builtins.hasattr}
1 0.019 0.019 0.019 0.019 {method 'read' of '_io.BufferedReader' objects}
136222 0.018 0.000 0.018 0.000 {method 'append' of 'list' objects}
287 0.003 0.000 0.018 0.000 <frozen ntpath>:741(relpath)
8 0.000 0.000 0.017 0.002 nodes_upscale_model.py:18(INPUT_TYPES)
37693 0.016 0.000 0.016 0.000 graph.py:30(has_node)
Recorded with cProfile wrapped around the problematic lines.
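For reference, a minimal sketch of how such a profile can be captured (the commented-out section stands in for the cache loop in execution.py; this is illustrative, not the exact instrumentation I used):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
# ... the problematic lines, i.e. the cache.set_prompt()/clean_unused() loop ...
profiler.disable()

# Print the 50 most expensive calls by cumulative time,
# matching the "Ordered by: cumulative time" table above.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(50)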
Maybe @guill would be interested in this.
After looking around a bit, it looks like LoraLoader and CheckpointLoaderSimple are the culprits, since they call folder_paths.get_filename_list repeatedly during cache key initialization. One simple fix could be caching the output and dumping the cache only when the user explicitly clicks Refresh in the web UI. But it looks like the cache is there already; I guess the calls to isdir and getmtime are not cached.
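A minimal sketch of that idea, assuming a hypothetical module-level wrapper (get_filename_list_cached and invalidate_filename_list_cache are illustrative names, not existing ComfyUI APIs):

import folder_paths

# Hypothetical process-wide cache keyed by folder name ("checkpoints", "loras", ...).
_filename_list_cache = {}

def get_filename_list_cached(folder_name):
    # Populate on first use; subsequent calls skip the stat()-heavy directory scan.
    if folder_name not in _filename_list_cache:
        _filename_list_cache[folder_name] = folder_paths.get_filename_list(folder_name)
    return _filename_list_cache[folder_name]

def invalidate_filename_list_cache():
    # Drop the cache only when the user explicitly clicks Refresh in the web UI.
    _filename_list_cache.clear()

With folder_paths.get_filename_list called 7610 times in the profile above but only a handful of distinct folder names in play, a wrapper like this would collapse nearly all of the nt.stat traffic.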
Nice find and thanks for the ping! I'll try to get this fixed by this weekend.
This pull solved this issue for me: https://github.com/comfyanonymous/ComfyUI/pull/4969
@Trung0246 I have updated to this commit id, but the problem still exists: clean_unused takes 4+ minutes. I called the PromptExecutor.execute method directly and did not go through an HTTP request.