stable-diffusion-webui
Switch to more precise timers
Describe what this pull request is trying to achieve.
Increase timer granularity to nanoseconds using the standard Python API. I believe this will be useful when measuring performance improvements in the future.
Additional notes and description of your changes
This is a new API added in Python 3.7 (https://docs.python.org/3/library/time.html#time.perf_counter_ns). Since the lowest supported Python version on Linux is 3.7, this should still work on Linux, but I don't have an environment to confirm that easily.
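The API difference the PR relies on can be sketched in a few lines: `time.perf_counter_ns()` returns an integer nanosecond count, so precision is not lost to float representation until display time. A minimal sketch (the printed label is illustrative, not the project's actual log format):

```python
import time

# perf_counter_ns() (Python 3.7+) returns an int in nanoseconds,
# avoiding the float rounding inherent to perf_counter().
start = time.perf_counter_ns()
time.sleep(0.01)  # stand-in for the work being timed
elapsed_ns = time.perf_counter_ns() - start

# Convert to seconds only when formatting for output.
print(f"done in {elapsed_ns / 1e9:.3f}s")
```

The key point is that the stored value stays an integer; any rounding happens once, in the format string.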
Environment this was tested in
- OS: Windows, Python 3.10
- Chrome
- NVIDIA RTX 3080 10 GB
Screenshots or videos of your changes
I have CLI output. I initially tested with 6 digits of precision, and ended up deciding on printing only 3 digits:
Model loaded in 11.725878s (calculate hash: 0.000205s, load weights from disk: 0.158098s, find config: 0.000106s, load config: 0.015888s, create model: 0.434900s, apply weights to model: 9.667953s, apply dtype to VAE: 0.008485s, load VAE: 0.000775s, move model to device: 0.001591s, hijack: 0.008093s, load textual inversion embeddings: 1.426266s, scripts callbacks: 0.003518s).
[tag-editor] Settings has been read from config.json
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 93.512177s (import torch: 0.000004s, import gradio: 3.959462s, import ldm: 2.526307s, other imports: 4.611374s, list extensions: 7.230898s, list SD models: 0.967106s, setup codeformer: 0.268106s, setup gfpgan: 0.000078s, list builtin upscalers: 0.019363s, load scripts: 35.301853s, load upscalers: 0.002834s, refresh VAE: 0.005151s, refresh textual inversion templates: 0.000104s, load SD checkpoint: 12.185745s, opts onchange: 0.000084s, reload hypernets: 0.001627s, extra networks: 0.004513s, scripts before_ui_callback: 0.000198s, create ui: 24.035772s, gradio launch: 2.386180s, scripts app_started_callback: 0.005417s).
Model loaded in 1.833s (calculate hash: 0.000s, load weights from disk: 0.031s, find config: 0.000s, load config: 0.011s, create model: 0.395s, apply weights to model: 0.858s, apply dtype to VAE: 0.010s, load VAE: 0.001s, move model to device: 0.001s, hijack: 0.009s, load textual inversion embeddings: 0.513s, scripts callbacks: 0.003s).
[tag-editor] Settings has been read from config.json
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 56.554s (import torch: 0.000s, import gradio: 2.080s, import ldm: 1.072s, other imports: 1.910s, list extensions: 6.430s, list SD models: 0.049s, setup codeformer: 0.238s, setup gfpgan: 0.000s, list builtin upscalers: 0.013s, load scripts: 16.069s, load upscalers: 0.002s, refresh VAE: 0.005s, refresh textual inversion templates: 0.000s, load SD checkpoint: 2.314s, opts onchange: 0.000s, reload hypernets: 0.001s, extra networks: 0.000s, scripts before_ui_callback: 0.000s, create ui: 23.600s, gradio launch: 2.762s, scripts app_started_callback: 0.006s).
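The two runs above differ only in display precision (6 digits vs. 3). A minimal sketch of rendering a nanosecond count at either precision (the helper name is hypothetical, not code from the PR):

```python
def format_elapsed(elapsed_ns: int, digits: int = 3) -> str:
    # Render an integer nanosecond count as seconds with the chosen precision.
    return f"{elapsed_ns / 1e9:.{digits}f}s"

print(format_elapsed(11_725_878_000))            # -> 11.726s
print(format_elapsed(11_725_878_000, digits=6))  # -> 11.725878s
```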
Jobs that take less than 0.1 s are not meant to be displayed, so I'm not sure this is needed at all.
I did not add any instrumentation beyond what you added. I assume you added it to chase down complaints about slow startup times.
Would you like me to change the code to hide the logging statements for anything that takes less than 0.1 s?
Come on now, you should be able to figure this one out. The original code does not display timings shorter than 0.1 s; yours does. The cause is clear as day in there.
The answer is in this line:
`additions = [x for x in self.records.items() if x[1] >= 0.1]`
The original code compares values in seconds here; yours stores nanoseconds, which changes the behavior.
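The mismatch can be sketched directly: once the records hold nanoseconds, the literal `0.1` no longer means 0.1 seconds, so effectively every job passes the filter. A minimal fix, assuming nanosecond records (the record names below are sample data, not real measurements):

```python
THRESHOLD_NS = int(0.1 * 1e9)  # 0.1 s expressed in nanoseconds

# Sample records in nanoseconds: one far below the threshold, one above.
records = {"import torch": 4_000, "load scripts": 16_069_000_000}

# Comparing against a threshold in the same unit restores the original
# behavior of hiding jobs shorter than 0.1 s.
additions = [x for x in records.items() if x[1] >= THRESHOLD_NS]
print(additions)  # only "load scripts" survives the filter
```

With the unchanged `>= 0.1` comparison, both entries would have been displayed, which is exactly the behavior change being pointed out.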
I'm closing the PR, as it changes the behavior to display items that are too fast to matter. Reopen it if there's anything more.