
Switch to more precise timers

robertsmieja opened this pull request 1 year ago • 3 comments

Describe what this pull request is trying to achieve.

Increase the timer's granularity to nanoseconds using the standard Python API. I believe this will be useful when measuring performance improvements in the future.

Additional notes and description of your changes

This uses `time.perf_counter_ns`, an API added in Python 3.7: https://docs.python.org/3/library/time.html#time.perf_counter_ns. Since 3.7 is the lowest Python version supported on Linux, this should still work there, but I don't have an environment to easily confirm.
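
For reference, a minimal sketch of the API difference (plain `time` calls, not the webui's own timer code; the work being timed is an arbitrary placeholder):

```python
import time

# time.perf_counter() returns elapsed time as a float number of seconds;
# time.perf_counter_ns() (Python 3.7+) returns an integer count of nanoseconds,
# which avoids losing precision to float rounding on very short intervals.
start_ns = time.perf_counter_ns()
sum(range(1_000_000))  # arbitrary placeholder work to time
elapsed_ns = time.perf_counter_ns() - start_ns

# Keep nanoseconds internally; convert to seconds only for display.
print(f"elapsed: {elapsed_ns / 1e9:.3f}s ({elapsed_ns} ns)")
```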

Environment this was tested in

  • OS: Windows, Python 3.10
  • Chrome
  • NVIDIA RTX 3080 10 GB

Screenshots or videos of your changes

I have CLI output below. I initially tested with 6 digits of precision and ended up deciding to print only 3 digits:

Model loaded in 11.725878s (calculate hash: 0.000205s, load weights from disk: 0.158098s, find config: 0.000106s, load config: 0.015888s, create model: 0.434900s, apply weights to model: 9.667953s, apply dtype to VAE: 0.008485s, load VAE: 0.000775s, move model to device: 0.001591s, hijack: 0.008093s, load textual inversion embeddings: 1.426266s, scripts callbacks: 0.003518s).
[tag-editor] Settings has been read from config.json
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 93.512177s (import torch: 0.000004s, import gradio: 3.959462s, import ldm: 2.526307s, other imports: 4.611374s, list extensions: 7.230898s, list SD models: 0.967106s, setup codeformer: 0.268106s, setup gfpgan: 0.000078s, list builtin upscalers: 0.019363s, load scripts: 35.301853s, load upscalers: 0.002834s, refresh VAE: 0.005151s, refresh textual inversion templates: 0.000104s, load SD checkpoint: 12.185745s, opts onchange: 0.000084s, reload hypernets: 0.001627s, extra networks: 0.004513s, scripts before_ui_callback: 0.000198s, create ui: 24.035772s, gradio launch: 2.386180s, scripts app_started_callback: 0.005417s).
Model loaded in 1.833s (calculate hash: 0.000s, load weights from disk: 0.031s, find config: 0.000s, load config: 0.011s, create model: 0.395s, apply weights to model: 0.858s, apply dtype to VAE: 0.010s, load VAE: 0.001s, move model to device: 0.001s, hijack: 0.009s, load textual inversion embeddings: 0.513s, scripts callbacks: 0.003s).
[tag-editor] Settings has been read from config.json
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 56.554s (import torch: 0.000s, import gradio: 2.080s, import ldm: 1.072s, other imports: 1.910s, list extensions: 6.430s, list SD models: 0.049s, setup codeformer: 0.238s, setup gfpgan: 0.000s, list builtin upscalers: 0.013s, load scripts: 16.069s, load upscalers: 0.002s, refresh VAE: 0.005s, refresh textual inversion templates: 0.000s, load SD checkpoint: 2.314s, opts onchange: 0.000s, reload hypernets: 0.001s, extra networks: 0.000s, scripts before_ui_callback: 0.000s, create ui: 23.600s, gradio launch: 2.762s, scripts app_started_callback: 0.006s).

robertsmieja, Mar 20 '23

Jobs that take less than 0.1 s are not meant to be displayed, so I'm not sure this is needed at all.

AUTOMATIC1111, Mar 25 '23

I did not add any instrumentation beyond what you had already added. I assume you added it to chase down complaints about slow startup times.

Would you like me to change the code to hide the logging statements for anything that takes less than 0.1 s?

robertsmieja, Mar 25 '23

Come on now, you should be able to figure this one out. The original code does not display timings shorter than 0.1 s; yours does. The cause is clear as day in there.

AUTOMATIC1111, Mar 26 '23

The answer is in this line:

additions = [x for x in self.records.items() if x[1] >= 0.1]

The original code stores seconds in `records`, so this filter hides steps shorter than 0.1 s; your change stores nanoseconds, so every entry passes the `>= 0.1` check and gets displayed.
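
To make the mismatch concrete, here is a simplified sketch (not the actual modules/timer.py): storing raw nanoseconds in `records` means every entry passes the `>= 0.1` check, while converting back to seconds before storing keeps the old filtering behavior.

```python
import time

class Timer:
    # Simplified sketch of a timer that measures with perf_counter_ns()
    # but stores seconds, so the 0.1 s display threshold keeps working.

    def __init__(self):
        self.start_ns = time.perf_counter_ns()
        self.records = {}  # step name -> elapsed seconds (float)

    def record(self, name):
        now_ns = time.perf_counter_ns()
        # Convert to seconds before storing; if raw nanoseconds were stored,
        # every value would be >= 0.1 and nothing would be filtered out.
        self.records[name] = (now_ns - self.start_ns) / 1e9
        self.start_ns = now_ns

    def summary(self):
        # Same filter as the line quoted above: hide steps under 0.1 s.
        additions = [x for x in self.records.items() if x[1] >= 0.1]
        return ", ".join(f"{name}: {secs:.3f}s" for name, secs in additions)
```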

AUTOMATIC1111, Mar 27 '23

I'm closing the PR, as it changes the behavior by displaying items that are too fast to matter. Reopen it if there's anything more.

AUTOMATIC1111, Apr 29 '23