runhouse
PX (e.g. P90) latency for inference cold start
Describe the bug
Please provide a clear and concise expectation of what cold start looks like. I see the docs mention a couple of methods to speed up model load time, and it would be great if objective numbers could be added. Ray also provides methods to combat cold start, and I see the library uses Ray. Do you use any of those methods?
For example, in the image from this article, most providers have cold starts below 100 s (see image), and most providers report P90, P70, or P50 values to help frame the cold start problem and its solutions in those terms.
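To make the request concrete, here is a minimal sketch of how P50/P70/P90 could be computed from a set of cold-start measurements. It assumes you already have per-request cold-start latencies in seconds (for example, time from request until the first inference result on a freshly provisioned instance); how to measure that in runhouse is not shown. The function names and sample numbers are hypothetical, not real runhouse results. It uses the nearest-rank percentile definition; libraries such as NumPy interpolate by default and can give slightly different values.

```python
import math
from typing import Dict, Sequence


def nearest_rank_percentile(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: smallest value such that at least p% of samples are <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_cold_starts(samples_s: Sequence[float]) -> Dict[str, float]:
    """Summarize cold-start latencies (seconds) as P50/P70/P90 (hypothetical helper)."""
    return {f"P{p}": nearest_rank_percentile(samples_s, p) for p in (50, 70, 90)}


if __name__ == "__main__":
    # Hypothetical measurements for illustration only, NOT real runhouse numbers.
    cold_starts_s = [12.0, 15.5, 9.8, 30.2, 14.1, 11.7, 22.4, 13.3, 18.9, 60.5]
    print(summarize_cold_starts(cold_starts_s))
```

With these ten made-up samples the summary is P50 = 14.1 s, P70 = 18.9 s, P90 = 30.2 s; note how a single 60.5 s outlier does not move P90 but would dominate a mean.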
Other relevant links:
- https://news.ycombinator.com/item?id=35738072
- https://www.banana.dev/blog/turboboot