Sean Young
I think I've run the profiler. At least, I followed the documentation [here](https://docs.vllm.ai/en/latest/dev/profiling/profiling_index.html) and found the `start_profile` and `stop_profile` routes. Unfortunately I haven't been able to view the logs. I've...
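In case the routes themselves are the unclear part, this is roughly how I'm calling them (host and port are placeholders, and I'm assuming the server was launched with `VLLM_TORCH_PROFILER_DIR` set as the docs describe):

```python
# Rough sketch of triggering the profiler over HTTP; assumes the server was
# started with VLLM_TORCH_PROFILER_DIR set. Host/port are placeholders.
import requests

base_url = "http://localhost:8000"  # placeholder address of the vLLM server

requests.post(f"{base_url}/start_profile")  # begin capturing a torch profiler trace
# ... send the requests you want captured ...
requests.post(f"{base_url}/stop_profile")   # flush the trace to VLLM_TORCH_PROFILER_DIR
```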
That worked, thanks. You've probably already seen this then, but in case it's useful to anyone else:  Staggered / Baseline:  Async / Experimental: 
Assuming I've done the right thing using `VLLM_TRACE_FUNCTION=1`, the stack traces have been [uploaded](https://drive.google.com/drive/folders/1QIsp59UqV9Syjt4uVkCznHDi6_8RW1GW?usp=sharing) as well, though I'm afraid I have no idea how to pull anything useful from them....
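For completeness, tracing was enabled roughly like this; the launch command and model below are placeholders, and `VLLM_TRACE_FUNCTION=1` is the only part that actually matters:

```python
# Sketch of enabling vLLM's function tracing before launching the server.
# The server invocation and model are placeholders; VLLM_TRACE_FUNCTION=1
# is the documented debug switch.
import os
import subprocess

env = os.environ.copy()
env["VLLM_TRACE_FUNCTION"] = "1"  # log function calls for debugging

subprocess.run(
    [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", "facebook/opt-125m",  # placeholder model
    ],
    env=env,
    check=True,
)
```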
Had a go using [py-spy](https://github.com/benfred/py-spy); I think that might be more what you were after. Uploaded the results with the [rest](https://drive.google.com/drive/folders/1QIsp59UqV9Syjt4uVkCznHDi6_8RW1GW?usp=sharing). The actual SVG files are interactive, so they provide a...
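If anyone wants to reproduce the flame graphs, the recording was done roughly like this (the PID, duration, and output path are placeholders):

```python
# Sketch of recording a flame graph of the running server with py-spy.
# PID, duration, and output file are placeholders.
import subprocess

subprocess.run(
    [
        "py-spy", "record",
        "--pid", "12345",            # PID of the vLLM server process
        "--output", "vllm_flame.svg",
        "--duration", "60",          # sample for 60 seconds
        "--subprocesses",            # include worker processes
    ],
    check=True,
)
```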
@DarkLight1337 Using the same model and text-only input, I can't get it to do this. The async and staggered versions both appear to have similar memory usage. I'm also...
> For "3 manually started synchronous instances", did you launch 3 endpoints and adjust the GPU memory utilization to about 30% per instance, or you have 3 GPUs? No, sorry...
Yeah, you're right, in the end there's not much difference between the 3 sync, staggered async, and async versions. As far as the server is concerned, it still ends up running...
> So your "sync" is not really "sync"...it's really confusing. Sorry about that, I was talking in terms of the individual scripts. > Then what I can think of in...
Unfortunately that doesn't work. I came across that issue and tried it, but there's no difference in the HTML returned.

```python
print("\nmarkdown_html:")
html = markdown.markdown(loose_list, extensions=["tables"])
print(html)

print("\nmarkdown_it_html:")
md = ...
```
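The full comparison is along these lines; since the snippet above is cut off, the `loose_list` sample and the markdown-it-py setup here are my assumptions:

```python
# Sketch of the side-by-side comparison; the sample input and the
# markdown-it-py configuration are assumptions (the original snippet is truncated).
import markdown
from markdown_it import MarkdownIt

loose_list = """
- first item

- second item
"""

print("\nmarkdown_html:")
print(markdown.markdown(loose_list, extensions=["tables"]))

print("\nmarkdown_it_html:")
md = MarkdownIt("commonmark").enable("table")
print(md.render(loose_list))
```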