Sean Young

9 comments by Sean Young

I think I've run the profiler. At least I followed the documentation [here](https://docs.vllm.ai/en/latest/dev/profiling/profiling_index.html) and found the `start_profile` and `stop_profile` routes. Unfortunately I haven't been able to view the logs. I've...
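For anyone else trying this, here is a minimal sketch of calling those two routes from Python. It assumes the routes accept an empty POST, that the server was started with profiling enabled as the linked documentation describes, and that it listens on `http://localhost:8000`; the address and the helper names `post` and `profiled` are my own placeholders, not something from the vLLM docs.

```python
import contextlib
import urllib.request


def post(url: str) -> int:
    """Send an empty POST request and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


@contextlib.contextmanager
def profiled(base_url: str, send=post):
    """Call start_profile on entry and stop_profile on exit.

    `send` is injectable so the helper can be exercised without a server.
    The stop call runs even if the body of the `with` block raises.
    """
    base = base_url.rstrip("/")
    send(f"{base}/start_profile")
    try:
        yield
    finally:
        send(f"{base}/stop_profile")


# Usage (assumed address, adjust to your server):
#
#     with profiled("http://localhost:8000"):
#         ...  # send the requests you want traced
```

Wrapping the two calls in a context manager means the profiler is stopped even when the benchmark script fails partway through, which should avoid leaving a trace open on the server.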

That worked, thanks. You've probably already seen this then, but in case it's useful to anyone else:

![image](https://github.com/user-attachments/assets/456a72af-eb9f-4abb-82d4-7945daefb801)

Staggered / Baseline:

![image](https://github.com/user-attachments/assets/cd835c5e-9eca-4b70-ad3d-34447950ed66)

Async / Experimental:

![image](https://github.com/user-attachments/assets/2b644bd2-8b4a-44fc-a165-4ea5f22ee980)

Assuming I've done the right thing using `VLLM_TRACE_FUNCTION=1`, the stack traces have been [uploaded](https://drive.google.com/drive/folders/1QIsp59UqV9Syjt4uVkCznHDi6_8RW1GW?usp=sharing) as well, though I'm afraid I have no idea how to pull anything useful from them....

Had a go using [py-spy](https://github.com/benfred/py-spy), think that might be more what you were after. Uploaded the results with the [rest](https://drive.google.com/drive/folders/1QIsp59UqV9Syjt4uVkCznHDi6_8RW1GW?usp=sharing). The actual svg files are interactive so they provide a...

@DarkLight1337 Using the same model and text-only input I can't get it to do this. The async and staggered versions both appear to have similar memory usage. I'm also...

> For "3 manually started synchronous instances", did you launch 3 endpoints and adjust the GPU memory utilization to about 30% per instance, or you have 3 GPUs?

No, sorry...

Yeah you're right, in the end there's not much difference between the 3 sync, staggered async and async. As far as the server is concerned it still ends up running...

> So your "sync" is not really "sync"...it's really confusing.

Sorry about that, I was talking in terms of the individual scripts.

> Then what I can think of in...

Unfortunately that doesn't work. I came across that issue and tried it, but there's no difference in the HTML returned.

```python
print("\nmarkdown_html:")
html = markdown.markdown(loose_list, extensions=["tables"])
print(html)
print("\nmarkdown_it_html:")
md =...
```