Tianqi Chen

Results 637 comments of Tianqi Chen

cc @leofang for thoughts. I do think it has use cases when we only want to exchange the data structure while leaving synchronization explicitly to the user

unfortunately doing so would mean the output won't align with the OpenAI protocol, so we likely cannot support such a case. Note that async streaming (between worker and the client) is...

This is great, subgroup shuffle can be useful for reduction operations. We did have warp shuffle support for the Metal backend, so maybe we can try adding a codegen backend for WebGPU
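To illustrate why a subgroup shuffle helps with reductions, here is a minimal Python sketch that simulates a shuffle-down tree reduction over the lanes of one subgroup. It is not TVM, Metal, or WebGPU code: `shuffle_down` and `subgroup_reduce_sum` are hypothetical helper names chosen for this example, and the list stands in for per-lane registers.

```python
def shuffle_down(lanes, delta):
    """Simulate a shuffle-down: lane i reads the value held by lane i + delta.

    Lanes whose source would fall outside the subgroup keep their own value
    (an assumption made for this simulation).
    """
    n = len(lanes)
    return [lanes[i + delta] if i + delta < n else lanes[i] for i in range(n)]


def subgroup_reduce_sum(lanes):
    """Tree-reduce the lane values; the total ends up in lane 0."""
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("subgroup size must be a non-zero power of two")
    vals = list(lanes)
    offset = n // 2
    while offset > 0:
        # Every lane adds the value held by the lane `offset` positions above it.
        shifted = shuffle_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    # Only lane 0 is guaranteed to hold the full sum.
    return vals[0]


if __name__ == "__main__":
    print(subgroup_reduce_sum(list(range(8))))
```

The reduction takes log2(n) shuffle steps and needs no shared memory, which is the usual motivation for exposing shuffles in a codegen backend.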

This is a nice PR that would be good to land. cc @yongwww

thanks @oskar-inceptron. If we want to go towards this directly, a better approach is to make Analyzer an Object, so we can use ObjectPtr for this

we recommend starting from the default options, which are what we normally use, and using `q4f16_1` to reduce memory.

https://llm.mlc.ai/docs/compilation/convert_weights.html contains a walkthrough guide

hi @oglok, it seems you were using an older version that is now deprecated

as of now we unfortunately don't have a container file, so building from source for Jetson may be needed