Tianqi Chen
cc @leofang for thoughts. I do think it has use cases where we only want to exchange the data structure while leaving the synchronization explicitly to the user.
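As a minimal illustration of what "exchange the data structure" means, here is a sketch using NumPy's DLPack support as a CPU stand-in (assumption: the actual discussion concerns device tensors, where the consumer would additionally have to synchronize the stream before reading):

```python
import numpy as np

# The producer creates an array; np.from_dlpack consumes its DLPack
# capsule, sharing the underlying buffer rather than copying it.
producer = np.arange(4, dtype=np.float32)
consumer = np.from_dlpack(producer)

# Only the tensor metadata and data pointer are exchanged; writes through
# one view are visible through the other. On a GPU, correctness of such a
# read would depend on the user synchronizing explicitly first.
producer[0] = 42.0
print(consumer[0])  # shares memory with producer
```

On CPU no synchronization is needed, which is why the zero-copy exchange alone already suffices here; the open question in the thread is what happens when that synchronization burden moves to the user on device.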
Unfortunately, doing so would mean the output won't align with the OpenAI protocol, so likely we cannot support such a case. Note that async streaming (between the worker and the client) is...
@tvm-bot rerun
This is great, subgroup shuffle can be useful for reduction operations. We did have warp shuffle support for the Metal backend, so maybe we can try to add codegen support for the WebGPU backend.
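For reference, the reduction pattern that warp/subgroup shuffles enable can be simulated on the host. This is a sketch only (real codegen would emit the backend's shuffle intrinsics; the simplification that out-of-range reads contribute zero is an assumption of this simulation, not hardware behavior):

```python
def shuffle_down_reduce(lane_values):
    """Simulate a subgroup sum reduction built from shuffle-down steps.

    lane_values: one value per lane; length must be a power of two.
    After log2(n) steps, lane 0 holds the sum of all lanes.
    """
    vals = list(lane_values)
    n = len(vals)
    offset = n // 2
    while offset > 0:
        # Each lane adds the value held by lane (i + offset).
        # Simplification: out-of-range reads contribute zero; only
        # lane 0's final value is meaningful, as in real kernels.
        vals = [v + (vals[i + offset] if i + offset < n else 0)
                for i, v in enumerate(vals)]
        offset //= 2
    return vals[0]

print(shuffle_down_reduce([1, 2, 3, 4]))  # → 10
```

The point of the pattern is that each step exchanges values directly between lanes, so a subgroup-wide sum needs only log2(subgroup_size) steps and no shared-memory round trips.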
This is a nice PR that would be good to land. cc @yongwww
Thanks @oskar-inceptron. If we want to go towards this directly, a better approach is to make Analyzer an Object, so we can use ObjectPtr for this.
We recommend starting from the default options, which we normally use, and `q4f16_1` to reduce memory.
https://llm.mlc.ai/docs/compilation/convert_weights.html contains a walkthrough guide.
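To illustrate why a `q4f16_1`-style scheme reduces memory: 4-bit weight quantization stores roughly 4 bits per parameter versus 16 bits for f16. The numbers below are rough, illustrative estimates for the weights alone (ignoring quantization-scale overheads, the KV cache, and other runtime buffers; the 7B parameter count is just an example):

```python
def weight_bytes_gib(num_params, bits_per_param):
    """Rough weight-memory estimate in GiB (illustrative only)."""
    return num_params * bits_per_param / 8 / 2**30

params = 7e9  # e.g. a 7B-parameter model
print(f"f16 weights:   {weight_bytes_gib(params, 16):.1f} GiB")
print(f"4-bit weights: {weight_bytes_gib(params, 4):.1f} GiB")
```

So for a 7B model the weights drop from roughly 13 GiB to roughly 3.3 GiB, which is what makes such models fit on memory-constrained devices.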
Hi @oglok, it seems you were using an older version that is now deprecated.
As of now we unfortunately don't have a container file, so building from source for Jetson may be needed.