Wenxuan Huang

Results: 2 issues by Wenxuan Huang

Hello, I would like to know whether the inference times reported in Figure 4 were measured without a KV cache. While the "TPS" results in Table 3 are prefill time...

Great work! I would like to know whether this framework supports multimodal input from agents. For example, could it handle image and text responses from agents (perhaps similar to OpenAI...

topic/multimodality