Wenxuan Huang
Results: 2 issues of Wenxuan Huang
Hello, I would like to know if the inference times reported in Figure 4 were measured without a KV cache? While the "TPS" results in Table 3 are prefill time...
Great work! I would like to know whether this framework supports multimodal input from agents. For example, could it handle image and text responses from agents (perhaps similar to OpenAI...
topic/multimodality