
Results 12 comments of harrywu

@hmellor Hello, it seems that the discussion has moved elsewhere. Can this issue be closed?

@robertgshaw2-neuralmagic I am just replicating the new metrics that were added [here](https://github.com/ronensc/vllm/pull/1). Should I add some automated tests, or just visually inspect the metrics? Thanks!

> Also, looks like you are missing the number of tokens in the batch for each iteration of the LLMEngine

Do you mean `num_generation_tokens_iter`? It already exists.

https://github.com/vllm-project/vllm/issues/3616#issuecomment-2030858781 @robertgshaw2-neuralmagic BTW, should some metrics be recorded by a periodic background process/thread, e.g. every 1 second?
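A minimal sketch of what that periodic recording could look like (not vLLM's actual implementation; `PeriodicMetricsLogger` and the `collect_fn` callback are hypothetical names for illustration): a daemon thread takes a snapshot of the current counters at a fixed interval instead of on every engine iteration.

```python
import threading
import time


class PeriodicMetricsLogger:
    """Takes a metrics snapshot from a daemon thread at a fixed interval."""

    def __init__(self, collect_fn, interval_s: float = 1.0):
        self._collect_fn = collect_fn      # callable returning a dict of metric values
        self._interval_s = interval_s
        self._stop = threading.Event()
        self.snapshots = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait doubles as the sleep and the shutdown signal:
        # it returns False on timeout (take a snapshot) and True when stopped.
        while not self._stop.wait(self._interval_s):
            self.snapshots.append(self._collect_fn())

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


# Usage: record a (hypothetical) counter roughly every 50 ms.
counters = {"num_generation_tokens_iter": 0}
logger = PeriodicMetricsLogger(lambda: dict(counters), interval_s=0.05)
logger.start()
counters["num_generation_tokens_iter"] = 42   # engine loop would update this
time.sleep(0.2)
logger.stop()
print(logger.snapshots[-1])
```

The main design question is the one raised above: per-iteration metrics give exact values but add overhead to the hot loop, while a sampling thread like this trades precision for a fixed, bounded recording cost.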

@robertgshaw2-neuralmagic Thanks for the attention! I made some changes to the dashboard and categorized the charts. But I'm not sure I'm showing some metrics the right way, such as: - request_params_n -...

@robertgshaw2-neuralmagic Hi, will it still be merged before Friday?

@robertgshaw2-neuralmagic ping~

@vinhtran2611 I have set `AutoModelForCausalLM.from_pretrained(torch_dtype=torch.bfloat16)` and `_load_model(precision=torch.bfloat16)`, but I get `hf_outputs.logits.dtype == torch.float32` and `output.dtype == torch.bfloat16`. Maybe it's a precision problem.
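A plausible explanation for the mismatch, sketched with plain PyTorch rather than the actual model code: even when the weights are loaded in bfloat16, some model heads explicitly upcast the logits to float32 for numerical stability, so the logits dtype need not match the weight dtype. The upcast call here is an assumption about what such a head does, not a quote from the model in question.

```python
import torch

# Weights and activations in bfloat16, as requested via torch_dtype.
lm_head = torch.nn.Linear(8, 16, bias=False, dtype=torch.bfloat16)
hidden = torch.randn(1, 8, dtype=torch.bfloat16)

logits_raw = lm_head(hidden)             # stays bfloat16
logits_upcast = lm_head(hidden).float()  # explicit upcast, as some heads do

print(logits_raw.dtype)     # torch.bfloat16
print(logits_upcast.dtype)  # torch.float32
```

If that is what is happening, comparing `hf_outputs.logits` against the other implementation's output would need both tensors cast to a common dtype first.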

Same problem here, but in my case it's over the web protocol.

@robertgshaw2-neuralmagic Hello~ I see it's been approved for a while, but it hasn't been merged yet. Is anything wrong?