[Langchain-Chatchat] Add time consumption message for the first token and rest tokens
The current setup is unable to report the time spent on the first token and the rest of the tokens. Can we add this message?
Hi @johnysh,
Currently, Langchain-Chatchat does not natively log the first token and rest token latency. However, you could achieve that with the help of the ipex-llm benchmark tool.
To use the benchmark tool in Langchain-Chatchat:
- Put `benchmark_util.py` in your conda env for Langchain-Chatchat. The path to put the script should look like the following (taking Linux as an example; a snippet to locate this directory programmatically is shown after these steps):

  ```
  /home/<user_name>/<anaconda3 or miniconda3>/envs/<your conda env name>/lib/python3.11/site-packages/ipex_llm/serving/fastchat/benchmark_util.py
  ```
- In `/home/<user_name>/<anaconda3 or miniconda3>/envs/<your conda env name>/lib/python3.11/site-packages/ipex_llm/serving/fastchat/ipex_llm_worker.py`, wrap the loaded model with `BenchmarkWrapper` (an illustrative sketch of what the wrapper measures follows these steps). That is, change the code here to:

  ```python
  self.model, self.tokenizer = load_model(
      model_path, device, self.load_in_low_bit, trust_remote_code
  )
  from .benchmark_util import BenchmarkWrapper
  self.model = BenchmarkWrapper(self.model)
  ```
- In the same `ipex_llm_worker.py`, add print messages for the first and rest token latency. That is, change the code here to:

  ```python
  print(f"First token latency (s): {self.model.first_cost}", flush=True)
  print(f"Rest token latency (s): {self.model.rest_cost_mean}", flush=True)
  yield json.dumps(json_output).encode() + b"\0"
  ```
Please let us know if you run into any further problems :)