seungrokj
@amathews-amd @shajrawi @andyluo7 @mawong-amd @jeffdaily @liligwu @hongxiayang Please take a look at these when you're available
@fxmarty Can you please add one more `triton.Config` in https://github.com/huggingface/text-generation-inference/blob/b7e98ba635367daa23c5b1f4a73f51b1f061936a/server/text_generation_server/utils/flash_attn_triton.py#L261

```
triton.Config(
    {
        "BLOCK_M": 128,
        "BLOCK_N": 64,
        "waves_per_eu": 1,
        "PRE_LOAD_V": False,
    },
    num_stages=1,
    num_warps=4,
),
```

This will improve the...
Hi @james-banks In the Python code snippet, the module name should use "_"; "-" is not allowed.

```
Python 3.9.19 (main, Mar 21 2024, 17:11:28) [GCC 11.2.0] :: Anaconda,...
```
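For context, a generic sketch (not tied to any particular package) of why "-" is rejected: Python module names must be valid identifiers, which only allow letters, digits, and "_", so `import my-module` is parsed as the expression `my - module` and raises a SyntaxError.

```python
# Module names must be valid Python identifiers (letters, digits, "_").
# "my-module" fails the identifier check, so it cannot appear in an
# `import` statement; rename the module to "my_module" instead.
print("my_module".isidentifier())  # True  -> importable
print("my-module".isidentifier())  # False -> not importable via `import`
```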
Hi @zhyncs thank you for the quick reply! Can you elaborate a little on the differences between bench_one_bench and launch_server + bench_serving? Does bench_one_bench process batch * input_tokens differently from online serving?
In this screenshot, it's better to put MI300 information here; otherwise people will keep asking whether rocm/pytorch:latest-release supports gfx942 or not.