Nanoflow: The output is wrong when using serve.py

I followed the instructions and used WeightSaver.py to convert a meta-llama/Llama2-70B-base model. Then I used gen_req.py to produce a test dataset:

python3 gen_req.py "The University of Washington is located" 100 0 trace.csv

The original model paths in the code repository were all set to "meta-llama/Llama2-70B-chat". I changed them to the paths of the Llama2-70b models that I downloaded locally. Then I ran serve.py:

python3 server.py --trace_path trace.csv

But the output file trace.csv.out is weird:

The University of Washington is located...,,,,......................................................,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, The University of Washington is located rom the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the the The University of Washington is located, profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile profile and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and The University of Washington is located = the the the the the the the the the the the the the the the the the the the the the the the the the the the the The University of Washington is located'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' The University of Washington is located = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = The University of Washington is locatedissfttytyftfttytytytytytyty height height height height height height height height height height height height height height height height height height height height height height 
height height height height height height extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra extra The University of Washington is located the’ Mu’’’’’’’’’’ Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu Mu’’’’’’’’’’’..’’’...........................//...//// The University of Washington is located synth organ organ organ organ organ,,,,,,,,,,

Aug 28 '24 09:08 alexngng

GPU: 4x A100-80G, torch 2.4.0+cu121

Aug 28 '24 09:08 alexngng

Thanks for your question. The current version of Nanoflow works only on 8x A100. When fewer than 8 GPUs are present, Nanoflow assumes an empty result for the missing GPUs, which causes incorrect output.
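For anyone hitting the same symptom, a small pre-flight check can catch this before launching the server. The snippet below is only an illustrative sketch and is not part of the Nanoflow repository: the names `REQUIRED_GPUS`, `gpu_count_error`, and `visible_gpu_count` are hypothetical, and the value 8 is taken from the explanation above. It assumes PyTorch is installed and uses `torch.cuda.device_count()` to count visible GPUs.

```python
from typing import Optional

# Assumption: 8 comes from the explanation above (current version needs 8x A100).
REQUIRED_GPUS = 8


def gpu_count_error(available: int, required: int = REQUIRED_GPUS) -> Optional[str]:
    """Return an error message if fewer GPUs are visible than required, else None."""
    if available < required:
        return (
            f"Found {available} GPU(s), but {required} are required; "
            "output may be incorrect with fewer GPUs."
        )
    return None


def visible_gpu_count() -> int:
    """Count the CUDA devices visible to PyTorch (0 if PyTorch is not installed)."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    message = gpu_count_error(visible_gpu_count())
    print(message if message is not None else "GPU count OK")
```

On the 4x A100 setup reported above, this would print the warning instead of letting the run produce garbage silently. It only checks the device count, not the GPU model or interconnect.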

Aug 28 '24 17:08 serendipity-zk

Thanks for your reply! I will test it on 8xA100.

Aug 29 '24 01:08 alexngng

Does it work on 8x other GPUs, such as 4090s? Or are only A100s supported?

Aug 29 '24 02:08 aikitoria

Will fewer GPUs be supported?

Aug 29 '24 14:08 CSEEduanyu

4090s do not have NVLink to move data efficiently between GPUs, so the pipeline would need to be redesigned to accommodate the longer communication time. We will work on supporting Nanoflow with fewer GPUs. However, fewer GPUs would reduce the request batch size, so the same throughput cannot be reached.

Aug 29 '24 17:08 serendipity-zk