Ziren Wang
We have upgraded our codebase from C++ to Python; the configuration is now clearer, so you can explicitly define the KV cache capacity in your own code....
Here, batchsize means the number of tokens per round, not the number of requests.
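To make the distinction concrete, here is a minimal, hypothetical sketch (the function and names below are illustrative, not Nanoflow code): a token-based batch size caps the total tokens scheduled per round, so a round may hold few long requests or many short ones.

```python
# Illustrative sketch: "batchsize" is a per-round token budget, not a
# request count. Names here are hypothetical, not from the Nanoflow codebase.

def fill_batch(requests, batch_size_tokens):
    """Greedily pack (request_id, num_tokens) pairs until the token budget is hit."""
    batch, tokens = [], 0
    for req_id, num_tokens in requests:
        if tokens + num_tokens > batch_size_tokens:
            break
        batch.append(req_id)
        tokens += num_tokens
    return batch, tokens

# Three requests with different lengths; a batchsize of 512 tokens admits
# only two of them, even though 512 is far larger than 3 "requests".
reqs = [("a", 300), ("b", 200), ("c", 100)]
print(fill_batch(reqs, 512))  # → (['a', 'b'], 500)
```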
To run on multiple GPUs, we have built an engine here: [Multi-GPU engine script](https://github.com/efeslab/Nanoflow/blob/Nanoflow-python/entry/test_multi_gpu3.py), which should help you build your own multi-GPU model.
We have changed our codebase from C++ to Python; it should now be easier to launch the program.
Here it is: auto_search/new_search.py on the Nanoflow-python branch. In this new version of auto-search, there are two stages. The first is that...
We have changed our codebase from C++ to Python, which should make debugging clearer. Moreover, we have removed the pllm_python module from the Python codebase, since it's useful in...
Is this fixed now? You could try the latest version of our codebase; it should work.
M is just a parameter for ensuring the correctness of sequential execution within one stream; this is known as a "big-M formulation". M only needs to be large enough in this case, it...
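The big-M idea can be sketched generically (this is an illustration of the technique, not code from Nanoflow's auto-search): a binary variable switches an ordering constraint on or off, and M must exceed any feasible time difference so that the "off" case is never binding.

```python
# Generic big-M sketch (hypothetical names, not Nanoflow code):
# enforce "kernel j starts after kernel i ends" only when both run in the
# same stream, via the linear constraint
#     start_j >= end_i - M * (1 - same_stream)
# If same_stream == 1, this reduces to start_j >= end_i (strict ordering).
# If same_stream == 0, it reduces to start_j >= end_i - M, which is vacuous
# provided M exceeds any possible end time -- hence "large enough".

M = 10_000  # must be at least the largest possible end_i in the schedule

def ordering_ok(start_j, end_i, same_stream):
    """Check whether the big-M ordering constraint is satisfied."""
    return start_j >= end_i - M * (1 - same_stream)

# Same stream: j starting at t=5 before i ends at t=8 violates the constraint.
print(ordering_ok(5, 8, 1))   # → False
# Different streams: the big-M term deactivates the constraint entirely.
print(ordering_ok(5, 8, 0))   # → True
```

If M were too small (say, smaller than a possible end time), the "deactivated" constraint could still wrongly restrict schedules across streams, which is why correctness only requires M to be sufficiently large.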
Could you try again with the current codebase? We have upgraded from C++ to Python, so it should be easier to implement a new model in...
FileNotFoundError when running run_llama3.py: missing ../auto_search/8B_search_result_large_btz.json
You could try test_correctness first. test_performance() is designed to evaluate the performance of our auto-search result. For auto search, you need to run profiling first, and then run...