Ling Jing
@Renbry This seems to be related to the C++ environment. Maybe you can refer to the first and second answers here: https://stackoverflow.com/questions/71470989/python-setup-py-bdist-wheel-did-not-run-successfully
> unfortunately that doesn't work for me either.. it's failing when i `pip install ./extensions/cuda` > > `Processing c:\apps\zipnerf-pytorch\extensions\cuda Preparing metadata (setup.py) ... error error: subprocess-exited-with-error > > × python...
Hi, the PR in the other repo has been merged, and I have updated the link in the document :).
> @Jing1Ling can you check the other PR #1126, the code base has changed there @yafshar Thank you for the reminder. I believe this patch is a more concise approach....
**Why this change causes such a large decrease in memory usage**: The conclusion is that 'query_states', 'key_states', and 'value_states' are all views of 'mixed_qkv'. When I create a new tensor...
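To illustrate the view-versus-copy point above, here is a minimal sketch using only the Python standard library. It is an analogy, not the code from this PR: a `bytearray` stands in for `mixed_qkv`, `memoryview` slices stand in for the three state tensors, and `head_bytes` is a made-up size. The same idea should carry over to tensors: a view keeps referring to the whole underlying buffer, while a copy owns only its own data.

```python
# Analogy only: a bytearray plays the role of the fused `mixed_qkv` buffer.
head_bytes = 4  # hypothetical size of each of the three parts
mixed_qkv = bytearray(range(3 * head_bytes))
buf = memoryview(mixed_qkv)

# Slicing a memoryview creates views into the same buffer, not copies.
query_states = buf[0:head_bytes]
key_states = buf[head_bytes:2 * head_bytes]
value_states = buf[2 * head_bytes:]

# Every view still refers to the full underlying buffer.
views_share_buffer = all(
    v.obj is mixed_qkv for v in (query_states, key_states, value_states)
)

# A write to the base buffer is visible through the view.
mixed_qkv[0] = 255
view_sees_write = query_states[0] == 255

# An explicit copy owns its own storage and no longer tracks the base.
query_copy = bytes(query_states)
mixed_qkv[0] = 7
copy_is_independent = query_copy[0] == 255 and query_states[0] == 7
```

As long as any of the three views is alive, the whole `mixed_qkv` buffer has to stay alive too; once only copies are kept, the large buffer can be released.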
Updated to be compatible with the main branch.
The increased memory usage was resolved by commenting out the code that converts past_key_values from a list to a tuple. In addition, I set kv_cache_len as an attribute of...
Hi @schoi-habana, I have updated this PR and replied to your comments. It now has better throughput and lower memory usage without introducing too many changes. Could you review it...
Hi @mandy-li, @regisss, @libinta, @ssarkar2, @bhargaveede, @vivekgoe, could you please help review the updated code? It has been waiting for two weeks.