SHARK [Llama2] Add fix for generating past key values

Open Abhishek-Varma opened this issue 1 year ago • 1 comment

-- Calling torch.tensor on a list of np.arrays is very slow.
-- This commit therefore converts the list to a single np.array first and then calls torch.tensor on it.

This fixes the indefinite hang seen after the First Llama is invoked.
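
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `stack_then_convert`, the variable names, and the array shapes are all hypothetical, and it assumes every array in the list has the same shape and dtype.

```python
import numpy as np
import torch


def stack_then_convert(arrays):
    """Turn a list of equally shaped NumPy arrays into one torch tensor.

    Hypothetical helper for illustration only. Building a single
    contiguous ndarray first lets torch.tensor copy one buffer instead
    of walking the Python list element by element.
    """
    stacked = np.array(arrays)    # one ndarray with a new leading axis
    return torch.tensor(stacked)  # single copy into a tensor


# Assumed shapes, chosen only to make the sketch runnable.
past_key_values = [np.zeros((1, 4, 8, 16), dtype=np.float32) for _ in range(6)]
pkv_tensor = stack_then_convert(past_key_values)
```

The resulting tensor has the list length as its leading dimension, so downstream code that indexed the list by position can index the tensor the same way.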

Signed-off-by: Abhishek Varma [email protected]

Sep 12 '23 12:09 Abhishek-Varma

@Shukla-Gaurav you can cherry-pick this. This solves the issue of Vulkan execution getting stuck indefinitely. I was able to get the tokens to generate on both CLI and WebUI.

Sep 12 '23 16:09 Abhishek-Varma
