SHARK
[Llama2] Add fix for generating past key values
-- Calling torch.tensor on a list of np.arrays is very slow.
-- This commit therefore converts the list to a single np.array first and then calls torch.tensor on that array.
This resolves the indefinite hang that occurred after the first Llama was invoked.
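
A minimal sketch of the conversion pattern described above, not the actual SHARK code: the function name `stack_past_key_values`, the array shapes, and the dtype are hypothetical and chosen only for illustration. It assumes every array in the list has the same shape and dtype.

```python
import numpy as np
import torch


def stack_past_key_values(chunks):
    """Turn a list of equally shaped np.ndarrays into one torch.Tensor.

    Hypothetical helper for illustration; not taken from the SHARK code.
    """
    # Slow path being avoided: torch.tensor(chunks) on a list of ndarrays.
    # Building one contiguous ndarray first lets torch.tensor copy a
    # single buffer instead of walking the Python list.
    stacked = np.array(chunks)
    return torch.tensor(stacked)


# Hypothetical shapes: 3 arrays of shape (2, 4, 8), filled with 0, 1, 2.
chunks = [np.full((2, 4, 8), i, dtype=np.float32) for i in range(3)]
past_key_values = stack_past_key_values(chunks)
```

The result has a new leading dimension equal to the length of the list, and the data is copied, so later changes to the NumPy arrays do not affect the tensor.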
Signed-off-by: Abhishek Varma [email protected]
@Shukla-Gaurav you can cherry-pick this. This solves the issue of Vulkan execution getting stuck indefinitely. I was able to get the tokens to generate on both CLI and WebUI.