Error when running inference for nanoGPT LLM example
Hi,
I am following the instructions from https://github.com/pytorch/executorch/blob/main/docs/source/llm/getting-started.md and got to the "Building and Running" section, where you compile the C++ code and run inference. However, when I enter a prompt, I get the error below. I tried other prompts and got the same result.
I am on an Apple M1 Pro with macOS 13.6.6.
Enter model prompt: Hello world!
E 00:00:33.096084 executorch:op_split_with_sizes_copy.cpp:60] Check failed (tensor_is_broadcastable_to( {target_out_sizes, target_out_ndim}, out[i].sizes())):
E 00:00:33.096114 executorch:method.cpp:1034] KernelCall failed at instruction 0:11 in operator aten::split_with_sizes_copy.out: 0x12
E 00:00:33.096130 executorch:method.cpp:1040] arg 0 with type id 1
E 00:00:33.096132 executorch:method.cpp:1040] arg 1 with type id 8
E 00:00:33.096133 executorch:method.cpp:1040] arg 2 with type id 4
E 00:00:33.096134 executorch:method.cpp:1040] arg 3 with type id 9
E 00:00:33.096135 executorch:method.cpp:1040] arg 4 with type id 9
F 00:00:33.096137 executorch:result.h:165] In function CheckOk(), assert failed: hasValue_
Hello world!zsh: abort ./cmake-out/nanogpt_runner
If you need any more detail from me that would help to make this reproducible, just let me know. Thanks!
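For readers decoding the log above: the failed check in op_split_with_sizes_copy.cpp compares each output tensor's sizes against a target shape. A minimal sketch of that idea in plain Python follows, assuming the usual split-with-sizes semantics (each output keeps the input's shape except along the split dimension). The function names and the example shapes are hypothetical, and the sketch uses strict equality, which is stricter than the broadcastability check named in the log.

```python
from typing import List, Sequence


def expected_split_shapes(
    in_shape: Sequence[int], split_sizes: Sequence[int], dim: int
) -> List[List[int]]:
    """Return the shape each output of a split-with-sizes should have."""
    ndim = len(in_shape)
    if not -ndim <= dim < ndim:
        raise ValueError(f"dim {dim} is out of range for {ndim} dimensions")
    dim %= ndim  # normalize a negative dim
    if sum(split_sizes) != in_shape[dim]:
        raise ValueError("split sizes must sum to the size of the split dim")
    shapes = []
    for size in split_sizes:
        shape = list(in_shape)
        shape[dim] = size  # only the split dimension changes
        shapes.append(shape)
    return shapes


def mismatched_outputs(
    in_shape: Sequence[int],
    split_sizes: Sequence[int],
    dim: int,
    out_shapes: Sequence[Sequence[int]],
) -> List[int]:
    """Return indices of outputs whose shape differs from the expected one."""
    expected = expected_split_shapes(in_shape, split_sizes, dim)
    return [
        i
        for i, (want, got) in enumerate(zip(expected, out_shapes))
        if list(got) != want
    ]


# Hypothetical shapes: a [1, 3, 12] tensor split into three chunks of 4.
print(expected_split_shapes([1, 3, 12], [4, 4, 4], dim=2))
```

An output buffer allocated with the wrong shape, for example the full [1, 3, 12] instead of [1, 3, 4], is the kind of mismatch such a check would reject.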
That seems unexpected. Let us take a look.
Thanks for reporting this @bryangarza. Could you also let us know which executorch git commit hash you checked out when running this?
Thanks @bryangarza for sharing this! This looks like an operator bug we've fixed before. Can you share your git hash, or pull the latest 0.2 branch and try again?
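One way to report the checked-out commit is sketched below. It shells out to `git rev-parse HEAD`; the helper name `current_commit` is hypothetical, and the `third-party/executorch` path is an assumption taken from the clone command used in this thread, so adjust it to wherever the repository lives.

```python
import subprocess
from typing import Optional


def current_commit(repo_dir: str) -> Optional[str]:
    """Return the commit hash checked out in repo_dir, or None on failure."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not a git checkout
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    # Assumed checkout location; change it to match your setup.
    print(current_commit("third-party/executorch"))
```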
I was on 6a1703eb2345e3508d6b59e690c170ac3e02b7a7 which I got from
git clone -b release/0.2 https://github.com/pytorch/executorch.git third-party/executorch
https://github.com/pytorch/executorch/commits/release/0.2/
@Gasoonjia which 0.2 branch do you mean?
Yes, release/0.2 is exactly the 0.2 branch I mentioned.
@bryangarza Could you please rebuild ET in a brand-new conda env, following our tutorial at https://pytorch.org/executorch/0.2/llm/getting-started.html#prerequisites, and try again?
The error you encountered should be something we've already fixed. If you have previously installed another copy of ET, your environment may still be using that older copy, which still has the bug, instead of the one you just downloaded.
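A quick way to see which copy of the Python package the active environment resolves is sketched below. It uses only the standard library; the helper name `package_location` is hypothetical. Note that it covers only the Python side, not the C++ runner built under cmake-out.

```python
import importlib.util
import sys
from typing import Optional


def package_location(name: str) -> Optional[str]:
    """Return the path Python would import the top-level package from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # package is not installed in this environment
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no origin; fall back to their search path.
    locations = spec.submodule_search_locations
    return next(iter(locations), None) if locations else None


if __name__ == "__main__":
    print("active environment:", sys.prefix)
    print("executorch resolves to:", package_location("executorch"))
```

If the printed path lies outside the new conda env, an older installation is probably shadowing the fresh one.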
Just tried the tutorial from scratch, new conda env and everything, but still getting the same error on release/0.2. Could you share which commit hash or PR addressed the error? Want to see if I can debug on my side.
This is my original PR https://github.com/pytorch/executorch/pull/3175.
I just went through my PR, and it looks like there may have been some issues when merging it into the release/0.2 branch. Could you please try the main branch instead and see how things go?
I can now confirm that my PR has been merged into release/0.2 successfully. Please pull the branch again and retry. It should work now.
Thanks @Gasoonjia! I tried it again by pulling release/0.2 and rebuilding. It's working for me now.