Error when running inference for nanoGPT LLM example

Open bryangarza opened this issue 3 months ago • 8 comments

Hi,

I am following the instructions from https://github.com/pytorch/executorch/blob/main/docs/source/llm/getting-started.md and got to the "Building and Running" section, where you compile the C++ code and try to run inference. However, when I enter a prompt, I get the error below. I also tried other prompts but got the same result.

I am on an Apple M1 Pro, running macOS 13.6.6.

Enter model prompt: Hello world!
E 00:00:33.096084 executorch:op_split_with_sizes_copy.cpp:60] Check failed (tensor_is_broadcastable_to( {target_out_sizes, target_out_ndim}, out[i].sizes())):
E 00:00:33.096114 executorch:method.cpp:1034] KernelCall failed at instruction 0:11 in operator aten::split_with_sizes_copy.out: 0x12
E 00:00:33.096130 executorch:method.cpp:1040] arg 0 with type id 1
E 00:00:33.096132 executorch:method.cpp:1040] arg 1 with type id 8
E 00:00:33.096133 executorch:method.cpp:1040] arg 2 with type id 4
E 00:00:33.096134 executorch:method.cpp:1040] arg 3 with type id 9
E 00:00:33.096135 executorch:method.cpp:1040] arg 4 with type id 9
F 00:00:33.096137 executorch:result.h:165] In function CheckOk(), assert failed: hasValue_
Hello world!zsh: abort      ./cmake-out/nanogpt_runner
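
For anyone reading the log above: the failing check in op_split_with_sizes_copy.cpp compares the shape each split result should have against the shape of the corresponding pre-allocated output tensor. The following is only a minimal Python sketch of that kind of check, not the ExecuTorch implementation. The helper names (`expected_split_shapes`, `is_broadcastable_to`, `check_outputs`) and the example shapes are hypothetical, and NumPy-style broadcasting rules are assumed.

```python
from typing import List, Sequence, Tuple


def expected_split_shapes(
    input_shape: Sequence[int], split_sizes: Sequence[int], dim: int
) -> List[Tuple[int, ...]]:
    """Shapes produced by splitting `input_shape` along `dim` into `split_sizes`."""
    if dim < 0:
        dim += len(input_shape)
    if sum(split_sizes) != input_shape[dim]:
        raise ValueError("split sizes must sum to the size of the split dimension")
    shapes = []
    for size in split_sizes:
        shape = list(input_shape)
        shape[dim] = size  # only the split dimension changes
        shapes.append(tuple(shape))
    return shapes


def is_broadcastable_to(src: Sequence[int], dst: Sequence[int]) -> bool:
    """True if shape `src` can be broadcast to shape `dst` (NumPy-style rules assumed)."""
    if len(src) > len(dst):
        return False
    # Compare trailing dimensions; a source dimension must match or be 1.
    for s, d in zip(reversed(src), reversed(dst)):
        if s != d and s != 1:
            return False
    return True


def check_outputs(
    input_shape: Sequence[int],
    split_sizes: Sequence[int],
    dim: int,
    out_shapes: Sequence[Sequence[int]],
) -> bool:
    """Hypothetical stand-in for the per-output shape check seen in the log."""
    expected = expected_split_shapes(input_shape, split_sizes, dim)
    if len(expected) != len(out_shapes):
        return False
    return all(is_broadcastable_to(e, o) for e, o in zip(expected, out_shapes))


# Illustrative shapes only (not taken from the failing run): a fused projection
# of width 2304 split into three 768-wide chunks along the last dimension.
good_outs = [(1, 8, 768)] * 3
bad_outs = [(1, 1024, 768)] * 3  # outputs sized for a different sequence length
print(check_outputs((1, 8, 2304), [768, 768, 768], 2, good_outs))
print(check_outputs((1, 8, 2304), [768, 768, 768], 2, bad_outs))
```

In this sketch, the check fails whenever an output tensor was allocated with a shape that the split result cannot be broadcast to. Whether that is what happened in the run above is not established by the log alone.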

If you need any more detail from me that would help to make this reproducible, just let me know. Thanks!

bryangarza avatar May 01 '24 22:05 bryangarza

That seems unexpected. Let us take a look.

Jack-Khuu avatar May 02 '24 16:05 Jack-Khuu

Thanks for reporting this @bryangarza. Could you also let us know which executorch git commit hash you checked out when running this?

dbort avatar May 02 '24 18:05 dbort

Thanks @bryangarza for sharing this! This looks like an operator bug we've fixed before. Can you share your git hash, or pull the latest 0.2 branch and try again?

Gasoonjia avatar May 02 '24 18:05 Gasoonjia

I was on 6a1703eb2345e3508d6b59e690c170ac3e02b7a7 which I got from

git clone -b release/0.2 https://github.com/pytorch/executorch.git third-party/executorch

https://github.com/pytorch/executorch/commits/release/0.2/

@Gasoonjia which 0.2 branch do you mean?

bryangarza avatar May 02 '24 19:05 bryangarza

Yes, release/0.2 is exactly the 0.2 branch I mentioned. @bryangarza Could you please rebuild ET in a brand-new conda env, following our tutorial https://pytorch.org/executorch/0.2/llm/getting-started.html#prerequisites, and try again?

The error you encountered should be something we've already fixed. If you have installed another copy of ET before, your environment may still be using that older copy, which still has the bug, instead of the latest one you downloaded.

Gasoonjia avatar May 02 '24 20:05 Gasoonjia

Just tried the tutorial from scratch, new conda env and everything, but still getting the same error on release/0.2. Could you share which commit hash or PR addressed the error? Want to see if I can debug on my side.

bryangarza avatar May 03 '24 00:05 bryangarza

This is my original PR https://github.com/pytorch/executorch/pull/3175.

I just went through my PR, and it looks like something may have gone wrong when it was merged into the release/0.2 branch. Could you please try the main branch instead and see how things go?

Gasoonjia avatar May 03 '24 01:05 Gasoonjia

I can now confirm that my PR has been merged into release/0.2 successfully. Please pull the branch again and retry. It should work now.

Gasoonjia avatar May 03 '24 19:05 Gasoonjia