Garbage output on Metal on x86-64 Mac
Hi, I get garbage output when I run the gpt-2 example with the Metal backend.
Here are the steps I took:
cmake -DGGML_METAL=ON -DBUILD_SHARED_LIBS=Off ..
make -j gpt-2-batched
./bin/gpt-2-batched -m models/gpt-2-117M/ggml-model.bin -p "This is an example" -ngl 1 -s 1703042754
Output:
main: seed = 1703042754
gpt2_model_load: loading model from 'models/gpt-2-117M/ggml-model.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx = 1024
gpt2_model_load: n_embd = 768
gpt2_model_load: n_head = 12
gpt2_model_load: n_layer = 12
gpt2_model_load: ftype = 1
gpt2_model_load: qntvr = 0
gpt2_model_load: ggml tensor size = 384 bytes
gpt2_model_load: backend buffer size = 312.72 MB
gpt2_model_load: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Intel(R) Iris(TM) Plus Graphics 655
ggml_metal_init: picking default device: Intel(R) Iris(TM) Plus Graphics 655
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/<redacted>/ggml/build/bin/ggml-metal.metal'
ggml_metal_init: GPU name: Intel(R) Iris(TM) Plus Graphics 655
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 1610.61 MB
ggml_metal_init: maxTransferRate = built-in GPU
gpt2_model_load: memory size = 144.00 MB, n_mem = 24576
gpt2_model_load: model size = 239.08 MB
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.
main: compute buffer size: 6.46 MB
main: prompt: 'This is an example'
main: number of tokens in prompt = 4, first 8 tokens: 1212 318 281 1672
and related)],, assignment 2013][ ] 2011 assignment]. ] ][nyder ][]] ]RANTterRANTDCter]:RANTode Postedode ].hell"]hell ]hellSBwoodeodeaskingmarthell ]]:batwowoodehell"] ][odeaskaskingCmdhellode].],],ode],woodewoTF ],woaskingRMwowo ][]TFodeode Terwo ]ray ];ification"]woodeter ]RANT ]; ][].RMRMwowotask]:RM ]]. ];wowohellode][].odebat ]bat ], RomneywoRANT ];martray>>>> hellode ][RM][].":-odeodeodewohellCmdtaskwoode']wotaskwoRMRM wohell RMRMasksRMtaskhell] Posted ];rayRModeaskingaskingraytaskwo ]:wowo":-":-taskaskaskitywo]}raytask ] ][ ]] Posted Posted GTateral gd ][
main: n_decoded = 199
main: load time = 343.22 ms
main: sample time = 83.01 ms
main: predict time = 3197.49 ms
main: total time = 3732.56 ms
ggml_metal_free: deallocating
CPU inference works fine.
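In case it helps narrow this down, here is a minimal sketch (not something I ran, just an outline) of an isolated check: it computes one small F32 mat-mul on the Metal backend and on the CPU backend and compares the two results. It assumes the current ggml-backend API (ggml_backend_metal_init, ggml_backend_alloc_ctx_tensors, ggml_backend_graph_compute); exact function names may differ between ggml revisions, so treat it as a rough sketch rather than a drop-in test.

// mulmat_check.c -- rough sketch, not part of the original report:
// run one small F32 mat-mul on the Metal backend and on the CPU backend
// and compare the results, to see whether a single kernel already differs.
#include <stdio.h>
#include <math.h>
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#include "ggml-metal.h"

static void run_mul_mat(ggml_backend_t backend, const float * a_data, const float * b_data, float * out) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead()*8 + ggml_graph_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,   // tensor data lives in the backend buffer
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);

    // allocate a, b and c in the backend's buffer and upload the inputs
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);
    ggml_backend_tensor_set(a, a_data, 0, ggml_nbytes(a));
    ggml_backend_tensor_set(b, b_data, 0, ggml_nbytes(b));

    ggml_backend_graph_compute(backend, gf);
    ggml_backend_tensor_get(c, out, 0, ggml_nbytes(c));

    ggml_backend_buffer_free(buf);
    ggml_free(ctx);
}

int main(void) {
    float a[16], b[16], out_cpu[16], out_metal[16];
    for (int i = 0; i < 16; i++) { a[i] = (float) i; b[i] = (float) (i % 3) - 1.0f; }

    ggml_backend_t cpu   = ggml_backend_cpu_init();
    ggml_backend_t metal = ggml_backend_metal_init();
    if (metal == NULL) { fprintf(stderr, "failed to init Metal backend\n"); return 1; }

    run_mul_mat(cpu,   a, b, out_cpu);
    run_mul_mat(metal, a, b, out_metal);

    int bad = 0;
    for (int i = 0; i < 16; i++) {
        if (fabsf(out_cpu[i] - out_metal[i]) > 1e-4f) {
            printf("mismatch at %d: cpu=%f metal=%f\n", i, out_cpu[i], out_metal[i]);
            bad++;
        }
    }
    printf(bad ? "Metal and CPU disagree\n" : "Metal matches CPU\n");

    ggml_backend_free(metal);
    ggml_backend_free(cpu);
    return 0;
}

If something like this already disagrees on a single mat-mul, the problem would be in the Metal kernels on this GPU rather than anything specific to the gpt-2 example.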
I was just playing with ggml and thought it made sense to open this issue. Nothing urgent. Thank you!