Enable strict mode in configure_mlx.sh

Open mariano opened this issue 10 months ago • 0 comments

Adding strict mode.
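For readers unfamiliar with the term, "strict mode" in bash usually means `set -euo pipefail` near the top of the script. The sketch below is only an illustration of that convention, not the actual change to configure_mlx.sh; the `strict_status` helper is hypothetical and exists just to show how the flags behave.

```shell
#!/usr/bin/env bash
# Common "strict mode" convention:
#   -e           exit as soon as a command fails
#   -u           treat expansion of an unset variable as an error
#   -o pipefail  a pipeline fails if any stage fails, not only the last one
set -euo pipefail

# Hypothetical helper (not part of configure_mlx.sh): runs a snippet in a
# child bash with strict mode enabled and prints the child's exit status.
strict_status() {
  local rc=0
  bash -c "set -euo pipefail; $1" >/dev/null 2>&1 || rc=$?
  echo "$rc"
}

echo "plain command:   $(strict_status 'true')"
echo "unset variable:  $(strict_status 'echo "$SOME_UNSET_VARIABLE_XYZ"')"
echo "failed pipeline: $(strict_status 'false | true')"
```

The first call should report status 0, while the other two report a non-zero status, which is the point of strict mode: mistakes stop the script instead of being silently ignored.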

Also @AlexCheema, I didn't want to create a separate issue just to comment on this, so I'm placing it here. On my POV benchmark (M4 Max, 128 GB RAM), running the transformer pipeline on meta-llama-3-8b-Instruct with 15 input tokens and stopping inference at 101 tokens, I'm seeing this (ttft: time to first token, ts: tokens per second):

baseline:

run 1: ttft 0.32s, ts 20.30 t/s
run 2: ttft 0.31s, ts 20.43 t/s
run 3: ttft 0.28s, ts 20.56 t/s
run 4: ttft 0.28s, ts 20.54 t/s
run 5: ttft 0.28s, ts 20.58 t/s

with this script:

run 1: ttft 0.19s, ts 20.86 t/s
run 2: ttft 0.18s, ts 20.80 t/s
run 3: ttft 0.18s, ts 20.92 t/s
run 4: ttft 0.18s, ts 20.88 t/s
run 5: ttft 0.17s, ts 20.87 t/s
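To summarize the two sets of runs, here is a small sketch that averages the figures listed above. The `mean` helper is hypothetical (it is not part of configure_mlx.sh), and the inputs are copied by hand from the runs above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints the arithmetic mean of its arguments,
# rounded to three decimal places.
mean() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}

# Values copied from the five baseline runs and the five runs with the script.
echo "baseline     ttft: $(mean 0.32 0.31 0.28 0.28 0.28) s"
echo "with script  ttft: $(mean 0.19 0.18 0.18 0.18 0.17) s"
echo "baseline     ts:   $(mean 20.30 20.43 20.56 20.54 20.58) t/s"
echo "with script  ts:   $(mean 20.86 20.80 20.92 20.88 20.87) t/s"
```

On these numbers, mean ttft drops from roughly 0.29 s to 0.18 s, while mean throughput rises by about 2%.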

That's pretty cool. Thanks for this script :)

Feb 19 '25 22:02 mariano