Matt Clayton

Results: 8 comments by Matt Clayton

Will this impact custom large runners using Windows Server 2019? Or is it quite literally the `windows-2019` standard runner image only? Example: ![Image](https://github.com/user-attachments/assets/08b18236-3303-4931-8c93-160ebc3657c5)

🙌🙌 Thanks for the quick response @Blaizzy

> Or just limiting the size of the image to at most X based on the spec would help?

Yes, something to this...
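To make the "limit the image to at most X" idea concrete, here is a minimal sketch of capping an image's pixel count while keeping its aspect ratio. The function name `cap_image_size` and the `max_pixels` parameter are hypothetical, chosen for illustration; they are not part of `mlx_vlm`, and the right limit would depend on the model's spec.

```python
import math


def cap_image_size(width: int, height: int, max_pixels: int) -> tuple[int, int]:
    """Return (width, height) scaled down so that width * height <= max_pixels.

    Hypothetical helper: the aspect ratio is preserved (up to rounding) and
    images that are already small enough are returned unchanged.
    """
    if width <= 0 or height <= 0 or max_pixels <= 0:
        raise ValueError("width, height and max_pixels must be positive")

    pixels = width * height
    if pixels <= max_pixels:
        return width, height

    # Scale both sides by the same factor so the area shrinks to max_pixels.
    scale = math.sqrt(max_pixels / pixels)
    new_w = max(1, math.floor(width * scale))
    new_h = max(1, math.floor(height * scale))

    # Guard against floating-point rounding pushing the area over the cap.
    while new_w * new_h > max_pixels:
        if new_w >= new_h and new_w > 1:
            new_w -= 1
        else:
            new_h -= 1
    return new_w, new_h
```

The returned dimensions could then be passed to whatever resize call the image pipeline already uses, before the image reaches the model.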

Unfortunately the above still results in the `libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 51619840000 bytes which is greater than the maximum allowed buffer size...
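For a sense of scale, the byte count in that error message can be converted to more familiar units. This is plain arithmetic on the number quoted above; the observation that it is a perfect square is a fact about the number, while reading it as a square N×N buffer would only be a guess.

```python
import math

# Byte count copied from the error message above.
requested_bytes = 51_619_840_000

gib = requested_bytes / 2**30   # binary gibibytes
gb = requested_bytes / 10**9    # decimal gigabytes

# Check whether the byte count is a perfect square.
side = math.isqrt(requested_bytes)
is_perfect_square = side * side == requested_bytes

print(f"{gib:.2f} GiB ({gb:.2f} GB)")
print(f"integer square root: {side}, perfect square: {is_perfect_square}")
```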

I think my personal takeaways are:

1. Seems like catching this error goes deeper than the code in this repo
2. Maybe there's a further possibility for batched `mx` evaluation...

> Are you making requests in batch? if so what is the use case?

Not currently making requests in batch! Sorry, I could have expressed my thoughts around the "batching"...

Certainly! If I add:

```
import faulthandler
faulthandler.enable()
```

at the top of `generate.py`, and run:

```
python -m mlx_vlm.generate --image '/Users/matt/Downloads/math-proof.jpg' --temp 0.0 --prompt "what is this" --model "/Users/matt/.cache/lm-studio/models/mlx-community/Qwen2-VL-7B-Instruct-4bit"...
```
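As a standalone illustration of the same tracing approach, here is a small sketch using only the standard-library `faulthandler` module. The helper names and the log-file argument are hypothetical; the snippet above simply calls `faulthandler.enable()` with its defaults, which writes to `sys.stderr`.

```python
import faulthandler
import tempfile


def enable_crash_traces(log_path: str):
    """Write Python tracebacks to log_path on fatal signals such as SIGABRT.

    Hypothetical helper. The file object is returned because it must stay
    open for as long as faulthandler is enabled.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


def dump_current_traceback() -> str:
    """Return the current Python stack of all threads as text, without crashing."""
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()
```

`faulthandler` reports only the Python frames that were active when the signal arrived, so it shows which Python call led into the native code rather than what happened inside it.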

(Please feel free to let me know if you know of better ways to trace)

Seems like there is some relationship between large allocations and `mlx->np` translations, but then once it goes into `mx.async_eval(y)` territory I'm afraid I'm no longer sure how to make modifications...