Matt Clayton
Will this impact custom large runners using Windows Server 2019? Or is it quite literally the `windows-2019` standard runner image only? Example: 
🙌🙌 Thanks for the quick response @Blaizzy

> Or just limiting the size of the image to at most X based on the spec would help?

Yes, something to this...
Unfortunately, the above still results in `libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 51619840000 bytes which is greater than the maximum allowed buffer size...`
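For what it's worth, the "limit the size of the image to at most X" idea could look something like this rough sketch (pure Python; `cap_pixels` is a hypothetical helper I'm making up, and `2048 * 2048` is an arbitrary cap, not a number taken from the Qwen2-VL spec):

```python
def cap_pixels(width, height, max_pixels=2048 * 2048):
    """Scale (width, height) down so width * height <= max_pixels,
    preserving aspect ratio. Returns the (possibly unchanged) dimensions."""
    pixels = width * height
    if pixels <= max_pixels:
        return width, height
    scale = (max_pixels / pixels) ** 0.5
    return max(1, int(width * scale)), max(1, int(height * scale))

# Idea: resize the input image to cap_pixels(*image.size) (e.g. with Pillow)
# before it reaches the model, so a huge input can't trigger a ~50 GB buffer.
```

This wouldn't fix the underlying uncaught exception, of course, just sidestep the pathological allocation.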
I think my personal takeaways are:

1. Seems like catching this error goes deeper than the code in this repo
2. Maybe there's a further possibility for batched `mx` evaluation...
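To make takeaway 2 a bit more concrete, here's roughly what I was imagining on the Python side (just a sketch of chunking inputs; whether mlx-vlm can actually evaluate a batch in one `mx` call is exactly the open question):

```python
def chunk(items, batch_size):
    """Split a list of inputs into fixed-size batches (last one may be short)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

prompts = ["what is this"] * 5
for batch in chunk(prompts, 2):
    # Hypothetically, each batch here would go through one evaluation
    # instead of five separate ones.
    pass
```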
> Are you making requests in batch? if so what is the use case?

Not currently making requests in batch! Sorry, I could have expressed my thoughts around the "batching"...
Certainly! If I add:

```python
import faulthandler
faulthandler.enable()
```

at the top of `generate.py`, and run:

```shell
python -m mlx_vlm.generate --image '/Users/matt/Downloads/math-proof.jpg' --temp 0.0 --prompt "what is this" --model "/Users/matt/.cache/lm-studio/models/mlx-community/Qwen2-VL-7B-Instruct-4bit"
```

...
(Please feel free to let me know if you know of better ways to trace)
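(One other stdlib trick I've been using, in case it's useful to anyone: `faulthandler` can also dump the current traceback on demand, not just on a crash. Note it writes through a real file descriptor, so `StringIO` won't work.)

```python
import faulthandler
import tempfile

# Dump the current Python traceback to a real file; faulthandler
# bypasses Python's I/O buffering and needs an actual fd.
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f)
    f.seek(0)
    report = f.read()
```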
Seems like there is some relationship between large allocations and `mlx->np` translations, but then once it goes into `mx.async_eval(y)` territory I'm afraid I'm no longer sure how to make modifications...