gregory-fanous

2 comments by gregory-fanous

Hugging Face instances are absurdly expensive, and vLLM on macOS is unbearably slow because it doesn't utilize MPS. Is there not a way to use HF transformers to load the model?...
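For what it's worth, plain HF `transformers` can target Apple's MPS backend directly via PyTorch. A minimal sketch, assuming a small placeholder model (`gpt2` here stands in for whatever model is being discussed, which isn't named above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick the best backend available on a Mac: MPS if present, else CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# "gpt2" is only an illustrative placeholder; substitute the actual model id.
# fp16 on MPS cuts memory roughly in half; fall back to fp32 on CPU.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16 if device == "mps" else torch.float32,
).to(device)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

This sidesteps vLLM entirely; throughput won't match a CUDA GPU, but MPS is typically far faster than the CPU-only path complained about here.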

Were you able to get past this? Did you manage to run the model on MPS, or only on CPU? vLLM only supports CPU on macOS, and it's unusably slow that way...