I've never tried Llama 70B, but this is running in fp16 without any quantization. That might be part of it?
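If you want to reproduce that setup locally, here's a minimal sketch with transformers, assuming the vikhyatk/moondream2 checkpoint on Hugging Face and a CUDA GPU (adjust the model id and device to your setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"  # assumed checkpoint; swap in whichever revision you're testing

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,      # the model definition lives in the repo
    torch_dtype=torch.float16,   # fp16 weights, no quantization
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```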
This will be coming in the next release, around Aug 19.
Updated here for compatibility with the latest version of transformers: https://github.com/vikhyat/moondream/commit/22565c070cc1bcbfca5a2f758d3e120b882a6e4b Haven't pushed to HF yet - will do next week.
The change we made to support higher resolution images hasn't been ported to llama.cpp/ollama yet - https://github.com/vikhyat/moondream/commit/ffbf8228aca7138fb55cee2119237d433f8431e2
Haven't seen it before, looks like it's coming from the transformers library. Can you share the image/prompt so I can try to reproduce?
FYI, we're also very close to shipping llama.cpp-based inference code that will run a lot faster on CPU than the PyTorch implementation. Development on that is going on in...
Are you referring to the demo on the Hugging Face space? (asking because we have a few different demos)
Looks like this was fixed here: https://github.com/huggingface/transformers/pull/31695
I don’t think HF transformers supports Flash Attention 1.0, so you would have to edit the attention classes in the model definition.
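For a sense of what that edit looks like, here's a rough sketch of a patched attention forward, with illustrative class and tensor names rather than moondream's actual modules. It uses torch.nn.functional.scaled_dot_product_attention as a stand-in fused kernel; a Flash Attention 1.0 call would slot into the same place.

```python
import torch
import torch.nn.functional as F

class PatchedAttention(torch.nn.Module):
    """Illustrative multi-head attention with a fused kernel in forward()."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (b, t, d) -> (b, n_heads, t, head_dim)
        q, k, v = (
            y.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            for y in (q, k, v)
        )
        # The fused kernel call replaces the explicit softmax(QK^T)V code path;
        # this is where a Flash Attention 1.0 function would be dropped in instead.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.proj(out)
```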
Hard to comment in general; it depends on the fine-tuning task, dataset size, hyperparameters used, etc. Are you able to share any additional information about the fine-tuning?