idostyle
Add support for https://huggingface.co/LifuWang/DistillT5 https://github.com/LifuWang-66/DistillT5

Can be tested with https://huggingface.co/Eviation/DistillT5, e.g. https://huggingface.co/Eviation/DistillT5/blob/main/DistillT5-F32.safetensors, which removes the additional wrapping.

- [ ] Probably needs to be integrated, and at best automatically detected,...
Not setting up the compute graph twice might result in a minor (possibly negligible) performance improvement.

Previously, GGMLRunner#compute worked as follows:

1. calls alloc_compute_buffer
   1.1 calls reset_compute_ctx
   1.2...