Alex Redden
Ah, I'll look into it, thanks!
I have actually been thinking that I've implemented the qkv loras a little incorrectly, which I do plan to fix, though it requires a bit of an overhaul since I have...
Well, fp8 matmul is only possible on Ada devices, since there are CUDA instructions for performing matrix multiplication with those tensors. If you don't have an Ada device, then the...
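For context, this is roughly what the device check and the non-Ada fallback look like (just a sketch with made-up names, not the repo's actual code):

```python
import torch

def has_fp8_hardware() -> bool:
    # Hardware fp8 tensor-core matmul first appears on Ada (compute capability 8.9).
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (8, 9)

def fallback_linear(x: torch.Tensor, w_fp8: torch.Tensor, w_scale: torch.Tensor) -> torch.Tensor:
    """Pre-Ada path: dequantize the fp8 weight and run a regular bf16 matmul."""
    w = w_fp8.to(torch.bfloat16) * w_scale  # undo the per-tensor fp8 scale
    return x.to(torch.bfloat16) @ w.t()
```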
I think this would be awesome, I could work on it, though the main issue is that I would need to figure out whether merging a lora and then unmerging it would...
Yeah- that's the problem- you wouldn't want to keep the lora weights in memory- you would want to fuse them into the weights, but if you fuse them into the...
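To be concrete, the fuse/unfuse math is just adding and then subtracting the same low-rank delta (a rough sketch, not the actual removable-lora code):

```python
import torch

@torch.no_grad()
def merge_lora(weight: torch.Tensor, lora_a: torch.Tensor, lora_b: torch.Tensor,
               alpha: float, rank: int) -> None:
    # Fuse: W <- W + (alpha / rank) * B @ A, so no extra lora tensors stay resident.
    delta = (alpha / rank) * (lora_b @ lora_a)
    weight.add_(delta.to(weight.dtype))

@torch.no_grad()
def unmerge_lora(weight: torch.Tensor, lora_a: torch.Tensor, lora_b: torch.Tensor,
                 alpha: float, rank: int) -> None:
    # Unfuse: subtract the same delta to (approximately) restore the base weight.
    # If the base weight is stored quantized (fp8), you'd dequantize, subtract,
    # and requantize instead, which is where the precision question comes in.
    delta = (alpha / rank) * (lora_b @ lora_a)
    weight.sub_(delta.to(weight.dtype))
```

The open question from the thread is exactly that round trip: whether merging and then unmerging stays close enough to the original weights when the base is kept in fp8.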
So I implemented it, but it's not ready for a push yet. It seems to work well though! It includes loading and unloading, and I added a web endpoint for it.
Alright, I pushed it to 'removable-lora' https://github.com/aredden/flux-fp8-api/tree/removable-lora - you can test it if you want, though it's currently not in the web API, so you would have to test it via a script @Lantianyou
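Something like this is what I mean by testing via a script (the class/method names below are placeholders, not necessarily what the branch exposes - check the branch for the real API):

```python
# Hypothetical test script: load a lora, generate, unload it, generate again,
# and compare the outputs to confirm the base weights are restored.
from flux_pipeline import FluxPipeline  # placeholder import

pipe = FluxPipeline.load_pipeline_from_config_path("configs/config-dev.json")  # placeholder

pipe.load_lora("my-lora.safetensors", scale=1.0)            # placeholder: fuse lora into weights
with_lora = pipe.generate(prompt="a photo of a forest")     # placeholder generate call

pipe.unload_lora("my-lora.safetensors")                     # placeholder: unfuse / restore weights
without_lora = pipe.generate(prompt="a photo of a forest")
```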
Ah- I guess it might need some work on cleaning up the loras after loading/unloading. I will work on this, thanks @81549361
Alright, I merged it into the main branch.
Ah, have you tried the removable-lora branch here? https://github.com/aredden/flux-fp8-api/tree/removable-lora - it has exactly what you're asking for. (I believe, unless I haven't pushed the API yet, unsure)