Alex Redden

82 comments of Alex Redden

I have actually been thinking that I've implemented the QKV LoRAs a little incorrectly, which I do plan to fix, though it requires a bit of an overhaul since I have...

Well, FP8 matmul is only possible on Ada devices, since there are CUDA instructions for performing matrix multiplication with those tensors. If you don't have an Ada device, then the...
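
For context, a minimal sketch (not code from this repo) of checking for that hardware support in PyTorch. FP8 tensor-core GEMMs need compute capability 8.9 (Ada Lovelace) or newer, and PyTorch exposes them through the private `torch._scaled_mm` hook, whose exact signature and return value have shifted across releases, so treat the call below as illustrative:

```python
import torch

def supports_fp8_matmul() -> bool:
    # FP8 tensor-core matmul requires compute capability 8.9
    # (Ada Lovelace) or newer.
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (8, 9)

if supports_fp8_matmul():
    # torch._scaled_mm is the (private, version-dependent) entry point
    # for the FP8 GEMM instructions: mat1 row-major, mat2 column-major,
    # each operand paired with a float32 scale.
    a = torch.randn(16, 32, device="cuda").to(torch.float8_e4m3fn)
    b = torch.randn(16, 32, device="cuda").to(torch.float8_e4m3fn).t()
    one = torch.tensor(1.0, device="cuda")
    out = torch._scaled_mm(a, b, scale_a=one, scale_b=one,
                           out_dtype=torch.bfloat16)
```

Without that capability check passing, the FP8 path has to fall back to dequantizing and running the matmul in a higher precision.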

I think this would be awesome, and I could work on it, though the main issue is that I would need to figure out whether merging a LoRA and then unmerging it would...

Yeah, that's the problem: you wouldn't want to keep the LoRA weights in memory, you would want to fuse them into the weights, but if you fuse them into the...
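
To make the concern concrete, here's a minimal sketch of fusing and unfusing a LoRA into a linear weight (my own illustration, not the repo's implementation; the names `lora_a`, `lora_b`, and `scale` are assumptions):

```python
import torch

def merge_lora(weight: torch.Tensor, lora_a: torch.Tensor,
               lora_b: torch.Tensor, scale: float) -> torch.Tensor:
    # Fuse the low-rank update into the base weight:
    # W' = W + scale * (B @ A), with A: (rank, in_features)
    # and B: (out_features, rank).
    return weight + scale * (lora_b @ lora_a)

def unmerge_lora(merged: torch.Tensor, lora_a: torch.Tensor,
                 lora_b: torch.Tensor, scale: float) -> torch.Tensor:
    # Subtract the same delta to recover the base weight. This round-trip
    # is only exact while the merged weight stays at full precision; if
    # it gets quantized to fp8 in between, a residual error remains.
    return merged - scale * (lora_b @ lora_a)
```

One possible workaround (not necessarily what the removable-lora branch does) is to cache the pre-merge weights, or the full-precision delta, so that unloading restores the original tensor instead of relying on the subtraction being lossless.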

So I implemented it, but it's not ready for a push; it seems to work well though! It includes loading and unloading, and I added a web endpoint for it.

Alright, I pushed to 'removable-lora': https://github.com/aredden/flux-fp8-api/tree/removable-lora - you can test it if you want, though it's currently not in the web API, so you would have to test it via a script @Lantianyou

Ah, I guess it might need some work with cleaning up the LoRAs after loading / unloading. I will work on this, thanks @81549361

Alright, I merged it into the main branch.

Ah, have you tried the removable-lora branch here? https://github.com/aredden/flux-fp8-api/tree/removable-lora - it has exactly what you're asking for. (I believe, unless I haven't pushed the API yet; unsure.)