Alex Redden
Ah, I'll look into it, thanks!
I have actually been thinking that I've implemented the qkv loras a little incorrectly, which I do plan to fix, though it requires a bit of an overhaul since I have...
Well, fp8 matmul is only possible on Ada devices, since there are CUDA instructions for performing matrix multiplication with those tensors. If you don't have an Ada device, then the...
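For context, this is roughly what the device check and the non-Ada fallback look like (just a sketch with made-up names, not the repo's actual code):

```python
import torch

def has_fp8_hardware() -> bool:
    # Hardware fp8 tensor-core matmul first appears on Ada (compute capability 8.9).
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (8, 9)

def fallback_linear(x: torch.Tensor, w_fp8: torch.Tensor, w_scale: torch.Tensor) -> torch.Tensor:
    """Pre-Ada path: dequantize the fp8 weight and run a regular bf16 matmul."""
    w = w_fp8.to(torch.bfloat16) * w_scale  # undo the per-tensor fp8 scale
    return x.to(torch.bfloat16) @ w.t()
```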
I think this would be awesome, I could work on it, though the main issue is that I would need to figure out whether merging a lora and then unmerging it would...
Yeah- that's the problem- you wouldn't want to keep the lora weights in memory- you would want to fuse them into the weights, but if you fuse them into the...
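To be concrete, the fuse/unfuse math is just adding and then subtracting the same low-rank delta (a rough sketch, not the actual removable-lora code):

```python
import torch

@torch.no_grad()
def merge_lora(weight: torch.Tensor, lora_a: torch.Tensor, lora_b: torch.Tensor,
               alpha: float, rank: int) -> None:
    # Fuse: W <- W + (alpha / rank) * B @ A, so no extra lora tensors stay resident.
    delta = (alpha / rank) * (lora_b @ lora_a)
    weight.add_(delta.to(weight.dtype))

@torch.no_grad()
def unmerge_lora(weight: torch.Tensor, lora_a: torch.Tensor, lora_b: torch.Tensor,
                 alpha: float, rank: int) -> None:
    # Unfuse: subtract the same delta to (approximately) restore the base weight.
    # If the base weight is stored quantized (fp8), you'd dequantize, subtract,
    # and requantize instead, which is where the precision question comes in.
    delta = (alpha / rank) * (lora_b @ lora_a)
    weight.sub_(delta.to(weight.dtype))
```

The open question from the thread is exactly that round trip: whether merging and then unmerging stays close enough to the original weights when the base is kept in fp8.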
So I implemented it, but it's not ready for a push yet. It seems to work well though! It includes loading and unloading, and I added a web endpoint for it.
Alright, I pushed it to 'removable-lora' https://github.com/aredden/flux-fp8-api/tree/removable-lora - you can test it if you want, though it's currently not in the web API, so you would have to test it via a script @Lantianyou
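Something like this is what I mean by testing via a script (the class/method names below are placeholders, not necessarily what the branch exposes - check the branch for the real API):

```python
# Hypothetical test script: load a lora, generate, unload it, generate again,
# and compare the outputs to confirm the base weights are restored.
from flux_pipeline import FluxPipeline  # placeholder import

pipe = FluxPipeline.load_pipeline_from_config_path("configs/config-dev.json")  # placeholder

pipe.load_lora("my-lora.safetensors", scale=1.0)            # placeholder: fuse lora into weights
with_lora = pipe.generate(prompt="a photo of a forest")     # placeholder generate call

pipe.unload_lora("my-lora.safetensors")                     # placeholder: unfuse / restore weights
without_lora = pipe.generate(prompt="a photo of a forest")
```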
Ah- I guess it might need some work on cleaning up the loras after loading/unloading. I will work on this, thanks @81549361
Alright, I merged it into the main branch.
Ah, have you tried the removable-lora branch here? https://github.com/aredden/flux-fp8-api/tree/removable-lora - it has exactly what you're asking for. (I believe, unless I haven't pushed the API yet, unsure)