RAM usage keeps increasing when swapping Flux models
Hello!
I have discovered an unpleasant issue when working with the Flux model. Every time I switch models, RAM consumption keeps increasing; the system starts using swap, generations take longer and longer, and eventually everything crashes. I have a GTX 1070 8 GB + 64 GB RAM, and after a few model switches it gets to the point where all the RAM and 70 GB of swap are in use.
Is there any way to fix this without restarting everything? As I understand it, the memory for the previous model is not being freed; perhaps swap is being counted as part of available memory, but in theory that should not happen for models that are kept in RAM.
In addition, my settings are set to keep only one model in memory, so this is most likely a bug rather than correct behavior.
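For context, in a PyTorch-based pipeline a model swap is expected to release the old weights roughly as in the sketch below. This is a minimal illustration of the technique, not the WebUI's actual code; `load_new_model` and the `_loaded_model` cache slot are hypothetical names standing in for the "keep one checkpoint in memory" behavior.

```python
import gc
import torch

# Hypothetical global cache, standing in for the WebUI's
# "keep one checkpoint in memory" slot.
_loaded_model = None

def swap_model(load_new_model):
    """Swap the cached model, releasing the old one first.

    `load_new_model` is a hypothetical zero-argument loader.
    The key point: drop every reference to the old weights
    *before* loading the new ones, or both copies stay
    resident at once.
    """
    global _loaded_model
    _loaded_model = None          # drop the only reference to the old model
    gc.collect()                  # collect container objects holding tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached-but-free VRAM to the driver
    _loaded_model = load_new_model()
    return _loaded_model
```

If a step like this is skipped, or something else still holds a reference to the old checkpoint, each swap leaks a full copy of the weights into RAM, which would match the symptoms described above.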
A couple of questions:
- Which Flux quantization are you using (fp8, GGUF Q8, Q5, etc.)?
- What is your GPU weights setting set to? You might want to lower it by ~1 GB to compensate for the VRAM that the OS uses.
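To confirm that it is the WebUI process itself retaining memory (and not, say, the OS filesystem cache), you could log its memory while switching models. A small sketch, assuming psutil is installed; `WEBUI_PID` is a placeholder you would replace with the actual process ID:

```python
import time
import psutil

# Replace with the WebUI's actual PID (e.g. from `ps` or Task Manager).
WEBUI_PID = 12345

proc = psutil.Process(WEBUI_PID)
while True:
    rss_gib = proc.memory_info().rss / 2**30
    swap = psutil.swap_memory()
    print(f"RSS: {rss_gib:.1f} GiB | system swap used: {swap.used / 2**30:.1f} GiB")
    time.sleep(5)
```

If the RSS jumps by roughly one full model size per swap and never comes back down, old checkpoints are being retained rather than freed.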