polars
Implement POLARS_MAX_MEMORY_MIB
Fixes #9892
The code seems to work in that it overrides the memory reported by sysinfo, but I'm unable to have Polars actually comply with the limits I'm setting.
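For readers following along, the override described above can be sketched roughly as follows. This is only an illustration in Python, not the code from this PR: the helper name `effective_memory_bytes` is hypothetical, and treating the variable as a cap (never raising the reported value) and ignoring invalid values are assumptions.

```python
import os

MIB = 1024 * 1024


def effective_memory_bytes(reported_bytes, environ=None):
    """Return the memory size to assume, capped by POLARS_MAX_MEMORY_MIB if set.

    `reported_bytes` stands in for whatever the system (e.g. sysinfo) reports.
    Hypothetical helper: the name and the fallback behaviour are assumptions.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("POLARS_MAX_MEMORY_MIB")
    if raw is None:
        return reported_bytes
    try:
        limit_mib = int(raw)
    except ValueError:
        # Assumption: an unparsable value is ignored rather than treated as an error.
        return reported_bytes
    if limit_mib <= 0:
        return reported_bytes
    # Assumption: the variable can only lower the reported memory, never raise it.
    return min(reported_bytes, limit_mib * MIB)
```

Note that a helper like this only changes what is reported. It does not by itself make any allocation respect the limit, which matches the behaviour described in this comment.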
Thanks for taking the initiative on this one.
> I'm unable to have Polars actually comply with the limits I'm setting
That means this is not ready to be merged, right? I'm putting this on draft for now. If you're looking for help on this, please tag a code owner.
I'm also thinking this should probably be available as an option in the Config object in Python, though I'm not 100% sure on that.
> please tag a code owner.
OK, tagging @ritchie46 @orlp!
Any progress on this? Would greatly appreciate this (been running into memory limits running on an academic SLURM cluster)! Happy to test if/when that would be helpful.
It's not working and I don't know why. Please feel free to give it a try and help figure out why it's not working.
Yes, this feature would be great. In my case I have multiple processes running polars operations at the same time on the same server, and when the steps "sync" the memory usage is over 9000. It also makes it hard to test the speed-vs-resources tradeoff (e.g. could I be running the same code on a smaller instance?)
Highly interested in this. Google Cloud Run limits memory to 32 GB, and we have datasets bigger than will fit in that memory that we want to "push through and clean up columns on". Could this help with something like setting a "known upper limit" for memory usage by Polars, so it doesn't crash the containers on OOM?
Does Google Cloud Run properly set the limits via cgroups?
If so, memory reporting should work fine with recent versions of Polars (I implemented cgroups support in the rs-sysinfo crate that's used by Polars).
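One way to check this from inside a container is to read the cgroup limit files directly. The sketch below is not how sysinfo or Polars does it. It assumes the cgroup filesystem is mounted at the usual `/sys/fs/cgroup` location, the helper names are made up for illustration, and the "unlimited" threshold for cgroup v1 is a heuristic.

```python
from pathlib import Path

# Usual locations when the cgroup filesystem is mounted at /sys/fs/cgroup.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")

# Heuristic (assumption): cgroup v1 reports a huge number when no limit is set,
# so anything at or above 1 EiB is treated as "unlimited".
UNLIMITED_THRESHOLD = 1 << 60


def parse_cgroup_limit(text):
    """Parse the contents of a cgroup memory limit file.

    Returns the limit in bytes, or None if there is no usable limit.
    """
    value = text.strip()
    if value == "max":  # cgroup v2 spelling of "no limit"
        return None
    try:
        limit = int(value)
    except ValueError:
        return None
    if limit <= 0 or limit >= UNLIMITED_THRESHOLD:
        return None
    return limit


def read_cgroup_limit():
    """Return the cgroup memory limit in bytes, or None if none is found."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        try:
            return parse_cgroup_limit(path.read_text())
        except OSError:
            continue  # file missing or unreadable, try the next location
    return None


if __name__ == "__main__":
    limit = read_cgroup_limit()
    print("no cgroup memory limit found" if limit is None else f"{limit} bytes")
```

If this prints a limit that matches the container's configured memory, the limit is visible through cgroups.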
I'll close this as there has been no progress here in a while. Please open a new PR if you have something working!
@stinodego was there something that should still be done on OP's side? I think we still had to get to this one. Even though we cannot get to it soon enough, there might be something of value in it. I haven't had the time to look yet.
We definitely still want this functionality, but @jonashaag mentioned this code wasn't working yet but didn't know why.
Since there wasn't any progress here in a while I figured I'd close it, maybe someone else could pick it up with a fresh implementation. But if you can help get this code working then that's even better! Feel free to re-open in that case.
Ok, then I understand. :)
What is the best way to track this effort? Having some sort of memory limit feels like a must.
> What is the best way to track this effort? Having some sort of memory limit feels like a must.
You can subscribe to the linked issue.
The best way is to implement this functionality and open a PR :)
Any chance that #15798 helps with this? Looks like polars' version of sysinfo has been bumped, so while the issue surely isn't the same the sysinfo bump may have sorted out the problem. This is fishing because I don't have the knowledge to work on this in a reasonable timeframe, but if it could get working that would be great.