pandas-formats-benchmark
MemoryTracker() - any bug?
Firstly, thank you for the great post comparing various storage formats: https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d.
I was interested in your code, especially the memory tracking, and I am trying to use it to reproduce the results, but MemoryTracker is not working for me. I am not sure whether this has to do with Python 2 (I am using Python 2.7); it should not.
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

with MemoryTracker() as mt:
    my_func()
print mt.memory
This yields an error. On debugging I find that the run method of the MemoryTrackingProcess class never gets called during execution. It gets called only at the end, which gives the error.
Could you please tell me if I am doing something incorrectly?
Thanks
@ankitagra Hi! Thank you for reaching out, I am glad that you've found the post useful. I'll check this code fragment soon and let you know. Also, could you please share the error message you've encountered?
There is no Python exception that is raised. But mt.memory in the code above returns a negative memory number, which happens to be the same as -1 * MemoryTracker().start_mem.
On debugging I find that, in the code section below, self.p.max_mem.value is 0 (this is because the run() method of MemoryTrackingProcess() never gets called).
@property
def memory(self):
    return self.p.max_mem.value - self.start_mem
So my question is: why is the run() method not being called? I thought that it would get auto-called every few seconds.
I have copied your classes MemoryTracker(object) and MemoryTrackingProcess(Process) verbatim; the only difference is that I am running this code in Py 2.7 instead of Py 3.
Please let me know if this is not clear, or if you are not able to reproduce this.
Thanks a lot for looking into this.
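On the question of run() being "auto-called every few seconds": in the standard multiprocessing module, run() is invoked once, in the child process, after start() is called on the parent side. Any periodic sampling has to be an explicit loop inside run(). The sketch below illustrates that behaviour only. PollingProcess, polls, stop_event and demo are hypothetical names, not the MemoryTrackingProcess from the post, and the code assumes Python 3 on a Unix-like system where the "fork" start method is available.

```python
import multiprocessing as mp

# Assumption: a Unix-like system, where the "fork" start method exists.
ctx = mp.get_context("fork")


class PollingProcess(ctx.Process):
    """Hypothetical example class; not the MemoryTrackingProcess from the post."""

    def __init__(self, interval=0.01):
        super().__init__()
        self.interval = interval
        self.polls = ctx.Value("i", 0)  # counter shared between parent and child
        self.stop_event = ctx.Event()   # the parent sets this to end the loop

    def run(self):
        # Invoked once, in the child process, after start() is called.
        # Repetition is an explicit loop; nothing re-invokes run() on a timer.
        while True:
            with self.polls.get_lock():
                self.polls.value += 1
            # Wait up to `interval` seconds; returns True once the event is set.
            if self.stop_event.wait(self.interval):
                break


def demo():
    p = PollingProcess()
    before = p.polls.value  # run() has not executed yet
    p.start()               # this is what triggers run() in the child
    p.stop_event.set()
    p.join()
    return before, p.polls.value


if __name__ == "__main__":
    print(demo())
```

If the counter still reads its initial value after join(), run() never executed in the child or its writes never reached the shared Value. That would be consistent with max_mem.value staying at 0, although the cause in the original code is not confirmed here.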
Did anyone solve the problem?
I have found the problem: the multiprocessing start method needs to be reset in utils.py (with Python 3.10.9 on macOS).
from multiprocessing import set_start_method
set_start_method("fork", force=True)
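For context, a minimal sketch of where such a call would sit relative to process creation. Since Python 3.8 the default start method on macOS is "spawn", which may be why forcing "fork" helps there. CounterProcess and run_with_fork are hypothetical names for illustration and are not taken from utils.py. Note that set_start_method was added in Python 3.4, so this does not apply to the Python 2.7 setup from the original question, and "fork" is only available on Unix-like systems.

```python
import multiprocessing as mp
from multiprocessing import Process, Value


class CounterProcess(Process):
    """Hypothetical stand-in for a tracking process; not the repo's class."""

    def __init__(self):
        super().__init__()
        # Integer in shared memory, visible to both parent and child.
        self.result = Value("i", 0)

    def run(self):
        # Executes in the child process after start() is called.
        self.result.value = 42


def run_with_fork():
    # Force the "fork" start method before any process is created,
    # as suggested in the comment above.
    mp.set_start_method("fork", force=True)
    p = CounterProcess()
    p.start()
    p.join()
    return p.result.value


if __name__ == "__main__":
    print(run_with_fork())
```

On a Unix-like system this should print 42, showing that the child's write to the shared value is visible in the parent once the process has been started with "fork".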