
MemoryTracker() - any bug?

Open ankitagra opened this issue 5 years ago • 4 comments

Firstly, thank you for the great post comparing various storage formats: https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d.

I was interested in your code, especially the memory tracking, and tried to use it to reproduce the results, but MemoryTracker is not working for me. I am not sure whether this has to do with Python 2 (I am using Python 2.7); it should not!

    def my_func():
        a = [1] * (10 ** 6)
        b = [2] * (2 * 10 ** 7)
        del b
        return a

    with MemoryTracker() as mt:
        my_func()
    print mt.memory

This yields an error. While debugging, I found that the run method of the MemoryTrackingProcess class never gets called during execution. It gets called only at the end, which gives the error.

Could you please tell me if I am doing something incorrectly?

Thanks

ankitagra, May 16 '19 05:05

@ankitagra Hi! Thank you for reaching out, I am glad that you've found the post useful. I'll check this code fragment soon and let you know. Also, could you please share the error message you've encountered?

devforfu, May 23 '19 11:05

No Python exception is raised. However, mt.memory in the code above returns a negative number, which happens to equal -1 * MemoryTracker().start_mem.

While debugging, I found that in the code section below self.p.max_mem.value is 0 (this is because the run() method of MemoryTrackingProcess() never gets called).

    @property
    def memory(self):
        return self.p.max_mem.value - self.start_mem

So my question is why is the run() method not being called? I thought that it would get auto-called every few seconds.

I have copied your classes MemoryTracker(object) and MemoryTrackingProcess(Process) verbatim; the only difference is that I am running the code in Py 2.7 instead of Py 3.

Please let me know if this is not clear, or if you are not able to reproduce this.

Thanks a lot for looking into this.

ankitagra, May 24 '19 09:05
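
For readers following along, the symptom described above can be reproduced without the original classes. This is only a sketch: `NeverStarted`, the value `123.0`, and the baseline `start_mem = 50.0` are made-up stand-ins, not the code from the post. It illustrates two points about `multiprocessing.Process`: `run()` is not called periodically (it executes once, in the child, after `start()`), and if the child never runs, the shared value keeps its initial 0, so the difference comes out as `-start_mem`.

```python
import multiprocessing as mp


class NeverStarted(mp.Process):
    """Hypothetical stand-in for MemoryTrackingProcess (not the original code)."""

    def __init__(self):
        super().__init__()
        # Shared double, visible to parent and child; starts at 0.
        self.max_mem = mp.Value("d", 0.0)

    def run(self):
        # Executed once, in the child, and only after start() succeeds.
        # Any "every few seconds" behaviour would have to be a loop in here.
        self.max_mem.value = 123.0


start_mem = 50.0      # made-up baseline reading
p = NeverStarted()    # note: start() is never called

# Same arithmetic as the memory property quoted in the thread.
symptom = p.max_mem.value - start_mem

print(p.pid, p.is_alive(), p.exitcode)  # None False None
print(symptom)                          # -50.0
```

Checking `pid`, `is_alive()` and `exitcode` on the tracker's process object is a quick way to tell "the child never started" apart from "the child started and crashed", since a crashed child has a pid and a non-zero exit code.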

Did anyone solve the problem?

ichliebemath, Jan 17 '23 23:01

I have found the problem: the multiprocessing start method needs to be reset in utils.py (tested with Python 3.10.9 on macOS).

    from multiprocessing import set_start_method
    set_start_method("fork", force=True)

ichliebemath, Jan 18 '23 22:01
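
Some context on the fix above: since Python 3.8 the default start method on macOS is "spawn", which plausibly explains why the tracking process behaved differently there, although the thread does not confirm the exact failure. Below is a minimal sketch of the fix applied to a looping background process. `LoopingSampler`, `samples` and `done` are hypothetical names, and the loop counts iterations instead of reading real memory, so it only shows that `run()` executes in the child once "fork" is forced. The "fork" method is available on POSIX systems only.

```python
import multiprocessing as mp
import time

# The fix from this thread: force "fork" before creating any Process or Value.
mp.set_start_method("fork", force=True)


class LoopingSampler(mp.Process):
    """Hypothetical stand-in for MemoryTrackingProcess (counts loop iterations)."""

    def __init__(self, interval=0.01):
        super().__init__(daemon=True)
        self.interval = interval
        self.samples = mp.Value("i", 0)  # shared counter, starts at 0
        self.done = mp.Event()           # set by the parent to stop the loop

    def run(self):
        # Called once in the child; the periodic behaviour is this loop.
        while not self.done.is_set():
            with self.samples.get_lock():
                self.samples.value += 1
            time.sleep(self.interval)


sampler = LoopingSampler()
sampler.start()          # forks the child, which then enters run()
time.sleep(0.2)          # stand-in for the workload being measured
sampler.done.set()       # ask the child to leave its loop
sampler.join(timeout=2)

taken = sampler.samples.value
exit_code = sampler.exitcode
print(taken > 0, exit_code)
```

A non-zero counter together with an exit code of 0 indicates that the child really ran its loop and stopped cleanly. Using `mp.get_context("fork")` instead of `set_start_method` would avoid changing the global default, at the cost of creating the process and shared objects through that context.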