
PyAthena - Memory Issue

Open • innicoder opened this issue 2 years ago • 7 comments

As in my last request, we execute queries continuously, around 1,000 per hour, and memory usage keeps growing; I can't find a way to stop it.

This happens with a PandasCursor. I thought the solution was to use chunksize, but that wasn't the issue. Memory still grows by about 0.1 MB per query, and since we have a daemon thread that runs 24/7, it eventually grows beyond the available memory and is never reclaimed.

I tried calling gc.collect() manually and deleting the DataFrame object, but after 16 hours of investigation it looks like something inside your library is causing the problem, and it is beyond my reach for now.

I'm looking for an idea on how to resolve this issue. Thanks.

innicoder • Mar 14 '23 10:03

@laughingman7743 Let me know if you have an idea.

Here are some of the things we discussed in the last issue: https://github.com/laughingman7743/PyAthena/issues/416

To reproduce the issue, run any query repeatedly in a Docker container: memory grows by about 0.1 MB each time, and that memory is never reclaimed.
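The reproduction described above can be scripted roughly as follows. This is a minimal sketch, not code from the thread: `measure_growth` and `athena_query_once` are hypothetical helper names, and the S3 staging directory, region and table name are placeholders. It assumes PyAthena's `PandasCursor` (from `pyathena.pandas.cursor`) and its `unload` cursor option, so the two code paths discussed in this thread can be compared.

```python
import gc
import tracemalloc
from typing import Callable, List


def measure_growth(run_once: Callable[[], object], iterations: int = 100) -> List[int]:
    """Call run_once repeatedly and record the traced Python heap (bytes) after each call.

    tracemalloc only sees allocations made through Python's allocators; memory
    held by native code (for example Arrow buffers) is not included.
    """
    samples: List[int] = []
    tracemalloc.start()
    try:
        for _ in range(iterations):
            result = run_once()
            del result    # drop our own reference to the returned DataFrame
            gc.collect()  # the same manual collection tried in the report
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


def athena_query_once(unload: bool = False):
    """Hypothetical query function: one new connection and PandasCursor per call.

    The staging directory, region and table below are placeholders.
    """
    from pyathena import connect
    from pyathena.pandas.cursor import PandasCursor

    conn = connect(
        s3_staging_dir="s3://YOUR_BUCKET/path/to/",  # placeholder
        region_name="us-west-2",                     # placeholder
        cursor_class=PandasCursor,
    )
    try:
        cursor = conn.cursor(unload=unload)
        # Hypothetical table name; as_pandas() materializes the full result.
        return cursor.execute("SELECT * FROM your_table LIMIT 100").as_pandas()
    finally:
        conn.close()
```

For example, `measure_growth(athena_query_once, iterations=200)` returns one sample per query; a sequence that keeps rising instead of levelling off suggests objects are retained between queries. Because `tracemalloc` misses memory held by native code, it is worth comparing the result against the container's memory usage as well.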

innicoder • Mar 14 '23 10:03

Sorry, I don't know, but there must be a memory leak somewhere.

laughingman7743 • Mar 14 '23 10:03

https://github.com/pandas-dev/pandas/issues/51667 👀

laughingman7743 • Mar 14 '23 10:03

Yeah, no problem. That's what I found as well.

I would be happy to fix it, but I can't get the hang of the library's internals.


innicoder • Mar 14 '23 10:03

When the unload option is used, the read_csv method is not called. I wonder whether the same memory leak occurs in that case.

laughingman7743 • Mar 14 '23 10:03

I'm actually using unload already.

I tried it without unload before as well, with no success.


innicoder • Mar 14 '23 10:03

Hi, I think I'm experiencing the same or a similar issue: creating new PandasCursors leads to a memory leak. I've been using objgraph to diagnose it, and I think it has something to do with the loop in the image and the S3FileSystem. The PandasCursor creates an AthenaPandasResultSet, which creates an S3FileSystem, and then something involving the AbstractFileSystem in fsspec seems to be at play. I'm not an expert in this sort of thing, but hopefully it helps.

[Image: diagram of the reference loop, not preserved]
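One plausible mechanism, consistent with the S3FileSystem/fsspec trail above but not confirmed in this thread, is fsspec's instance cache: `AbstractFileSystem` subclasses cache instances keyed on their constructor arguments, so if every query builds a filesystem with arguments that differ each time, each instance can stay referenced by the class-level cache. The sketch below is a simplified, hypothetical model of that pattern, not fsspec's actual code; `CachingMeta`, `FakeFileSystem`, `count_instances` and `run_queries` are illustrative names, and `count_instances` uses `gc.get_objects()` as a stdlib stand-in for objgraph-style per-type counting.

```python
import gc


class CachingMeta(type):
    """Simplified, hypothetical model of a metaclass-level instance cache.

    Instances are stored under a key built from the constructor arguments:
    identical arguments return the cached object, but every new combination
    adds an entry that the class keeps alive until the cache is cleared.
    """

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._cache = {}

    def __call__(cls, *args, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cls._cache:
            cls._cache[key] = super().__call__(*args, **kwargs)
        return cls._cache[key]


class FakeFileSystem(metaclass=CachingMeta):
    """Hypothetical stand-in for a cached filesystem class."""

    def __init__(self, owner_id):
        self.owner_id = owner_id
        self.buffer = bytearray(10_000)  # per-instance state kept alive while cached


def count_instances(cls) -> int:
    """Count live objects whose type is exactly cls, after a full collection."""
    gc.collect()
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


def run_queries(n: int, unique_args: bool) -> int:
    """Simulate n queries that each construct a filesystem, then count survivors."""
    for i in range(n):
        # With unique_args, each "query" passes a different argument, the way a
        # per-connection object might, so each one gets its own cache entry.
        fs = FakeFileSystem(i if unique_args else 0)
        del fs  # the caller drops its reference; only the cache still holds one
    return count_instances(FakeFileSystem)
```

With a repeated argument the loop leaves a single instance alive; with a distinct argument per query it leaves one per query, even though the caller deletes its reference and `gc.collect()` runs. In real fsspec the cache can be bypassed per call with `skip_instance_cache=True`, disabled for a class by setting `cachable = False`, or emptied with the `clear_instance_cache()` class method; whether any of these addresses the leak in PyAthena would need to be verified.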

Duncan-Hunter • Jun 16 '23 15:06