Memory usage with high # of predicates

Open justin13601 opened this issue 1 year ago • 5 comments

Memory usage is a possible area for improvement when a large number of predicates is specified.

As I recall, when I profiled this, memory usage peaked during the creation of the predicate columns.

@Jwoo5 could you please help confirm if this is the case for your dataset/tasks as well? I just used mprof to run a script with the extraction code and looked at the memory plots.
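For anyone who wants a quick check without mprof, here is a minimal sketch using Python's standard-library `tracemalloc`. It is a stand-in for the mprof run described above, not the same workflow. `build_predicate_columns` is a hypothetical placeholder for the extraction step, and `tracemalloc` only sees allocations made through Python's allocator, so memory allocated inside native libraries may be under-reported.

```python
import tracemalloc


def build_predicate_columns(n_rows: int, n_predicates: int) -> list[list[bool]]:
    """Hypothetical stand-in for extraction: one boolean column per predicate."""
    return [[(row + p) % 2 == 0 for row in range(n_rows)] for p in range(n_predicates)]


def peak_memory_bytes(func, *args) -> int:
    """Run func(*args) and return the peak traced allocation size in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    # Compare peak memory for a small and a larger number of predicates.
    for n_predicates in (10, 100):
        peak = peak_memory_bytes(build_predicate_columns, 1_000, n_predicates)
        print(f"{n_predicates} predicates: peak ~{peak / 1e3:.0f} kB")
```

Swapping the placeholder for the real extraction call should show whether the peak grows with the number of predicates.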

Tagging @mmcdermott

justin13601 avatar Aug 01 '24 03:08 justin13601

Yes, memory peaked when creating the predicate columns. It would be great if we could expand the predicate grammar to support an "any of codes" operation without creating intermediate predicate columns to realize the final expression (e.g., or(...)).
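To illustrate the difference being requested, here is a minimal sketch in plain Python. It is not the ACES implementation; the function names and sample data are made up, and lists stand in for dataframe columns. The first function materializes one boolean column per code and ORs them together, while the second answers "any of codes" with a single set-membership pass.

```python
def any_of_via_columns(codes: list[str], wanted: list[str]) -> list[bool]:
    """Build one boolean column per wanted code, then OR the columns together.

    Intermediate memory grows with len(wanted) * len(codes).
    """
    columns = {w: [c == w for c in codes] for w in wanted}
    result = [False] * len(codes)
    for column in columns.values():
        result = [a or b for a, b in zip(result, column)]
    return result


def any_of_direct(codes: list[str], wanted: list[str]) -> list[bool]:
    """Single pass with a set-membership test; no per-code columns are kept."""
    wanted_set = set(wanted)
    return [c in wanted_set for c in codes]


# Made-up example data: one event code per row.
codes = ["A", "B", "C", "A", "D", "B"]
wanted = ["A", "B", "C"]

print(any_of_via_columns(codes, wanted))
print(any_of_direct(codes, wanted))
```

Both functions return the same mask; only the second avoids holding a column for every code at once.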

Jwoo5 avatar Aug 01 '24 05:08 Jwoo5

FYI, when I used only ~70 predicates to define the task and processed the same data, it took about 14 minutes and ~15GB of RAM. That is far less than with ~1400 predicates, which took about 2 hours and ~150GB of RAM, so I believe the number of predicates is the main problem.
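A quick back-of-the-envelope check of these figures, taking the approximate numbers as reported. The linear model (fixed baseline plus a constant cost per predicate) is an assumption fitted to just two data points, so treat the per-predicate estimate as rough.

```python
# Approximate figures as reported in this thread.
small = {"predicates": 70, "minutes": 14, "ram_gb": 15}
large = {"predicates": 1400, "minutes": 120, "ram_gb": 150}

predicate_ratio = large["predicates"] / small["predicates"]  # 20x more predicates
ram_ratio = large["ram_gb"] / small["ram_gb"]                # 10x more RAM
time_ratio = large["minutes"] / small["minutes"]             # about 8.6x more time

# Assumed linear model: ram_gb = baseline_gb + per_predicate_gb * predicates
per_predicate_gb = (large["ram_gb"] - small["ram_gb"]) / (
    large["predicates"] - small["predicates"]
)
baseline_gb = small["ram_gb"] - per_predicate_gb * small["predicates"]

print(f"predicates x{predicate_ratio:.0f}, RAM x{ram_ratio:.0f}, time x{time_ratio:.1f}")
print(f"~{per_predicate_gb * 1000:.0f} MB per predicate, ~{baseline_gb:.1f} GB baseline")
```

Under that assumption each extra predicate costs roughly 100 MB on this dataset, on top of a baseline of about 8 GB.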

Jwoo5 avatar Aug 01 '24 08:08 Jwoo5

Ok, this is great information. We can pretty easily support this.

mmcdermott avatar Aug 01 '24 15:08 mmcdermott

I think #90 will likely solve this, so we will move active analysis of this there. In the end, we may also need to invest in dedicated, easy-to-use profiling scripts to better understand computational performance, but hopefully #90 will eliminate this issue and we'll be good to go.

mmcdermott avatar Aug 06 '24 20:08 mmcdermott

Can we check whether #90 actually solved this, @justin13601 or @Jwoo5?

mmcdermott avatar Aug 22 '24 16:08 mmcdermott