
Elevated memory usage in def-use

Open asl opened this issue 1 year ago • 6 comments

We are seeing terribly elevated memory usage in def-use.

We start like this:

    PassRepeated invoking DoSimplifyDefUse
      ProcessDefUse invoking P4::ComputeWriteSet
      heap after P4::ComputeWriteSet: in use 758MB, max 837MB
      ProcessDefUse invoking P4::(anonymous namespace)::FindUninitialized
      heap after P4::(anonymous namespace)::FindUninitialized: in use 758MB, max 837MB

and end like this:

    heap after SimplifyDefUse: in use 9.2GB, max 9.3GB

Note that in reality the heap memory size query itself triggers a garbage collection, so the actual peak memory usage could be well beyond 16 GB.

I do not know the culprit yet; still investigating.

asl avatar Aug 16 '24 21:08 asl

Tagging @ChrisDodd

asl avatar Aug 16 '24 21:08 asl

The usual suspect:

    Allocated a total of 2.7GB memory
    allocated 737MB in 5576175 calls from:
      4   p4c                       0x0000000102ff5234 _ZNK2P411Definitions15joinDefinitionsEPKS0_ + 440 0x102ff5234
    ...

asl avatar Aug 16 '24 21:08 asl

Another example P4 program that shows quite large CPU and memory use during the SimplifyDefUse pass: https://forum.p4.org/t/error-during-p4-program-compilation-too-many-heap-sections-increase-maxhincr-or-max-heap-sects/1231/7

jafingerhut avatar Dec 26 '24 03:12 jafingerhut

Thanks @jafingerhut, this is a great small example. At the first def-use invocation we get:

      heap after RemoveUnused: in use  45MB, max  55MB
      ProcessDefUse invoking P4::ComputeWriteSet
      heap after P4::ComputeWriteSet: in use 2.1GB, max 2.1GB
      ProcessDefUse invoking P4::(anonymous namespace)::FindUninitialized
      heap after P4::(anonymous namespace)::FindUninitialized: in use 2.1GB, max 2.1GB
      ProcessDefUse invoking RemoveUnused

I killed the process after it reached ~6 GB of used memory during the second def-use invocation.

The code just contains a decent number of small actions, plus switches and conditions that are invoked depending on the input. In this respect it looks pretty similar to the large apps I saw, though it is a bit different (my cases often have a bunch of temporaries created by side-effect ordering).

And the memory allocation pattern is standard:

    Allocated a total of 1.8GB memory
    allocated 1.3GB in 2096398 calls from:
     ??: P4::Definitions::joinDefinitions() [100977f30]
     ??: P4::ProgramPoints::merge() [1009779b0]
     ??: absl::lts_20240116::container_internal::raw_hash_set<>::raw_hash_set() [100984c2c]
     ??: absl::lts_20240116::container_internal::raw_hash_set<>::resize() [100984fbc]
    allocated  80MB in 24992 calls from:
     ??: absl::lts_20240116::container_internal::raw_hash_set<>::insert<>() [100977894]
     ??: absl::lts_20240116::container_internal::raw_hash_set<>::find_or_prepare_insert<>() [100986820]
     ??: absl::lts_20240116::container_internal::raw_hash_set<>::prepare_insert() [100986910]
     ??: absl::lts_20240116::container_internal::raw_hash_set<>::resize() [100984fbc]

asl avatar Dec 26 '24 20:12 asl

I ran it on a system with 16 GBytes of RAM, and I saw the memory usage of p4c increase to between 11 and 12 GBytes, and I think it took around 20 minutes to finish, most of that time being spent in the longer of the two SimplifyDefUse passes that I noticed took significant time.

jafingerhut avatar Dec 26 '24 20:12 jafingerhut

> I ran it on a system with 16 GBytes of RAM, and I saw the memory usage of p4c increase to between 11 and 12 GBytes, and I think it took around 20 minutes to finish, most of that time being spent in the longer of the two SimplifyDefUse passes that I noticed took significant time.

The second one runs after the inliner, so the input code size increases quite a lot. The memory consumption of SimplifyDefUse is definitely superlinear in the number of statements, as it tracks all the "last" writes that can reach each particular program point...
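To illustrate why tracking the "last" writes per program point can grow superlinearly, here is a minimal toy model. It is not p4c's actual data structure: `Defs`, `join`, and `totalStoredEntries` are hypothetical names made up for this sketch, and the only assumption is that every program point keeps its own copy of the reaching-writes table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in (hypothetical, not p4c's types) for a per-program-point table:
// variable name -> ids of the statements that may have written it last.
using Defs = std::map<std::string, std::set<int>>;

// Join at a control-flow merge: per-variable union of the reaching writes.
Defs join(const Defs &a, const Defs &b) {
    Defs out = a;
    for (const auto &entry : b)
        out[entry.first].insert(entry.second.begin(), entry.second.end());
    return out;
}

// Simulates `x = 0; if (c1) x = 1; if (c2) x = 2; ...` with numIfs
// conditionals, storing a separate Defs copy after every statement.
// Returns the total number of (variable, writer) entries kept alive.
std::size_t totalStoredEntries(int numIfs) {
    assert(numIfs >= 0);
    std::vector<Defs> perPoint;
    Defs current;
    current["x"] = {0};  // the initial write has statement id 0
    perPoint.push_back(current);
    for (int i = 1; i <= numIfs; ++i) {
        Defs thenBranch = current;
        thenBranch["x"] = {i};                // the write replaces older writes on this path
        current = join(thenBranch, current);  // the fall-through path still carries them
        perPoint.push_back(current);
    }
    std::size_t total = 0;
    for (const auto &defs : perPoint)
        for (const auto &entry : defs) total += entry.second.size();
    return total;
}
```

In this model the point after the i-th conditional holds i + 1 writers for `x`, so n conditionals store (n + 1)(n + 2) / 2 entries in total: quadratic in the number of statements even for a single variable. Real programs have many variables and nested control flow, so this only sketches the shape of the growth, not its actual magnitude in p4c.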

We have disabled the second SimplifyDefUse downstream; we simply cannot "afford" it. And even the first one is problematic :)

asl avatar Dec 26 '24 20:12 asl