[BUG] Possible memory leak in loop over array containing maps

Open nverwer opened this issue 1 year ago • 2 comments

Description

When extracting data from a large array that contains maps, Java heap usage keeps growing. This might be caused by a memory leak. When the same script is run repeatedly with the same data, garbage collection appears to reclaim the used memory. However, each time the code or the data changes, heap usage increases and does not return to its previous level.

The following graph was generated using real data (https://zenodo.org/records/10482057) and comes from jconsole: image

Using generated data (see below), the graph is similar: image

Expected behaviour

The heap memory usage should return to a lower level after garbage collection. It should not increase permanently after a change in code or data.

To reproduce

The following script is a much simplified version of a script that gets data out of a (large) array containing maps. In the original script, the data comes from a JSON file, but I get the same results when generating the data in the script:

let $doc as item() := array
  { for $i in 1 to 500000
    return map
      { 'id' : 'id'||$i
      , 'status' : if ($i mod 100 = 0) then 'inactive' else if ($i mod 80 = 1) then 'withdrawn' else 'active'
      , 'relationships' : array{ map
        { 'type' : 'Related'
        , 'id' : 'id'||($i+1)
        }}
      }
  }
let $doc-size as xs:integer := array:size($doc)

let $ids :=
  for $doc-index in 1 to $doc-size
    let $item as map(*) := $doc($doc-index)
    (:let $status := $item?status:)
    (:let $relationships as array(*)? := $item?relationships:)
    where $item?status = ('withdrawn','inactive') and exists($item?relationships)
  return $item?id

return count($ids)

At first, I thought that the memory leak (if that is what this is) was in the loop variables $status and $relationships, but that seems not to be the case, so I commented them out.

The second graph above was generated by running this script a few times, then changing 500000 in for $i in 1 to 500000 to 500001, running it a few times, changing it to 500002, running it a few times, and so on.
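
As a complement to the jconsole graphs, the heap level between runs could also be read from inside eXist-db. The following is only a minimal sketch: it assumes that the system module functions system:get-memory-total(), system:get-memory-free() and system:get-memory-max() are available to the current user in eXist-6.2.0. The variable names and the map keys are made up for illustration.

```xquery
xquery version "3.1";

import module namespace system = "http://exist-db.org/xquery/system";

(: Assumption: the system:get-memory-* functions return byte counts for the
   JVM that runs eXist-db. Heap in use is estimated as total minus free. :)
let $mib := 1024 * 1024
let $total := system:get-memory-total()
let $free := system:get-memory-free()
return map {
  'used-mib'  : ($total - $free) idiv $mib,
  'total-mib' : $total idiv $mib,
  'max-mib'   : system:get-memory-max() idiv $mib
}
```

The estimate includes garbage that has not been collected yet, so readings are only comparable when taken right after a garbage collection, for example one triggered with the Perform GC button in jconsole. Running this query after each batch of runs (500000, 500001, 500002, and so on) should show whether the level after collection keeps rising.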

Context

eXist-db: eXist-6.2.0
JVM: OpenJDK 64-Bit Server VM version 11.0.14.1+1
OS: Windows 10

eXist is run with the launcher (not as a service, although that appears to have the same problem), with memory.max=8192.

More details

I used VisualVM to analyze a heap dump, to get an idea of what takes up all the space in the heap. This suggests that there is a lot in the cache. However, cache:clear() does not change the used heap space.

image

image

I am not sure if this gives an indication of what is going on.

nverwer, Jul 22 '24 09:07