cldr icon indicating copy to clipboard operation
cldr copied to clipboard

CLDR-14543 make VettingViewer jobs wait if memory is too low

Open srl295 opened this issue 3 years ago • 12 comments

CLDR-14543

  • [ ] This PR completes the ticket.

srl295 avatar Mar 29 '22 19:03 srl295

it's a stopgap.

Probably should timeout and try anyway after a while (hence draft).

Doesn't make much sense without threading.

srl295 avatar Mar 29 '22 19:03 srl295

@srl295 I've been experimenting with your idea locally. So far it seems that sleeping for ten seconds has no effect on garbage collection. With multiple threads running, gc may not be influenced very much by any particular thread sleeping. I'll try System.gc() and Runtime.getRuntime().gc() next, though these too seem like stopgaps...

btangmu avatar Mar 30 '22 18:03 btangmu

@srl295 I've been experimenting with your idea locally. So far it seems that sleeping for ten seconds has no effect on garbage collection. With multiple threads running, gc may not be influenced very much by any particular thread sleeping. I'll try System.gc() and Runtime.getRuntime().gc() next, though these too seem like stopgaps...

I saw you mention this and didn't get to reply. The purpose of my sleep here is not to make gc run. gc is not the issue in my evaluation, the heap is full of reachable objects that won't get gc'ed.

The point of this sleep is to stop new processing until other processing finishes and releases memory.

srl295 avatar Mar 30 '22 18:03 srl295

"the heap is full of reachable objects that won't get gc'ed" -- maybe you're right

"until other processing finishes and releases memory" -- during Summary, it seems like the only significant processing is in threads invoked by Summary itself, in which case all those memory-holding threads are just going to be spending a lot of time sleeping and therefore taking even longer to finish and release their memory?

BTW freeMemory() is based on the current heap size, not the maximum heap size, so the sleep (or whatever corrective action) should probably be triggered by a different calculation; supposedly available = max - (total - free)...

btangmu avatar Mar 30 '22 19:03 btangmu

"the heap is full of reachable objects that won't get gc'ed" -- maybe you're right

"until other processing finishes and releases memory" -- during Summary, it seems like the only significant processing is in threads invoked by Summary itself, in which case all those memory-holding threads are just going to be spending a lot of time sleeping and therefore taking even longer to finish and release their memory?

My context is #1840 so I'm adding threads. So the sleep here is to avoid making the memory problem worse.

BTW freeMemory() is based on the current heap size, not the maximum heap size, so the sleep (or whatever corrective action) should probably be triggered by a different calculation; supposedly available = max - (total - free)...

That is a good point.

srl295 avatar Mar 30 '22 20:03 srl295

@btangmu thinking about where we are on this…  Now that the memory was bumped to 7Gb, are we blocked from snapshots working? Because perhaps the right next steps on this are to keep tackling improving the memory use.. and come back to the threading performance later.

srl295 avatar Mar 30 '22 20:03 srl295

@srl295 i'm seeing mixed results. 7 GB does seem to have helped the situation on cldr-smoke, since i was able to make a snapshot there manually as admin... however, even with 7 GB, running locally it's ok sometimes but at other times it's not, visualvm shows it hit max ram and stay there

i suspect the threading is a bigger problem than inefficient use of memory...

btangmu avatar Mar 30 '22 21:03 btangmu

@srl295 i'm seeing mixed results. 7 GB does seem to have helped the situation on cldr-smoke, since i was able to make a snapshot there manually as admin... however, even with 7 GB, running locally it's ok sometimes but at other times it's not, visualvm shows it hit max ram and stay there

In that situation for me, hitting 'gc' makes only a small dip because there are lots of reachable objects.

i suspect the threading is a bigger problem than inefficient use of memory...

Making a snapshot should be uncontested though, right? VettingViewer's computeAll() is called from a single thread.

srl295 avatar Mar 30 '22 21:03 srl295

@srl295 "Making a snapshot should be uncontested though, right? VettingViewer's computeAll() is called from a single thread." -- i think that's right... so with the current code, the compute() method is dead (never called)

btw this appears to work right to get the "available" memory, which i think would be the appropriate thing to measure if we need a threshold to determine when we're really running low:


    public static synchronized void reportFreeMemory(String callerId) {
        final Runtime r = Runtime.getRuntime();
        long freeMem = r.freeMemory();
        long maxMem = r.maxMemory();
        long totalMem = r.totalMemory();
        long availMem = maxMem - (totalMem - freeMem);
        log(callerId, "Avail. memory: " + availMem +
            "; free: " + freeMem +
            "; max: " + maxMem +
            "; total: " + totalMem);
    }

btangmu avatar Mar 31 '22 16:03 btangmu

@srl295 "Making a snapshot should be uncontested though, right? VettingViewer's computeAll() is called from a single thread." -- i think that's right... so with the current code, the compute() method is dead (never called)

It would be called in the ForkJoinPool context, which is dead code.

btw this appears to work right to get the "available" memory, which i think would be the appropriate thing to measure if we need a threshold to determine when we're really running low:


    public static synchronized void reportFreeMemory(String callerId) {
        final Runtime r = Runtime.getRuntime();
        long freeMem = r.freeMemory();
        long maxMem = r.maxMemory();
        long totalMem = r.totalMemory();
        long availMem = maxMem - (totalMem - freeMem);
        log(callerId, "Avail. memory: " + availMem +
            "; free: " + freeMem +
            "; max: " + maxMem +
            "; total: " + totalMem);
    }

Though simple, this could be good be good to add in cldr-code somewhere for reuse.

srl295 avatar Mar 31 '22 16:03 srl295

I've been experimenting with a new org.unicode.cldr.util.MemoryHelper for the reportFreeMemory method

if the work for Summary is all being done in a single thread running computeAll() then it seems that thread is allocating memory for lots of locales and paths and all that memory is staying reachable until computeAll() finishes

if each locale were in a different thread maybe that would help

alternatively maybe something needs to be done explicitly to make memory for each locale unreachable when done with that locale, to enable gc

btangmu avatar Mar 31 '22 16:03 btangmu

I've been experimenting with a new org.unicode.cldr.util.MemoryHelper for the reportFreeMemory method

+1

if the work for Summary is all being done in a single thread running computeAll() then it seems that thread is allocating memory for lots of locales and paths and all that memory is staying reachable until computeAll() finishes

computeAll() isn't keeping anything reachable. If anything, it's the shared STFactory which is global to the entire SurveyTool instance.

if each locale were in a different thread maybe that would help

I don't think there's any per-thread explicit reachability.

alternatively maybe something needs to be done explicitly to make memory for each locale unreachable when done with that locale, to enable gc

The reachability is what's addressed in my memory issues shared doc. The CLDRFiles are cached by the owning Factory (STFactory in this case).

Maybe what I need to do is make some unit tests on STFactory accessing all locales, and make sure that nothing is reachable that shouldn't be when not in use.

srl295 avatar Mar 31 '22 16:03 srl295