
better default memory allocation and communication to user

Open tarheel opened this issue 4 years ago • 13 comments

This is a companion to #552. @moldover's suggestions:

  • detect user system memory size
  • tell the user what size of election they can expect to tabulate based on their system
  • always use 80% (or whatever seems appropriate) when tabulating
  • warn the user during cvr parsing if they approach / exceed the limits for their system

Note: I'm not 100% sure, but I don't think that last idea is possible. The JVM prefers to make use of whatever memory you allocate to it, since garbage collection has a non-zero cost, so it will end up consuming most or all of the available heap during normal operation even when the workload is actually well within the limits we've defined.
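
For the detection and sizing pieces, here's a minimal sketch, assuming nothing about RCTab's actual code: it reads physical memory through the JDK's `com.sun.management.OperatingSystemMXBean` and applies the 80% rule. The per-CVR heap cost is a placeholder constant (roughly what the measurements later in this thread suggest), not a verified figure.

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

public class MemoryAdvisor {
  // Placeholder: ~1.7 MB of heap per 1,000 CVRs, per rough measurements
  // later in this thread. Not a verified RCTab constant.
  private static final double MB_PER_THOUSAND_CVRS = 1.7;

  public static void main(String[] args) {
    OperatingSystemMXBean os = (OperatingSystemMXBean)
        ManagementFactory.getOperatingSystemMXBean();
    // getTotalMemorySize() is JDK 14+; older JDKs expose the
    // (now-deprecated) getTotalPhysicalMemorySize() instead.
    long physicalMb = os.getTotalMemorySize() / (1024 * 1024);
    long heapBudgetMb = (long) (physicalMb * 0.8); // "always use 80%"
    long cvrCeiling = (long) (heapBudgetMb / MB_PER_THOUSAND_CVRS) * 1_000;

    System.out.printf(
        "Detected %,d MB RAM; suggested heap %,d MB; expected ceiling ~%,d CVRs%n",
        physicalMb, heapBudgetMb, cvrCeiling);
  }
}
```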

tarheel avatar Apr 12 '21 17:04 tarheel

Didn't 1.3 cover some of this? I could be wrong, but I thought 1.3 did some memory optimization.

chughes297 avatar Feb 16 '23 18:02 chughes297

No, v1.3 did not include anything related to memory use.

tarheel avatar Feb 22 '23 20:02 tarheel

I'd recommend adding #552 to v1.4 also.

tarheel avatar Feb 22 '23 20:02 tarheel

Given the major reduction in memory footprint, is this still a P1 task? I expect users won't be hitting memory ceilings anymore, though I guess I'm not sure what hardware elections are being run on.

My gut is that we've reduced memory usage enough that this isn't as important anymore -- though I defer to those of you with more experience.

artoonie avatar Jun 08 '23 21:06 artoonie

I've learned to assume a pretty small amount of RAM on the computers users might have for RCTab. Assume we have a computer with 4 GB of RAM: do you have a sense of the vote ceiling there?

chughes297 avatar Jun 09 '23 18:06 chughes297

Just ran a basic test: tabulating 300,000 CVRs out of a 1,500,000-record file takes an additional 500 MB of memory, on top of the ~350 MB used just to launch RCTab. Extrapolating, that means we'd be able to support around 2.4 million CVRs on a 4 GB machine, plus or minus a bit: the rest of the machine will use some memory, but the JVM would also garbage collect more often.
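
Spelling out the arithmetic for anyone checking it (4 GB treated as 4,096 MB, and assuming the per-CVR cost scales linearly):

```
per-CVR cost:        500 MB / 300,000 CVRs  ≈ 1.7 KB per CVR
4 GB, ignoring base: 4,096 MB / 1.7 KB      ≈ 2.4 million CVRs
minus 350 MB launch: 3,746 MB / 1.7 KB      ≈ 2.2 million CVRs
```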

However, I have noticed a memory leak -- each time you run a tabulation, more memory stays in use. That could be the real culprit: if people re-run elections multiple times, each subsequent run has less memory to work with, and it'll eventually hit a ceiling.

I think if we solve this leak, we'd solve the overall problem. I'm going to look into it.

artoonie avatar Jun 12 '23 18:06 artoonie

Two questions:

  1. Could we expect this to scale linearly? Like, if I had a computer with 40 GB of RAM, could I reasonably expect to tabulate ~24 million records?
  2. Is there any difference in capacity across CVR formats? Are some more resource-intensive than others?

chughes297 avatar Jun 12 '23 18:06 chughes297

  1. Yes, I expect so, but I'm testing to validate that.
  2. I haven't thoroughly tested all formats, but from the handful I've tried, reading CVRs isn't the memory-intensive part -- the tabulation itself is, and that's format-agnostic. So, as best I can tell, the bottleneck will be the same regardless of format.

I'm still working to get a better understanding here.

artoonie avatar Jun 12 '23 18:06 artoonie

  1. Cool!
  2. Ok, got it. I guess that was the point of #640, to resolve the reading-in bottleneck that we ran into in the past.

chughes297 avatar Jun 12 '23 18:06 chughes297

Alright, I've spent a lot of cycles trying to hunt down the memory leak, and I think I'm chasing a phantom. Something somewhere is being cached, and I'm not familiar enough with Java debugging tools to pinpoint what's happening -- but if I wait 15 minutes, the memory usage decreases. So I think Java is just making use of the memory available on my machine, and this wouldn't happen in the "real world".
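
For what it's worth, one way to distinguish lazy collection from a true leak is to request a GC and re-measure used heap after each run. A minimal sketch, with `tabulate()` standing in for however a tabulation actually gets kicked off:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class LeakCheck {
  static long usedHeapMb(MemoryMXBean mem) {
    return mem.getHeapMemoryUsage().getUsed() / (1024 * 1024);
  }

  public static void main(String[] args) throws InterruptedException {
    MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
    for (int run = 1; run <= 5; run++) {
      // tabulate();               // stand-in for a real tabulation run
      System.gc();                 // a hint only; the JVM may ignore it
      Thread.sleep(2_000);         // give the collector a moment
      System.out.printf("after run %d: %d MB used%n", run, usedHeapMb(mem));
    }
    // A true leak shows used heap climbing run over run even after GC;
    // lazy collection shows it dropping back to a stable floor.
  }
}
```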

If somebody with a virtual machine can test this, that would be helpful (@HEdingfield?) -- see what happens on a machine with 4 GB of RAM. My gut tells me we should be fine now, and that we realistically won't be hitting limits anymore, but I'm not 100% confident in that.
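
If a VM isn't handy, capping the heap at launch approximates the constraint, e.g. `java -Xmx3g -jar rctab.jar` (jar name is a placeholder). It isn't identical to a real 4 GB machine, since there's no OS-level memory pressure, but it should surface any heap ceiling.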

artoonie avatar Jun 12 '23 20:06 artoonie

No easy way to test on my end either. I suggest we open another issue outside of this release to check in on it again in the future (linking back to this one), and then close this one a bit later if nobody else has complained about it.

HEdingfield avatar Jun 12 '23 21:06 HEdingfield

Almost a year later -- thinking we can close this, @yezr?

artoonie avatar Mar 30 '24 17:03 artoonie

Looking through all GitHub issues related to memory footprint: I see PR #640 fixing #552, which makes all tabulation use less memory by shrinking the in-memory footprint of CVRs. That PR raises the ceiling on the number of ballots we can successfully tabulate in a given amount of memory.

I created #824 to revise the ballot-to-memory estimates we currently have in Section 3 of the TDP. With updated estimates we can decide whether this issue is still necessary; for example, if a machine with 4 GB of memory can now reliably handle millions of ballots, we can drop the priority of this one.

yezr avatar Apr 12 '24 13:04 yezr