
VirtualAlloc of bytes failed with errno=1455

SoftTools59654 opened this issue 2 years ago · 3 comments

During the import of several Parquet files, the following error is issued in some cases

The Parquet files total approximately 10 GB across 20 to 25 files.

VirtualAlloc of bytes failed with errno=1455

System RAM 16 GB

SoftTools59654 avatar Mar 04 '24 17:03 SoftTools59654

tl;dr

As confirmed via web searches, the error message the user reported is common for Go programs that have consumed too much memory, which points at the Zed layer, so this issue has been moved to this repo. In my own repro attempts, it seems Zed may be leaking memory when reading/writing Parquet files.

Details

Repro is with Zed commit 38763f8.

The user initially reported this issue as Zui-related, so I started by dragging several large Parquet files into the app all at once, as the user described. Since Zui first pipes files through zq to apply shaping, I watched the memory consumed by both the zq and zed processes and saw that zq actually consumed almost 2x the memory of zed. Given that, and the fact that both tools ultimately depend on the same Parquet library, I kept things simple by focusing my further testing on purely zq-based repros.

While we don't have the benefit of the user's specific data to reproduce the issue, I did some repros with the wrccdc data we often use when we need larger files. Specifically, I started from the ZNG file we store at s3://brim-sampledata/wrccdc/zng/wrccdc.zng and ran on an AWS EC2 m6idn.4xlarge instance, since its 64 GB of memory would hopefully not run out. I then ran through four permutations, with the first serving as a non-Parquet baseline.

$ zq -version
Version: v1.14.0-16-g38763f82

$ zq -i zng -f zng '_path=="conn" | drop duration' wrccdc.zng > conn.zng
$ zq -i zng -f parquet '_path=="conn" | drop duration' wrccdc.zng > conn.parquet
$ zq -i parquet -f zng conn.parquet > conn-from-parquet.zng
$ zq -i parquet -f parquet conn.parquet > conn-from-parquet.parquet

During each run I logged the output of top -b | grep --line-buffered zq so I could then chart zq's reported RES memory usage throughout each run (based on the top defaults, each data point represents a 3-second sample). The result:

[chart: zq RES memory usage over time for each of the four permutations]

To summarize:

  1. The all-ZNG baseline is so well-behaved that its line isn't even visible on the chart (RES never exceeds 100 MB).
  2. The red line for reading Parquet & writing ZNG approximates what the user was doing. Our conn.parquet came out to a size of 25 GB, so 2.5x bigger than the volume of Parquet data the user reported loading. The fact that the run ended up consuming 50+ GB of memory on our instance seems consistent with the user having run out of memory on their 16 GB system.
  3. While the user was not attempting to write Parquet output, I figured I might as well cover all permutations while I was testing. For the green line showing Parquet read & write, the run seemed to get close to completing (based on observed output file size), but the instance actually hung and had to be force-rebooted after running out of memory.

Note that these tests do not try to approximate the user's reported attempt to load many Parquet files at once. But if we struggle with a single large file, things will likely only get worse under parallel load, so it seems sensible to study this simpler case first.
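For anyone re-running these tests, here's a self-contained sketch of the memory-sampling pipeline. The printf lines are made-up stand-ins for live top -b output (the real run pipes top -b directly, as shown above); the log file name is my own choice.

```shell
# Filter top's batch-mode output down to just the zq samples.  With top's
# default 3-second delay, each surviving line is one RES data point.
# --line-buffered flushes each match immediately, so the log stays current
# even if the run is interrupted.
printf '%s\n' \
  ' 1234 ubuntu  20  0  60.1g  1.5g  1.0g R  99.9  2.4  0:03.00 zq' \
  ' 1234 ubuntu  20  0  60.1g  2.9g  1.0g R  99.9  4.7  0:06.00 zq' \
  ' 2345 ubuntu  20  0  10.0m  5.0m  1.0m S   0.1  0.0  0:00.10 top' |
  grep --line-buffered zq > zq-mem.log
wc -l < zq-mem.log   # two zq samples survive the filter
```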

Zui Repro

For completeness, here's a repro of the user-described symptom in GA Zui v1.6.0, produced by dragging the conn.parquet into the app. This was performed on an AWS Windows m6idn.xlarge instance that, like the user's system, has 16 GB of memory.

https://github.com/brimdata/zed/assets/5934157/73321b04-34d2-4793-ac5c-6e61853531e3

Notes to self

In case I ever need to refer back to them, here are the raw saved top outputs:

And I preprocessed them for my Google Sheet by piping them through:

awk '{ print $6 }' | sed 's/m/\*1024\*1024/' | sed 's/g/\*1024\*1024\*1024/' | sed 's/^/=/'
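To illustrate what that pipeline does, here's a made-up top sample line (not from the actual logs) run through it: awk extracts field 6 (the RES column), and the sed steps rewrite top's m/g suffixes into multiplications so the pasted value becomes a spreadsheet formula in bytes.

```shell
# Hypothetical top line; field 6 ("1.5g") is the RES value we want.
echo ' 1234 ubuntu  20  0  60.1g  1.5g  1.0g R  99.9  2.4  0:03.00 zq' |
  awk '{ print $6 }' |
  sed 's/m/\*1024\*1024/' |
  sed 's/g/\*1024\*1024\*1024/' |
  sed 's/^/=/'
# → =1.5*1024*1024*1024
```

Note that top reports suffix-less RES values in KiB, which this pipeline passes through unchanged, so any such cells would need a further *1024 in the sheet.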

The conn.parquet is saved to s3://brim-sampledata/wrccdc/parquet-zed-5068/conn.parquet.

philrz avatar Mar 13 '24 02:03 philrz

We discussed this one as a group, and @nwt mentioned there are some knobs in the Parquet library that we might be able to adjust to help here.

philrz avatar Mar 14 '24 17:03 philrz

I checked on this one as of the current super commit cc68d89, since we've updated some Parquet libraries and enhanced our Parquet support a bit over the past year. Short of publishing an exhaustive set of memory stats from a complete re-run, I can confirm that the high memory usage shown above is still happening.

philrz avatar Jul 07 '25 20:07 philrz