iput using too much memory?
Bug Report
iRODS Version, OS and Version
iRODS 4.2.8-1, CentOS Linux release 7.9.2009 (Core)
What did you try to do?
Upload lots of small (7 kB) files (e.g. 200,000) into an iRODS resource.
Expected behavior
The memory consumed on the iCAT server by the irodsServer process handling the iput command should not increase with the number of files being put into iRODS, or at least not at a faster rate than the amount of data being put into the iRODS resource.
Observed behavior (including steps to reproduce, if applicable)
Uploading lots of files into an iRODS resource consumes lots of memory in the irodsServer process. Depending on the memory available on the iCAT server and the number of files being uploaded, the process crashes (out of memory). The memory consumed grows linearly with the number of files, while the size of the files hardly seems to have any effect. Tests performed with "iput -b -r dataset_5000x5kB" (a sketch for generating such a dataset follows the results):
- upload 5000 files (5kB each): memory consumed by irodsServer process: 1.2GB (for 25MB of data uploaded)
- upload 10000 files (5kB each): memory consumed by irodsServer process: 2.3GB (for 50MB of data uploaded)
- upload 20000 files (5kB each): memory consumed by irodsServer process: 4.7GB (for 100MB of data uploaded)
- upload 30000 files (5kB each): memory consumed by irodsServer process: 7GB (for 150MB of data uploaded)
- upload 40000 files (5kB each): memory consumed by irodsServer process: 9.3GB (for 200MB of data uploaded)
- upload 5000 files (1kB each): memory consumed by irodsServer process: 1.3GB (for 5MB of data uploaded)
- upload 5000 files (10kB each): memory consumed by irodsServer process: 1.4GB (for 50MB of data uploaded)
- upload 5000 files (100kB each): memory consumed by irodsServer process: 1.4GB (for 500MB of data uploaded)
- upload 5000 files (1MB each): memory consumed by irodsServer process: 1.5GB (for 5GB of data uploaded)
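(For reference, a dataset of this shape can be generated with something like the following; this is a sketch, not necessarily the exact commands used, and it assumes 5 kB = 5120 bytes:)
$ mkdir -p dataset_5000x5kB
$ for X in {0001..5000}; do head -c 5120 /dev/urandom > dataset_5000x5kB/file${X}; done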
Command used to monitor memory usage: "watch ps -u irods --no-headers -o pid,cmd,etime,vsize,%mem"
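To track the growth over time rather than eyeballing the watch output, a sampling loop along these lines can append readings to a log for later plotting (a sketch; irods_mem.log is an illustrative name):
$ while true; do date +%s >> irods_mem.log; ps -u irods --no-headers -o pid,vsize,cmd >> irods_mem.log; sleep 5; done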
There seems to be a more or less fixed overhead per file uploaded, which makes storing large datasets (i.e. datasets with lots of files) in iRODS unfeasible. We observed similar behaviour with the ibun and irsync commands:
- iput -Dtar dataset_5000x5kB.tar ; ibun -x dataset_5000x5kB.tar unpack_dataset_5000x5kB
- irsync -r dataset_5000x5kB i:rsync_dataset_5000x5kB
Confirmed/reproduced. Thanks @BartVanneste!
$ mkdir -p lots
$ for X in {0001..5000}; do echo ${X} > lots/file${X}; done
$ iput -r lots
Note: the bulk flag (-b) had no effect in my testing.
However... the leak doesn't seem as large as reported above...
iRODS 4.2.8 on containerized Ubuntu 16... an iput began with 200MB as a baseline:
13015 /usr/sbin/irodsServer 00:04 197584 0.0
5000 files of size 5 bytes at 99% complete:
844 /usr/sbin/irodsServer 01:58 228424 0.0
shows 228424 KiB, equivalent to 233MB of virtual memory.
5000 files of size 10k at 99% complete:
12746 /usr/sbin/irodsServer 02:03 228692 0.0
shows nearly identical usage.
iRODS 4.2.8 on containerized CentOS 7 (7.7.1908)... an iput began with a moderately higher 230MB as a baseline:
5514 /usr/sbin/irodsServer 00:06 232492 0.0
5000 files of size 10k at 99% complete:
5514 /usr/sbin/irodsServer 01:37 261540 0.0
This morning I did some new tests on a new setup with a different storage back-end (NetApp, NFS), and now the memory usage is more in line with what you have. The previous tests were done on a BeeGFS storage back-end. We'll try to perform some more tests to see whether it is really the difference in storage back-end that accounts for this difference in memory usage of the irodsServer process.
Very interesting. That helps narrow it down on our side as well.
I suspect the additional size reflects the usage of the BGFS client in memory.
My apologies for the slow response: the BGFS filesystem was also exported via NFS, so the BGFS client wasn't used on the iCAT server. I'm also trying to figure out whether the memory usage can be related to the number of iRODS resources present, as that was another difference between my initial tests and the last tests. The first environment had 5 resources (of which 2 were compound ones). I did my initial tests against only the BGFS (via NFS) resource.
It might take some time, as I'm not the one configuring the iRODS setup, and some holidays are coming up as well.
Regards and Best wishes, Bart Vanneste
I also suspected the number of resources present in memory may have an effect.
I added 200 unixfilesystem resources during my tests above, and the leak speed did not change.
OK. Good to know. I'll focus on finding out why we had this huge memory usage in our first tests.
Hello everyone. Is there any news on this issue? I'm seeing the described behavior on our system (4.2.7) as well. I tried to irsync -rK a directory tree with ~600K files and ran into out-of-memory problems. The server has 64 GB of memory, and the irsync got ~450K files up before the process crashed. The memory usage stops immediately when the irsync finishes (for smaller file counts), so my workaround right now is a bash script that checks the number of files and then uploads subdirectories one at a time, roughly as sketched below. Is there any other way to work around this?
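A minimal sketch of that approach (all paths illustrative, assuming the tree splits cleanly into subdirectories so each irsync invocation, and hence each server-side agent, handles a bounded number of files):
#!/bin/bash
COLL=/tempZone/home/user/dataset_root   # destination collection (illustrative)
imkdir -p "${COLL}"
for dir in dataset_root/*/; do
    n=$(find "${dir}" -type f | wc -l)  # check chunk size before uploading
    echo "syncing ${dir} (${n} files)"
    # each irsync gets a fresh agent, which exits and frees its memory
    irsync -rK "${dir}" "i:${COLL}/$(basename "${dir}")"
done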
Thanks and best regards Johannes
Not at this time.
I just re-ran the above experiment with 4.2.11 and saw similar results as before.
We clearly have a relatively small leak for each file, but when performed 600k times... it's definitely a problem.
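For a rough sense of scale (my arithmetic from the CentOS 7 numbers above): growth of ~29000 KiB over 5000 files is on the order of 6 KiB per file, which already extrapolates to several GB of agent memory for a 600k-file transfer; the rates reported at the top of the thread are far higher still.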