
too many open file handles

Open avongluck-r1soft opened this issue 9 years ago • 20 comments

When adding a large number of .deb packages in a row, aptly begins throwing "too many open files" errors while an aptly instance is also running in API mode.

(aptly running in API mode with no-lock)

find /tmp/iso/pool -name "*.*deb" -exec aptly repo add $2 {} \;

[+] php5-cli_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[+] php5-common_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[+] php5-curl_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[+] php5-gd_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[+] php5-gmp_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[+] php5-ldap_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-mysql_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/23/57/php5-mysql_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-mysql_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-odbc_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/50/0c/php5-odbc_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-odbc_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-pgsql_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/cb/8e/php5-pgsql_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-pgsql_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-pspell_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/5b/4d/php5-pspell_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-pspell_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-readline_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/ba/ec/php5-readline_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-readline_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-recode_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/76/eb/php5-recode_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-recode_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-snmp_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/19/05/php5-snmp_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-snmp_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-sqlite_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/8c/cc/php5-sqlite_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-sqlite_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-tidy_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/f8/55/php5-tidy_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-tidy_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-xmlrpc_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/ec/c7/php5-xmlrpc_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-xmlrpc_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-xsl_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/c5/74/php5-xsl_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-xsl_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php-pear_5.5.9+dfsg-1ubuntu4.3_all.deb into pool: open /repo/.aptly/pool/eb/2a/php-pear_5.5.9+dfsg-1ubuntu4.3_all.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php-pear_5.5.9+dfsg-1ubuntu4.3_all.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to save package php5-json_1.3.2-2build1_amd64: open /repo/.aptly/db/018015.ldb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php-json/php5-json_1.3.2-2build1_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to save package python-pil_2.3.0-1ubuntu3_amd64: open /repo/.aptly/db/018015.ldb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/pillow/python-pil_2.3.0-1ubuntu3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to save package libpixman-1-0_0.30.2-2ubuntu1_amd64: open /repo/.aptly/db/018015.ldb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/pixman/libpixman-1-0_0.30.2-2ubuntu1_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to save package libpkcs11-helper1_1.11-1_amd64: open /repo/.aptly/db/018015.ldb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/pkcs11-helper/libpkcs11-helper1_1.11-1_amd64.deb
ERROR: some files failed to be added
Loading packages...

avongluck-r1soft avatar Feb 16 '16 17:02 avongluck-r1soft

This can be worked around via "ulimit -n 64000"; however, it seems like aptly is keeping a lot of file handles open...

avongluck-r1soft avatar Feb 16 '16 17:02 avongluck-r1soft

@avongluck-r1soft I was trying to figure out the problem here... so you are adding packages via CLI aptly while API aptly is running? Is it related to API aptly?

smira avatar Mar 01 '16 08:03 smira

Also, this is an inefficient (very slow) way to add packages. aptly does a much better job when it imports more packages at once: it can iterate on its own, as in aptly repo add xyz /tmp/iso/debs, or you can use xargs to import more packages in each call.
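The xargs variant mentioned above could look roughly like the sketch below. It assumes a find and xargs that support -print0, -0 and -r (GNU versions do); the function name add_debs_batched, the APTLY_BIN override and the default batch size of 200 are illustrative assumptions, not anything aptly provides.

```shell
#!/usr/bin/env bash
# Sketch: add many .deb files with a few aptly invocations instead of one per file.
# APTLY_BIN is a hypothetical override so the command can be swapped or stubbed.
APTLY_BIN="${APTLY_BIN:-aptly}"

# add_debs_batched REPO DIR [BATCH_SIZE]
add_debs_batched() {
    local repo="$1" dir="$2" batch="${3:-200}"
    # -print0 / -0 keep file names containing spaces intact.
    # -n caps how many files each "aptly repo add" call receives.
    # -r skips the call entirely when find matches nothing.
    find "$dir" -type f -name '*.deb' -print0 |
        xargs -0 -r -n "$batch" "$APTLY_BIN" repo add "$repo"
}
```

For example, add_debs_batched xyz /tmp/iso/pool would walk the pool once and hand the files to aptly in groups. Passing the directory straight to aptly repo add, as suggested above, avoids the pipeline altogether.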

smira avatar Mar 01 '16 09:03 smira

I had a similar issue yesterday - had been uploading debs via REST. The service had been up for a long time (weeks if not months) and there's usually about half-a-dozen uploads per day from our build system. The REST API was accepting the debs (reporting that it uploaded them), but after reporting it published the resulting snapshot, the debs weren't actually available for the clients. In the log, I was getting GPG errors due to open files.

Mar 23 08:58:12 vimes bash[334]: Error #01: unable to initialize GPG signer: unable to execute gpg: fork/exec /usr/bin/gpg: too many open files (is gpg installed?):
Mar 23 08:58:12 vimes bash[334]: Meta: Operation aborted

Our setup only ever uploads one package at a time, and a restart fixed the issue. My guess is that the problem is not the bulk upload itself, but the number of open filehandles - and in our case, it had just grown over time.

This was all seen with the default nofile limits (1024 soft, 4096 hard) and aptly 0.9.5.
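One way to check whether descriptors really accumulate over time is to sample the running process. The following is a Linux-only sketch that reads /proc; the function names are made up for illustration, and inspecting a process owned by another user typically needs root.

```shell
#!/usr/bin/env bash
# Sketch (Linux only): compare a process's open descriptors with its own limits.

# fd_count PID: number of open file descriptors, taken from /proc/PID/fd.
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# fd_limits PID: soft and hard "Max open files" values from /proc/PID/limits.
fd_limits() {
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "/proc/$1/limits"
}

# report_fd_usage PID: one line suitable for appending to a log.
report_fd_usage() {
    printf 'pid=%s open_fds=%s %s\n' "$1" "$(fd_count "$1")" "$(fd_limits "$1")"
}
```

Running report_fd_usage against the PID of the long-lived API server at intervals (from cron, say) would show whether open_fds creeps toward the soft limit between restarts.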

vacri avatar Mar 23 '16 22:03 vacri

So aptly must have an fd leak somewhere. Thanks, I'll keep looking.

smira avatar Mar 24 '16 17:03 smira

@smira, I was actually just using xargs with aptly snapshot list -raw to clean up a bunch of snapshots that were no longer in use, and hit an issue somewhere between this one and #818: after the mass drop, aptly would report that no mirrors, snapshots, or publishes existed. Using aptly db recover didn't resolve the issue, and running aptly db cleanup -dry-run -verbose would give me the too many open files error.

This is aptly 1.3.0 running on 16.04.4. Thankfully my repos are published to S3 and I had a snapshot of the instance volume from this morning to recover the instance itself. But it would be great to find out why this is happening before I attempt it again.

emopinata avatar Jun 06 '19 19:06 emopinata

@jalmansor just trying to understand what is going on there. Since aptly run via xargs is a new process each time, something must be using too many fds within a single invocation, and that something is causing the goleveldb corruption.

smira avatar Sep 10 '19 13:09 smira

We are having a similar issue. Aptly has been in use for 2 years. We recently started testing using the API (with -no-lock) and are encountering this issue doing snapshot/db cleanups.

aptly 1.3.0 (from 'http://repo.aptly.info squeeze main')
Ubuntu 14.04 Trusty
File system is a Glusterfs mount (gluster v3.10.12).

$ ulimit -n
1024
$ ls -l repo/db/*.ldb | wc -l
1695

errors include "too many open file handles" and "ERROR: open /srv/shared/aptly/repo/db/1076864.ldb: stale NFS file handle"
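The two numbers reported above (1695 .ldb files against a limit of 1024) can be compared with a small check like the sketch below. It is only a rough indicator: having more table files than descriptors is not an error in itself, and the function names and the optional LIMIT argument are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare the number of LevelDB table files with the nofile soft limit.

# ldb_count DB_DIR: number of *.ldb files directly inside DB_DIR.
ldb_count() {
    find "$1" -maxdepth 1 -type f -name '*.ldb' | wc -l
}

# ldb_exceeds_limit DB_DIR [LIMIT]: succeeds when the table count is above LIMIT.
# LIMIT defaults to the current shell's soft limit for open files.
ldb_exceeds_limit() {
    local count limit
    count=$(ldb_count "$1")
    limit="${2:-$(ulimit -Sn)}"
    [ "$limit" != "unlimited" ] && [ "$count" -gt "$limit" ]
}
```

With the figures from this report, ldb_exceeds_limit repo/db would succeed, which is a hint to look at the limit before running cleanup or recovery commands.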

Aptly repos are managed by bash scripts run by Jenkins.

At the command line (after stopping aptly server):

for S in $(aptly snapshot list -raw); do aptly snapshot drop ${S}; done

"too many open files" fixed with ulimit -n 2048. Still see stale NFS messages but do not seem to be fatal.

Our API version is just the same commands with the API equivalent of CLI (curl ${aptly_host}/api/snapshots) run repeatedly in a loop until the number of snapshots is the same, before and after.

I have added ulimit -S -n 2048 to our /etc/init.d/aptly-server script and initial test is positive, will be increasing to 4096 for production.
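A slightly more defensive form of that init-script line is sketched below. It raises the soft limit only when needed and never asks for more than the hard limit, since an unprivileged process cannot exceed it. The function name raise_nofile is an illustrative assumption.

```shell
#!/usr/bin/env bash
# Sketch: raise the nofile soft limit toward a target without exceeding the hard limit.

# raise_nofile TARGET
raise_nofile() {
    local target="$1" soft hard
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    # Nothing to do if the soft limit is already high enough.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$target" ]; then
        return 0
    fi
    # Clamp the request to the hard limit when that is lower than the target.
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$target" ]; then
        target="$hard"
    fi
    ulimit -S -n "$target"
}
```

Calling raise_nofile 2048 (or 4096 for production, as planned above) once before the server is started applies the limit to that process and its children.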

NOTE: Running aptly db recover while still in the "too many files" state appeared to trash the db. It was restored from a copy made immediately prior.

biggreenogre avatar Sep 17 '19 17:09 biggreenogre

On a side note, I don't think having LevelDB on NFS is a really good scenario (not sure about locking and consistency).

I'm thinking about what is triggering this open file exhaustion though. Looks to be related to number of .ldb files, but not clear what's up as it should be handled correctly in the LevelDB library.

smira avatar Sep 17 '19 18:09 smira

I tried to reproduce this by creating and dropping 2000 snapshots, but number of files in the db directory stays stable like it should (testing with master version though)

smira avatar Sep 17 '19 18:09 smira

Created 10GB worth of database, 3.5k files, ulimit set to 384, still no failures

smira avatar Sep 17 '19 19:09 smira

22:36:08.970318 version@stat F·[7 56 395 3025] S·10GiB[1KiB 98MiB 999MiB 9GiB] Sc·[1.75 0.98 1.00 0.95]

smira avatar Sep 17 '19 19:09 smira

Looks like I was able to hit that, but with forced compaction and aptly db cleanup

smira avatar Sep 17 '19 20:09 smira

but seems to be aptly db recover triggering all the tables to be on L0 which leads to this behavior.

smira avatar Sep 17 '19 20:09 smira

Summary:

  • "too many open files" seems to be related to a huge number of tables on L0 (level zero)
  • under regular operations, I can't trick the DB into entering this state
  • aptly db recover, even on a healthy DB, forces all the tables to L0, which leads to the error above

Plan:

  • add more safety belts to aptly db recover: don't let users run it unless they understand the consequences and have a backup
  • implement more granular compaction options in aptly db cleanup and aptly db recover
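Independent of what aptly itself enforces, the "have a backup" part of the plan can be approximated on the user side with a wrapper like the sketch below. It assumes the database lives in a db directory under the aptly root (as in the /repo/.aptly/db paths earlier in the thread) and that no other aptly process is running. The function name, the APTLY_BIN override and the backup naming scheme are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: copy the aptly database aside before running "aptly db recover".
# APTLY_BIN is a hypothetical override so the command can be swapped or stubbed.
APTLY_BIN="${APTLY_BIN:-aptly}"

# backup_then_recover ROOT_DIR: ROOT_DIR is the aptly root that contains db/.
backup_then_recover() {
    local root="$1" stamp backup
    stamp=$(date +%Y%m%d-%H%M%S)
    backup="$root/db.backup-$stamp"
    # Refuse to continue unless the copy succeeds.
    if ! cp -a "$root/db" "$backup"; then
        echo "backup failed, not running recover" >&2
        return 1
    fi
    echo "backup written to $backup"
    "$APTLY_BIN" db recover
}
```

If recovery leaves the database in a worse state, as reported earlier in the thread, the timestamped copy can be moved back into place.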

smira avatar Sep 18 '19 16:09 smira

Looks like the db got corrupted yesterday, jobs failed overnight based on this error:

`aptly@fulcrum-1:~$ aptly publish list
ERROR: unable to load list of repos: snapshot with uuid 66ed3ea5-2931-4e91-8c8d-8a0f6255de0c not found`

Listing snapshots, repos, etc worked fine. Restored db from earlier backup and so far, so good. Have not seen "too many files" error since adding ulimit -S -n 2048 to api server init script and before running CLI commands.

biggreenogre avatar Sep 18 '19 17:09 biggreenogre

Note, re: the "stale NFS file handle" messages: the db is NOT on an NFS file system but on a GLUSTERFS file system with NFS specifically disabled. I don't think this is related to the issue at hand; it looks more like a minor glusterfs problem.

biggreenogre avatar Sep 18 '19 17:09 biggreenogre

#882 might help here as well

smira avatar Sep 27 '19 11:09 smira

Hi @biggreenogre, did you retry with recent code?

Regards

flotho avatar Apr 05 '22 18:04 flotho

I have not. I'm afraid I don't have any experience with Go or how to build it. I also need the fix for the S3/MD5SUM issue. Is there a quick "howto" on building aptly, e.g. how to install the Go build environment?

biggreenogre avatar Apr 07 '22 16:04 biggreenogre