Too many open files error on latest testnet release [Bug]

Open michaelh-laine opened this issue 1 year ago • 5 comments

🐛 Bug

2022-09-03T21:45:30.500382Z [state-sync-driver] WARN storage/aptosdb/src/lib.rs:1795 AptosDB API returned error. {"api_name":"save_transactions","error":"IO error: While open a file for appending: /opt/aptos/data/db/ledger_db/004474.sst: Too many open files"}
2022-09-03T21:45:30.502804Z [state-sync-driver] ERROR state-sync/state-sync-v2/state-sync-driver/src/storage_synchronizer.rs:831 {"message":"Storage synchronizer error: \"Failed to commit executed chunk! Error: IO error: While open a file for appending: /opt/aptos/data/db/ledger_db/004474.sst: Too many open files\"","name":"storage_synchronizer"}

This is on Ubuntu 20.04 running from source code, on commit 88f936bf40a56768e2dac420e953a0307281c1ba

Never happened before

michaelh-laine avatar Sep 03 '22 21:09 michaelh-laine

@michaelh-laine Thanks for reporting this.

This isn't caused by the new release; it's caused by the limit on the maximum number of open file descriptors on your system. The default is very low, and we are hitting that threshold because our database has grown larger. You can use "ulimit -n" to raise it to a high number, e.g. ulimit -n 1048576

Note that this only changes the limit for the current shell session and doesn't persist after a reboot. There are many ways to make it permanent, which you can find online. (I personally just put it in my .bashrc.)
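
If the node is started from a script rather than an interactive shell, the same adjustment can be sketched in Python with the standard `resource` module. This is only an illustration under assumptions, not something from the Aptos codebase: the helper name `raise_nofile_limit` is hypothetical, and 1048576 is simply the value from the ulimit example.

```python
import resource
from typing import Tuple


def raise_nofile_limit(target: int) -> Tuple[int, int]:
    """Raise this process's soft open-file limit toward `target`.

    Hypothetical helper for illustration. An unprivileged process cannot
    go above the hard limit, so the request is capped at it. The limit is
    never lowered. Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # RLIM_INFINITY means "no limit"; otherwise cap the request at the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)

    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or new_soft <= soft:
        return soft, hard

    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    before = resource.getrlimit(resource.RLIMIT_NOFILE)
    after = raise_nofile_limit(1048576)
    print("open-file limit (soft, hard) before:", before)
    print("open-file limit (soft, hard) after: ", after)
```

Like `ulimit -n`, this affects only the calling process and the children it starts afterwards. Raising the hard limit itself needs elevated privileges, so the sketch leaves it untouched.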

grao1991 avatar Sep 04 '22 18:09 grao1991

Yeah, I know what it means, and I've now permanently raised it to 1M (from the default of 65k or so). It's still worth flagging, though: the Aptos node install itself or the docs should include tuning steps with recommended values, as this will otherwise likely affect many nodes.

michaelh-laine avatar Sep 04 '22 18:09 michaelh-laine

@sherry-x do we mention ulimit in any of our docs?

gregnazario avatar Sep 07 '22 20:09 gregnazario

@grao1991 what's a reasonable number? We should probably have some documentation on system settings.

sherry-x avatar Sep 07 '22 20:09 sherry-x

I don't know. I set it to unlimited on my machine, and I don't really understand the downside of that. @msmouse, any comments?

grao1991 avatar Sep 07 '22 21:09 grao1991

I have been running on testnet and devnet over the past week on a Linux distro and cannot reproduce this. If it persists, let us know. We could probably create an FAQ if it does.

davidiw avatar Sep 25 '22 19:09 davidiw