optimism icon indicating copy to clipboard operation
optimism copied to clipboard

geth crashing with OutOfMemory if request debug_traceTransaction

Open McSim85 opened this issue 3 years ago • 2 comments

Describe the bug If you request debug_traceTransaction for some tx hashed, Optimism client continuously consuming memory until it will be killed by oom-killer. That happens all the time for (at least) these hashes:

"0xcf6e46a1f41e1678fba10590f9d092690c5e8fd2e85a3614715fb21caa74655d"
"0xacb46bc13597e0125a31af5f4162d5e2929a1049700e6f75f2141a522c08567f"
"0xe3463602219d0ed638f4b8db3f7126e1ee747435501812abf0cd38cf7b526969"
"0x6eeb08f0b8b1ba240430d213800770244257f4844a20addf0590b65e177bd589"
"0x6d6fc578e10d88295accecc0f4eb44a8013cb8d11850426672f821c6892d6798"
"0xff8cfb7966450ec5cdc3d71bac104265a332cf6ccd35c849ada0ddaf1e8683b8"
"0x2d579e19657855a85fee279bff59a313a34b25b4b26848b20216d7d8f5e0c9d1"
"0x815d44bc86298c1bdd936132d99f08f3b891e5efcab8e28a0be8e2440ccd6434"

To Reproduce

Just run this request:

curl http://localhost:8545 -d '{"method":"debug_traceTransaction","params":["0xacb46bc13597e0125a31af5f4162d5e2929a1049700e6f75f2141a522c08567f"],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json"
curl: (56) Recv failure: Connection reset by peer

After the crash, you can find these messages in the dmesg

[Thu Feb 24 14:04:34 2022] geth invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[Thu Feb 24 14:04:34 2022] CPU: 12 PID: 319878 Comm: geth Not tainted 5.4.0-97-generic #110-Ubuntu
[Thu Feb 24 14:04:34 2022] Hardware name: DigitalOcean Droplet, BIOS 20171212 12/12/2017
[Thu Feb 24 14:04:34 2022] Call Trace:
[Thu Feb 24 14:04:34 2022]  dump_stack+0x6d/0x8b
[Thu Feb 24 14:04:34 2022]  dump_header+0x4f/0x1eb
[Thu Feb 24 14:04:34 2022]  oom_kill_process.cold+0xb/0x10
[Thu Feb 24 14:04:34 2022]  out_of_memory+0x1cf/0x4d0
[Thu Feb 24 14:04:34 2022]  __alloc_pages_slowpath+0xd5e/0xe50
[Thu Feb 24 14:04:34 2022]  __alloc_pages_nodemask+0x2d0/0x320
[Thu Feb 24 14:04:34 2022]  alloc_pages_current+0x87/0xe0
[Thu Feb 24 14:04:34 2022]  __page_cache_alloc+0x72/0x90
[Thu Feb 24 14:04:34 2022]  pagecache_get_page+0xbf/0x300
[Thu Feb 24 14:04:34 2022]  filemap_fault+0x6b2/0xa50
[Thu Feb 24 14:04:34 2022]  ? native_flush_tlb_others+0x105/0x170
[Thu Feb 24 14:04:34 2022]  ? xas_load+0xd/0x80
[Thu Feb 24 14:04:34 2022]  ? xas_find+0x17f/0x1c0
[Thu Feb 24 14:04:34 2022]  ? filemap_map_pages+0x24c/0x380
[Thu Feb 24 14:04:34 2022]  ext4_filemap_fault+0x32/0x50
[Thu Feb 24 14:04:34 2022]  __do_fault+0x3c/0x130
[Thu Feb 24 14:04:34 2022]  do_fault+0x24b/0x640
[Thu Feb 24 14:04:34 2022]  __handle_mm_fault+0x4c5/0x7a0
[Thu Feb 24 14:04:34 2022]  handle_mm_fault+0xca/0x200
[Thu Feb 24 14:04:34 2022]  do_user_addr_fault+0x1f9/0x450
[Thu Feb 24 14:04:34 2022]  __do_page_fault+0x58/0x90
[Thu Feb 24 14:04:34 2022]  ? exit_to_usermode_loop+0xbf/0x160
[Thu Feb 24 14:04:34 2022]  do_page_fault+0x2c/0xe0
[Thu Feb 24 14:04:34 2022]  do_async_page_fault+0x39/0x70
[Thu Feb 24 14:04:34 2022]  async_page_fault+0x34/0x40
[Thu Feb 24 14:04:34 2022] RIP: 0033:0x4536f8
[Thu Feb 24 14:04:34 2022] Code: Bad RIP value.
[Thu Feb 24 14:04:34 2022] RSP: 002b:000000c0066e71c8 EFLAGS: 00010206
[Thu Feb 24 14:04:34 2022] RAX: 00000000016d2d88 RBX: 000000000195d600 RCX: 000000000195d600
[Thu Feb 24 14:04:34 2022] RDX: 00000000002a29a0 RSI: 000000000166e060 RDI: 0000000000005d40
[Thu Feb 24 14:04:34 2022] RBP: 000000c0066e71d0 R08: 000000000166e060 R09: 000000000000013e
[Thu Feb 24 14:04:34 2022] R10: 000000000043c480 R11: 00007f80eed76fff R12: 000000c0066e7260
[Thu Feb 24 14:04:34 2022] R13: 000000c85c66c280 R14: 000000c001c74680 R15: 00007f80944b9762
[Thu Feb 24 14:04:34 2022] Mem-Info:
[Thu Feb 24 14:04:34 2022] active_anon:16111429 inactive_anon:47 isolated_anon:0
                            active_file:370 inactive_file:44 isolated_file:0
                            unevictable:4616 dirty:0 writeback:0 unstable:0
                            slab_reclaimable:125951 slab_unreclaimable:86997
                            mapped:2347 shmem:290 pagetables:33515 bounce:0
                            free:81575 free_pcp:0 free_cma:0
[Thu Feb 24 14:04:34 2022] Node 0 active_anon:64445716kB inactive_anon:188kB active_file:1480kB inactive_file:176kB unevictable:18464kB isolated(anon):0kB isolated(file):0kB mapped:9388kB dirty:0kB writeback:0kB shmem:1160kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1460224kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Thu Feb 24 14:04:34 2022] Node 0 DMA free:15908kB min:16kB low:28kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Thu Feb 24 14:04:34 2022] lowmem_reserve[]: 0 3477 64265 64265 64265
[Thu Feb 24 14:04:34 2022] Node 0 DMA32 free:246796kB min:3656kB low:7216kB high:10776kB active_anon:3322876kB inactive_anon:0kB active_file:12kB inactive_file:112kB unevictable:0kB writepending:0kB present:3653596kB managed:3588060kB mlocked:0kB kernel_stack:64kB pagetables:6620kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Thu Feb 24 14:04:34 2022] lowmem_reserve[]: 0 0 60787 60787 60787
[Thu Feb 24 14:04:34 2022] Node 0 Normal free:63596kB min:63908kB low:126152kB high:188396kB active_anon:61122376kB inactive_anon:188kB active_file:1108kB inactive_file:0kB unevictable:18464kB writepending:0kB present:63438848kB managed:62254300kB mlocked:18464kB kernel_stack:9600kB pagetables:127440kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Thu Feb 24 14:04:34 2022] lowmem_reserve[]: 0 0 0 0 0
[Thu Feb 24 14:04:34 2022] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[Thu Feb 24 14:04:34 2022] Node 0 DMA32: 464*4kB (UME) 298*8kB (UME) 418*16kB (UME) 387*32kB (UME) 287*64kB (UME) 187*128kB (UME) 89*256kB (UME) 36*512kB (UME) 9*1024kB (UE) 2*2048kB (ME) 31*4096kB (ME) = 247120kB
[Thu Feb 24 14:04:34 2022] Node 0 Normal: 708*4kB (UME) 267*8kB (UME) 376*16kB (ME) 1825*32kB (E) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 69384kB
[Thu Feb 24 14:04:34 2022] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Thu Feb 24 14:04:34 2022] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Thu Feb 24 14:04:34 2022] 3079 total pagecache pages
[Thu Feb 24 14:04:34 2022] 0 pages in swap cache
[Thu Feb 24 14:04:34 2022] Swap cache stats: add 0, delete 0, find 0/0
[Thu Feb 24 14:04:34 2022] Free swap  = 0kB
[Thu Feb 24 14:04:34 2022] Total swap = 0kB
[Thu Feb 24 14:04:34 2022] 16777109 pages RAM
[Thu Feb 24 14:04:34 2022] 0 pages HighMem/MovableOnly
[Thu Feb 24 14:04:34 2022] 312542 pages reserved
[Thu Feb 24 14:04:34 2022] 0 pages cma reserved
[Thu Feb 24 14:04:34 2022] 0 pages hwpoisoned
[Thu Feb 24 14:04:34 2022] Tasks state (memory values in pages):
[Thu Feb 24 14:04:34 2022] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Thu Feb 24 14:04:34 2022] [    623]     0   623    29706     1054   253952        0          -250 systemd-journal
[Thu Feb 24 14:04:34 2022] [    825]     0   825    70034     4485    98304        0         -1000 multipathd
[Thu Feb 24 14:04:34 2022] [    868]   102   868    22558      732    81920        0             0 systemd-timesyn
[Thu Feb 24 14:04:34 2022] [    934]   100   934     4604      785    77824        0             0 systemd-network
[Thu Feb 24 14:04:34 2022] [    970]   101   970     6014     1674    86016        0             0 systemd-resolve
[Thu Feb 24 14:04:34 2022] [    996]     0   996     4730      781    57344        0         -1000 systemd-udevd
[Thu Feb 24 14:04:34 2022] [   1111]     0  1111    60258      258   106496        0             0 accounts-daemon
[Thu Feb 24 14:04:34 2022] [   1114]     0  1114     2134      591    53248        0             0 cron
[Thu Feb 24 14:04:34 2022] [   1118]   103  1118     2007      901    49152        0          -900 dbus-daemon
[Thu Feb 24 14:04:34 2022] [   1124]     0  1124    20482      530    57344        0             0 irqbalance
[Thu Feb 24 14:04:34 2022] [   1125]     0  1125     7299     2746    90112        0             0 networkd-dispat
[Thu Feb 24 14:04:34 2022] [   1127]   104  1127    56087      478    86016        0             0 rsyslogd
[Thu Feb 24 14:04:34 2022] [   1134]     0  1134     4210      803    73728        0             0 systemd-logind
[Thu Feb 24 14:04:34 2022] [   1139]     0  1139    98706     1338   126976        0             0 udisksd
[Thu Feb 24 14:04:34 2022] [   1146]     0  1146      948      521    45056        0             0 atd
[Thu Feb 24 14:04:34 2022] [   1191]     0  1191     1838      439    45056        0             0 agetty
[Thu Feb 24 14:04:34 2022] [   1202]     0  1202     1457      386    53248        0             0 agetty
[Thu Feb 24 14:04:34 2022] [   1214]     0  1214    27027     2725   106496        0             0 unattended-upgr
[Thu Feb 24 14:04:34 2022] [   1224]     0  1224    59105      661    90112        0             0 polkitd
[Thu Feb 24 14:04:34 2022] [   1296]     0  1296     3044      834    61440        0         -1000 sshd
[Thu Feb 24 14:04:34 2022] [   5579]   112  5579   999286    25345  1101824        0             0 agent
[Thu Feb 24 14:04:34 2022] [   5580]   112  5580   846820     6224   626688        0             0 process-agent
[Thu Feb 24 14:04:34 2022] [   5581]   112  5581   840052     4468   573440        0             0 trace-agent
[Thu Feb 24 14:04:34 2022] [  15422]   997 15422   374101    19191  3772416        0             0 node
[Thu Feb 24 14:04:34 2022] [2266772]   107 2266772     2437      170    53248        0             0 uuidd
[Thu Feb 24 14:04:34 2022] [3386667]     0 3386667   361977      353   188416        0          -900 droplet-agent
[Thu Feb 24 14:04:34 2022] [ 319732]   997 319732     2187      513    53248        0             0 start.sh
[Thu Feb 24 14:04:34 2022] [ 319736]   997 319736 17187097 16036736 130777088        0             0 geth
[Thu Feb 24 14:04:34 2022] [1027034]     0 1027034   569154     4478   417792        0          -900 snapd
[Thu Feb 24 14:04:34 2022] [2028985]     0 2028985     3453      911    65536        0             0 sshd
[Thu Feb 24 14:04:34 2022] [2028998]     0 2028998     4588     1147    73728        0             0 systemd
[Thu Feb 24 14:04:34 2022] [2029001]     0 2029001    42494     1163   102400        0             0 (sd-pam)
[Thu Feb 24 14:04:34 2022] [2029130]     0 2029130     2540      977    61440        0             0 bash
[Thu Feb 24 14:04:34 2022] [2029718]     0 2029718    25081      748    98304        0             0 curl
[Thu Feb 24 14:04:34 2022] [2029790]     0 2029790     3451      938    61440        0             0 sshd
[Thu Feb 24 14:04:34 2022] [2029869]     0 2029869     2508      900    53248        0             0 bash
[Thu Feb 24 14:04:34 2022] [2029878]     0 2029878     2153      751    57344        0             0 htop
[Thu Feb 24 14:04:34 2022] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/optimism.service,task=geth,pid=319736,uid=997
[Thu Feb 24 14:04:34 2022] Out of memory: Killed process 319736 (geth) total-vm:68748388kB, anon-rss:64146944kB, file-rss:0kB, shmem-rss:0kB, UID:997 pgtables:127712kB oom_score_adj:0
[Thu Feb 24 14:04:37 2022] oom_reaper: reaped process 319736 (geth), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Expected behavior I expect to receive the properly formed json response.

System Specs:

  • OS: Ubuntu 20.06 / 64G RAM / 32core CPU
  • Package Version (or commit hash): @eth-optimism/[email protected]

Additional context Looks like related to geth issue - https://github.com/ethereum/go-ethereum/issues/22244

McSim85 avatar Feb 24 '22 14:02 McSim85

Hey @McSim85. This is most likely because those transactions touch alot of memory and each slot access is being traced by default. You could try disabling memory traces as in this comment? If you need to observe memory accesses, it'll be better to write a custom tracer script that does traces memory accesses more efficiently.

Inphi avatar Feb 24 '22 19:02 Inphi

came across another tx 0x6b25cfeb35d16898b80400b6d3679d76aaca2aa1f9470707b50825ac8502c7ff that's crashing my nodes. I tried with the above mentioned workaround by passing the { "disableMemory": true} but no luck.

Any update on this?

quick-pawiromitchel avatar Jun 13 '22 11:06 quick-pawiromitchel