configurable-http-proxy

Socket leak

Open jyounggo opened this issue 3 years ago • 43 comments

Bug description

We are running z2jh (https://z2jh.jupyter.org/en/stable/) and found there is a socket leak in the proxy pod. The number of sockets is constantly increasing (over 60k), and after about a week the kernel generates an error: kernel: TCP: out of memory -- consider tuning tcp_mem.

I have checked the number of sockets using lsof.

/srv/configurable-http-proxy $ lsof
1	/usr/local/bin/node	socket:[48829679]
1	/usr/local/bin/node	socket:[48829681]
1	/usr/local/bin/node	socket:[48825415]
1	/usr/local/bin/node	socket:[48825417]
1	/usr/local/bin/node	socket:[48829792]
1	/usr/local/bin/node	socket:[48829790]
1	/usr/local/bin/node	socket:[48829783]
1	/usr/local/bin/node	socket:[48829785]
/srv/configurable-http-proxy $ lsof | wc -l
64708
/srv/configurable-http-proxy $ lsof | wc -l
64719

Your personal set up

  • Version(s): Jupyterhub Helm chart: v1.1.2 ( https://jupyterhub.github.io/helm-chart/) This chart use the proxy docker image from jupyterhub/configurable-http-proxy:4.5.0

The config.yaml related to proxy

proxy:
  secretToken: xxxx

  service:
    loadBalancerIP: x.x.x.x
  https:
    enabled: true
    hosts:
     - "exmaple.com"
    letsencrypt:
      contactEmail: "[email protected]"
  chp: # proxy pod, running jupyterhub/configurable-http-proxy
    livenessProbe:
      enabled: true
      initialDelaySeconds: 60
      periodSeconds: 20
      failureThreshold: 10 # retry 10 times before declaring failure
      timeoutSeconds: 3
      successThreshold: 1
    resources:
      requests:
        cpu: 1000m # 0m - 1000m
        memory: 5000Mi # Recommended is 100Mi - 600Mi -- we seem to run at 4.3GB a lot
  traefik: # autohttps pod (optional, running traefik/traefik)
    resources:
      requests:
        cpu: 1000m # 0m - 1000m
        memory: 512Mi # 100Mi - 1.1Gi
  secretSync: # autohttps pod (optional, sidecar container running small Python script)
    resources:
      requests:
        cpu: 10m
        memory: 64Mi

jyounggo avatar Oct 12 '22 21:10 jyounggo

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

welcome[bot] avatar Oct 12 '22 21:10 welcome[bot]

This sounds similar to https://github.com/jupyterhub/configurable-http-proxy/issues/388

Would you mind adding your comment on that issue and closing this one to avoid duplicates? Thanks!

manics avatar Oct 12 '22 22:10 manics

@manics that looks like a memory leak, while this is a socket leak. Shouldn't those be different?

yuvipanda avatar Oct 31 '22 17:10 yuvipanda

We just saw a node with a lot of proxy pods spew a lot of "out of memory -- consider tuning tcp_mem" messages while a lot of the proxies stopped working. Could be related?

yuvipanda avatar Oct 31 '22 17:10 yuvipanda

It seems that a bugfix was made in a dependency of this project, I wonder if that is relevant!

https://github.com/jupyterhub/configurable-http-proxy/pull/446

consideRatio avatar Nov 07 '22 08:11 consideRatio

The ws release is unlikely to be relevant, since ws is only used in tests.

minrk avatar Feb 06 '23 08:02 minrk

I think the node version could be important debugging info, because we are relying on Node's built-in http(s) server here:

https://github.com/jupyterhub/configurable-http-proxy/blob/cb03f774e654701f57e7eb337794476561a8f0c2/lib/configproxy.js#L233-L245

CHP node Date
4.6.1 v20.10.0 Nov 27 2023
4.6.0 v18.18.2 Sep 19 2023
4.5.6 v18.17.1 Aug 10 2023
4.5.5 v18.15.0 Apr 3 2023
4.5.4 v18.12.1 Dec 5 2022
4.5.3 v16.17.0 Sep 9 2022
4.5.2 v16.17.0 Aug 19 2022
4.5.1 v16.13.2 Feb 3 2022
4.5.0 v14.17.3 Jul 19 2021
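
For context, the linked lines boil down to roughly this shape (a paraphrase for illustration, not a verbatim copy of configproxy.js):

// Paraphrased sketch: CHP builds its proxy server directly on Node's built-in
// http/https modules, so the Node version shipped in the image determines much
// of the low-level socket handling behavior.
const http = require('http');
const https = require('https');

function createServer(options, handler) {
  // with TLS options, use https.createServer; otherwise plain http
  return options.ssl
    ? https.createServer(options.ssl, handler)
    : http.createServer(handler);
}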

consideRatio avatar Nov 27 '23 13:11 consideRatio

I think this remains a problem in 4.6.1 using node 20 on GKE with Linux kernel 5.15, based on info from @shaneknapp.

consideRatio avatar Feb 02 '24 14:02 consideRatio

CHP has multiple node HTTP servers working in parallel: one for its own REST API, one for proxying, and one for metrics.

It would be good to determine whether the growing tcp memory / socket counts are associated with a specific one of these servers.
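
One way to narrow it down could be to poll each server's live connection count with Node's standard net.Server API, along these lines (a hypothetical debugging sketch, not code that exists in CHP; the servers map is an assumption):

// Hypothetical debugging sketch: periodically log the number of open
// connections tracked by each of CHP's servers (API, proxy, metrics),
// to see which one keeps growing.
function logConnectionCounts(servers, intervalMs = 60000) {
  // servers is assumed to be an object of name -> http.Server instances
  setInterval(() => {
    for (const [name, server] of Object.entries(servers)) {
      server.getConnections((err, count) => {
        if (!err) console.log(`${name}: ${count} open connections`);
      });
    }
  }, intervalMs);
}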

consideRatio avatar Feb 02 '24 15:02 consideRatio

Looked at one CHP process and saw for example...

# cd /proc/<pid of chp process>
cat net/sockstat
sockets: used 556
TCP: inuse 174 orphan 2 tw 36 alloc 1572 mem 40536
UDP: inuse 0 mem 6
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

This process wasn't expected to have anything close to 500 open connections or similar, so I think it's very safe to say that this reproduces. This is from the latest chp running with node 20 on Linux kernel 5.15 nodes.

consideRatio avatar Feb 02 '24 16:02 consideRatio

I'm not sure when I'd expect a socket to be closed. When it times out based on a "timeout" option? I think the timeout option may default to infinite.

Is the issue that there is simply nothing that makes us destroy sockets once created, because we default to an infinite timeout?

consideRatio avatar Feb 02 '24 16:02 consideRatio

@minrk and others, is it safe to change a default for a timeout value here to something quite extreme, like 24 hours? I don't want us to disrupt users that are semi-active and run into issues at the 24th hour - but they wouldn't, as long as they stay semi-active, right?

We have two timeout args matching the node-http-proxy options at https://github.com/http-party/node-http-proxy?tab=readme-ov-file#options, timeout and proxyTimeout - should we try setting both to 24 hours, or just one?
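
For reference, these are plain node-http-proxy constructor options; a minimal sketch of setting both (values are placeholders, not a recommendation):

const httpProxy = require('http-proxy');

// Both options are documented in the node-http-proxy README:
// - timeout: socket timeout (ms) for the incoming client connection
// - proxyTimeout: timeout (ms) for the outgoing connection to the target
const proxy = httpProxy.createProxyServer({
  timeout: 24 * 60 * 60 * 1000,      // e.g. 24 hours on the client-side socket
  proxyTimeout: 24 * 60 * 60 * 1000, // e.g. 24 hours on the target-side socket
});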

consideRatio avatar Feb 02 '24 16:02 consideRatio

There is also a related issue reported in node-http-proxy - https://github.com/http-party/node-http-proxy/issues/1510.

node-http-proxy was forked, and that issue was fixed in the fork with a one-line commit at https://github.com/Jimbly/http-proxy-node16/commit/56283e33edfc7aad8c2605dd493da8a196b4371d, using a .destroy call instead of a .end call. I figure the difference may be that .end allows the process file descriptor to stick around for re-use, perhaps?
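
My rough understanding of the difference between the two calls on a net.Socket (an illustrative sketch, not the fork's actual code):

const net = require('net');

const socket = net.connect({ host: 'example.com', port: 80 });

socket.on('error', (err) => {
  // socket.end() only sends FIN and half-closes our side; if the peer never
  // closes its side, the socket and its file descriptor can linger.
  // socket.destroy() tears the connection down immediately and releases the
  // file descriptor, which is what the fork's one-line fix switches to.
  socket.destroy(err);
});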

Looking closer at that fork, there is also another memory leak fixed in another commit according to the commit message: https://github.com/Jimbly/http-proxy-node16/commit/ba0c414cd03799e357c5d867c11dea06a9c34ec8 . This is detailed in a PR as well: https://github.com/http-party/node-http-proxy/pull/1559

Those two memory-fix commits are the only actual fixes in the fork; the rest is just docs etc.

Maybe we should do a build of chp based on the forked node-http-proxy project and push a tag that users can opt into? Like a jupyterhub/configurable-http-proxy:4.6.1-fork image?

consideRatio avatar Feb 02 '24 16:02 consideRatio

I pushed jupyterhub/configurable-http-proxy:4.6.1-fork and quay.io/jupyterhub/configurable-http-proxy:4.6.1-fork (amd64 only, not aarch64), where I just updated package.json to reference the forked http-proxy-node16 package instead and imported it in a .js file (commits seen here: https://github.com/consideRatio/configurable-http-proxy/commits/main/).
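
Conceptually the swap looks something like this (a sketch of the idea only; see the linked commits for what was actually changed, including the exact package name and version there):

// Before: the upstream package
// const httpProxy = require('http-proxy');

// After: the forked package (the Jimbly fork, repo name http-proxy-node16),
// added as a dependency in package.json in place of http-proxy.
const httpProxy = require('http-proxy-node16');

// The rest of the code can stay unchanged since the fork keeps the same API.
const proxy = httpProxy.createProxyServer({});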

If someone wants to try whether this fork helps, just reference 4.6.1-fork as the image tag instead.

consideRatio avatar Feb 02 '24 17:02 consideRatio

Nice research! I don't think we should have a separate jupyterhub/configurable-http-proxy:4.6.1-fork image, it adds confusion.

If we think the fork is production ready then we should either switch CHP to use it if it's fully trusted, or vendor it if it's not (as previously discussed in https://github.com/jupyterhub/configurable-http-proxy/issues/413#issuecomment-1418900415)

manics avatar Feb 03 '24 15:02 manics

i'll be testing out jupyterhub/configurable-http-proxy:4.6.1-fork in some of our smaller hubs this week and will report back!

edit: here are my changes in case anyone wants to see them!

https://github.com/berkeley-dsep-infra/datahub/pull/5501

shaneknapp avatar Feb 05 '24 01:02 shaneknapp

If it works, 👍 to just switching the dependency and publishing a new release, without any -fork or other unusual tags.

I think the sustainable longer-term solution is to vendor http2-proxy, which I started here but haven't had time to finish.

It would be great to be able to have some actual tests to exercise these things, since it's been hard to control or verify.

minrk avatar Feb 05 '24 13:02 minrk

ok, i identified the most problematic hub... the chp pod has been getting OOMKilled and stack tracing at least every 2-3 days. i just deployed the test fork of the chp pod to it and will keep an eye on things over the rest of the week.

in other news, this fix seems (read: seems) to use less tcp memory than before. it's tough to say for certain, but at the very least w/ my latest deployment on the problematic hub i'll have something that's mildly on fire to watch, vs the others i've deployed on much less trafficked hubs.

🤞

shaneknapp avatar Feb 07 '24 20:02 shaneknapp

womp womp. that pod has restarted three times in the past hour after deploying the -fork chp:

[Wed Feb  7 20:59:45 2024] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
[Wed Feb  7 20:59:45 2024] Call Trace:
[Wed Feb  7 20:59:45 2024]  <TASK>
[Wed Feb  7 20:59:45 2024]  dump_stack_lvl+0x4c/0x67
[Wed Feb  7 20:59:45 2024]  dump_header+0x53/0x240
[Wed Feb  7 20:59:45 2024]  oom_kill_process+0x10e/0x1d0
[Wed Feb  7 20:59:45 2024]  out_of_memory+0x496/0x5a0
[Wed Feb  7 20:59:45 2024]  ? mem_cgroup_iter+0x213/0x280
[Wed Feb  7 20:59:45 2024]  try_charge_memcg+0x7dd/0x9f0
[Wed Feb  7 20:59:45 2024]  charge_memcg+0x42/0x1a0
[Wed Feb  7 20:59:45 2024]  __mem_cgroup_charge+0x2d/0x80
[Wed Feb  7 20:59:45 2024]  handle_mm_fault+0x1088/0x15d0
[Wed Feb  7 20:59:45 2024]  do_user_addr_fault+0x279/0x4f0
[Wed Feb  7 20:59:45 2024]  exc_page_fault+0x78/0xf0
[Wed Feb  7 20:59:45 2024]  asm_exc_page_fault+0x22/0x30
[Wed Feb  7 20:59:45 2024] RIP: 0033:0x7e707bb07081
[Wed Feb  7 20:59:45 2024] Code: 48 ff c1 eb f3 c3 48 89 f8 48 83 fa 08 72 14 f7 c7 07 00 00 00 74 0c a4 48 ff ca f7 c7 07 00 00 00 75 f4 48 89 d1 48 c1 e9 03 <f3> 48 a5 83 e2 07 74 05 a4 ff ca 75 fb c3 48 89 f8 48 29 f0 48 39
[Wed Feb  7 20:59:45 2024] RSP: 002b:00007e707af865a8 EFLAGS: 00010202
[Wed Feb  7 20:59:45 2024] RAX: 000000fbf7993c48 RBX: 0000000000000044 RCX: 0000000000000050
[Wed Feb  7 20:59:45 2024] RDX: 0000000000000638 RSI: 000027b131f9cc38 RDI: 000000fbf7994000
[Wed Feb  7 20:59:45 2024] RBP: 00007e707af86640 R08: 000000fbf7993c48 R09: 000000fbf7993c40
[Wed Feb  7 20:59:45 2024] R10: 000000fbf7993c41 R11: 000027b131f9c879 R12: 00001fe7f33411c9
[Wed Feb  7 20:59:45 2024] R13: 000000fbf798dd50 R14: 00007e70760e2ed0 R15: 0000000000000640
[Wed Feb  7 20:59:45 2024]  </TASK>
[Wed Feb  7 20:59:45 2024] memory: usage 1048484kB, limit 1048576kB, failcnt 209
[Wed Feb  7 20:59:45 2024] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[Wed Feb  7 20:59:45 2024] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6efce583_ef72_49b2_a3a4_af239a6c15f1.slice:
[Wed Feb  7 20:59:45 2024] anon 982282240
                           file 0
                           kernel_stack 245760
                           pagetables 28696576
                           percpu 46240
                           sock 442368
                           shmem 0
                           file_mapped 0
                           file_dirty 0
                           file_writeback 0
                           swapcached 0
                           anon_thp 0
                           file_thp 0
                           shmem_thp 0
                           inactive_anon 982249472
                           active_anon 8192
                           inactive_file 0
                           active_file 0
                           unevictable 0
                           slab_reclaimable 16552048
                           slab_unreclaimable 44657080
                           slab 61209128
                           workingset_refault_anon 0
                           workingset_refault_file 0
                           workingset_activate_anon 0
                           workingset_activate_file 0
                           workingset_restore_anon 0
                           workingset_restore_file 0
                           workingset_nodereclaim 0
                           pgfault 4469590
                           pgmajfault 0
                           pgrefill 0
                           pgscan 0
                           pgsteal 0
                           pgactivate 1
                           pgdeactivate 0
                           pglazyfree 0
                           pglazyfreed 0
                           thp_fault_alloc 0
                           thp_collapse_alloc 0
[Wed Feb  7 20:59:45 2024] Tasks state (memory values in pages):
[Wed Feb  7 20:59:45 2024] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Wed Feb  7 20:59:45 2024] [3783484] 65535 3783484      257        1    28672        0          -998 pause
[Wed Feb  7 20:59:45 2024] [3825947] 65534 3825947   410957   250184 29212672        0           999 node
[Wed Feb  7 20:59:45 2024] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-5f8f41d80b0d3993bdb2aafc8f480cd15cc623f5cb512212d187f99bbe550a13.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6efce583_ef72_49b2_a3a4_af239a6c15f1.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6efce583_ef72_49b2_a3a4_af239a6c15f1.slice/cri-containerd-5f8f41d80b0d3993bdb2aafc8f480cd15cc623f5cb512212d187f99bbe550a13.scope,task=node,pid=3825947,uid=65534
[Wed Feb  7 20:59:45 2024] Memory cgroup out of memory: Killed process 3825947 (node) total-vm:1643828kB, anon-rss:958732kB, file-rss:42004kB, shmem-rss:0kB, UID:65534 pgtables:28528kB oom_score_adj:999

i'm also seeing a lot of these martian source messages in dmesg with this hub's IP:

[Wed Feb  7 21:17:15 2024] IPv4: martian source 10.20.3.65 from 10.20.3.8, on dev eth0
[Wed Feb  7 21:17:15 2024] ll header: 00000000: 42 01 0a 00 00 1d 42 01 0a 00 00 01 08 00

10.20.3.8 is an nginx-ingress-controller.

and sometimes after it's killed (but not every time) we get the following in dmesg:

[Wed Feb  7 21:36:18 2024] TCP: request_sock_TCP: Possible SYN flooding on port 8000. Sending cookies.  Check SNMP counters.

are we DOSing ourselves somehow?

shaneknapp avatar Feb 07 '24 21:02 shaneknapp

i killed that pod and the martian source messages have stopped... let's see if this helps things.

so. confusing.

shaneknapp avatar Feb 07 '24 21:02 shaneknapp

nope. that pod's chp is still getting OOMKilled. time for a break. :)

shaneknapp avatar Feb 07 '24 21:02 shaneknapp

What is its normal memory use, and what are its k8s memory request/limit?

consideRatio avatar Feb 07 '24 22:02 consideRatio

the chp pod is set to have a 1Gi ram limit and a 64Mi request. it's definitely hitting the limit pretty regularly. edit: we expect normal usage to be ~500Mi.

there's also a corresponding cpu spike when it OOMs: (cpu usage graph)

there are ~280 users logged in to the hub currently.

shaneknapp avatar Feb 07 '24 22:02 shaneknapp

here's what overall tcp mem usage for the core node (as reported from /proc/net/sockstat) looks like: (tcp memory graph)

you can see the overall memory used drop significantly each time i deploy the 4.6.1-fork chp to an existing hub.

all hub and proxy pods for any given hub are all on this node. it's a GCP n2-highmem-8.

shaneknapp avatar Feb 07 '24 22:02 shaneknapp

i've reverted 4.6.1-fork on that node. it was causing many users to get 503s, which happen while the chp is restarting. a quick reload brings their notebooks back, but it was seriously disruptive.

that node is still regularly hitting the max memory, just not as quickly(?) as with 4.6.1-fork. i think.

~we don't see this behavior on other chp deployments~

i looked at chp pod restarts across our hubs. we're seeing this intermittently across our biggest deployments (both 4.6.1 and 4.6.1-fork... which all run on the same core node).

the smaller-usage hubs' chps usually run at ~200-300MiB (~200 users max at any time, no massive spikes in logins etc).

the larger hubs' chps run at ~400+ MiB and, depending on user count, eventually run out of memory and are OOMKilled.

today was spent looking into our largest class, but the other two had definitely been experiencing the same issue with slightly lesser impact. now that i know what to look for, i'll keep an eye on these individual pods' memory usage and know better what to look for in the logs. the biggest takeaway so far is that tcp_mem is king, and if it weren't for that we'd have had at least one big outage over the past few weeks. throwing ram at the problem isn't always the right solution, but at least we've been able to winnow out some of the impacts of heavy hub usage (and hopefully get some bugs fixed!). :)

shaneknapp avatar Feb 07 '24 23:02 shaneknapp

i also suspect the martian source and SYN_FLOOD errors are excellent breadcrumbs leading to bad behavior somewhere(s).

disclaimer: i have a few tabs open, and suspect that the former might be caused by something not cleaning up when something in the proxy stack goes away. the latter might just be a symptom of this, but i can neither confirm nor deny.

https://en.wikipedia.org/wiki/Martian_packet

so martians are involved somehow, causing network flooding and a DOS?

shaneknapp avatar Feb 08 '24 01:02 shaneknapp

current state of the most problematic hub (~175+ users)... (memory graph)

it's been an afternoon, that's for sure. ;)

shaneknapp avatar Feb 08 '24 01:02 shaneknapp

@consideRatio i've rolled back the 11 hubs that had 4.6.1-fork deployed. it seems like when we had about 200 or more users logged in, the proxy pod's memory would spike and it would eventually get OOMKilled, restarting the configurable-http-proxy process on the pod.

after rolling back to 4.6.1 i can see that the proxy pod's memory usage has smoothed out a lot compared to before. (memory graph)

le sigh. :(

shaneknapp avatar Feb 08 '24 21:02 shaneknapp

fwiw, here's a similar memory graph from one of the hubs I help run now, which sees roughly similar usage to some of the big berkeley hubs. This is over the last 30 days.

(memory graph)

This is just regular 4.6.1

yuvipanda avatar Feb 08 '24 21:02 yuvipanda

i looked at our historical data, and the big hub in question yesterday typically requested ~750MiB at peak-usage times over the past 30 days, but 4.6.1-fork spiked to over 1GiB almost immediately: (memory graph)

shaneknapp avatar Feb 08 '24 21:02 shaneknapp