configurable-http-proxy
Socket leak
Bug description
We are running z2jh (https://z2jh.jupyter.org/en/stable/) and found a socket leak in the proxy pod.
The number of sockets grows continuously (over 60k), and after about a week the kernel logs an error: kernel: TCP: out of memory -- consider tuning tcp_mem.
I checked the number of open sockets using lsof:
/srv/configurable-http-proxy $ lsof
1 /usr/local/bin/node socket:[48829679]
1 /usr/local/bin/node socket:[48829681]
1 /usr/local/bin/node socket:[48825415]
1 /usr/local/bin/node socket:[48825417]
1 /usr/local/bin/node socket:[48829792]
1 /usr/local/bin/node socket:[48829790]
1 /usr/local/bin/node socket:[48829783]
1 /usr/local/bin/node socket:[48829785]
/srv/configurable-http-proxy $ lsof | wc -l
64708
/srv/configurable-http-proxy $ lsof | wc -l
64719
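For what it's worth, the same count can be obtained without lsof by walking the process's file-descriptor table. A rough sketch (Linux only; pass the proxy's pid as the first argument):

```js
const fs = require("fs");

// Count socket file descriptors for a pid by reading /proc/<pid>/fd symlinks,
// which point at "socket:[inode]" for open sockets.
function countSockets(pid) {
  const dir = `/proc/${pid}/fd`;
  return fs.readdirSync(dir).filter((fd) => {
    try {
      return fs.readlinkSync(`${dir}/${fd}`).startsWith("socket:");
    } catch {
      return false; // fd disappeared between readdir and readlink
    }
  }).length;
}

console.log("open sockets:", countSockets(process.argv[2] || process.pid));
```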
Your personal set up
- Version(s): JupyterHub Helm chart v1.1.2 (https://jupyterhub.github.io/helm-chart/). This chart uses the proxy Docker image jupyterhub/configurable-http-proxy:4.5.0.
The config.yaml related to proxy
```yaml
proxy:
  secretToken: xxxx
  service:
    loadBalancerIP: x.x.x.x
  https:
    enabled: true
    hosts:
      - "exmaple.com"
    letsencrypt:
      contactEmail: "[email protected]"
  chp: # proxy pod, running jupyterhub/configurable-http-proxy
    livenessProbe:
      enabled: true
      initialDelaySeconds: 60
      periodSeconds: 20
      failureThreshold: 10 # retry 10 times before declaring failure
      timeoutSeconds: 3
      successThreshold: 1
    resources:
      requests:
        cpu: 1000m # 0m - 1000m
        memory: 5000Mi # Recommended is 100Mi - 600Mi -- we seem to run at 4.3GB a lot
  traefik: # autohttps pod (optional, running traefik/traefik)
    resources:
      requests:
        cpu: 1000m # 0m - 1000m
        memory: 512Mi # 100Mi - 1.1Gi
  secretSync: # autohttps pod (optional, sidecar container running small Python script)
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
```
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
This sounds similar to https://github.com/jupyterhub/configurable-http-proxy/issues/388
Would you mind adding your comment on that issue and closing this one to avoid duplicates? Thanks!
@manics that looks like a memory leak, while this is a socket leak. Shouldn't those be different issues?
We just saw a node with a lot of proxy pods spew a lot of TCP: out of memory -- consider tuning tcp_mem messages while many of the proxies stopped working. Could be related?
It seems a bugfix was made in a dependency of this project; I wonder if that is relevant!
https://github.com/jupyterhub/configurable-http-proxy/pull/446

The ws release is unlikely to be relevant, since it's only used in tests.
I think the Node version could be important debugging info, because we rely on Node's built-in http(s) server here (a rough sketch of that pattern follows the version table below):
https://github.com/jupyterhub/configurable-http-proxy/blob/cb03f774e654701f57e7eb337794476561a8f0c2/lib/configproxy.js#L233-L245
| CHP | node | Date |
|---|---|---|
| 4.6.1 | v20.10.0 | Nov 27 2023 |
| 4.6.0 | v18.18.2 | Sep 19 2023 |
| 4.5.6 | v18.17.1 | Aug 10 2023 |
| 4.5.5 | v18.15.0 | Apr 3 2023 |
| 4.5.4 | v18.12.1 | Dec 5 2022 |
| 4.5.3 | v16.17.0 | Sep 9 2022 |
| 4.5.2 | v16.17.0 | Aug 19 2022 |
| 4.5.1 | v16.13.2 | Feb 3 2022 |
| 4.5.0 | v14.17.3 | Jul 19 2021 |
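Here is the rough sketch mentioned above (an approximation of the pattern, not the actual configproxy.js code; the target URL is a placeholder): node-http-proxy does the forwarding, while Node's built-in http server owns the listening and client sockets, so socket lifetimes fall back to Node's defaults unless a timeout is set explicitly.

```js
const http = require("http");
const httpProxy = require("http-proxy");

// node-http-proxy instance that does the actual forwarding
const proxy = httpProxy.createProxyServer({ ws: true, xfwd: true });

// Node's built-in HTTP server owns the listening socket and all client sockets
const server = http.createServer((req, res) => {
  // every proxied request also opens an upstream socket to the target
  proxy.web(req, res, { target: "http://127.0.0.1:8888" }, (err) => {
    res.writeHead(503);
    res.end("proxy target unreachable");
  });
});

// WebSocket upgrades are proxied separately
server.on("upgrade", (req, socket, head) => {
  proxy.ws(req, socket, head, { target: "http://127.0.0.1:8888" });
});

server.listen(8000);
```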
I think this remains a problem in 4.6.1 using Node 20 on GKE with Linux kernel 5.15, based on info from @shaneknapp.
CHP runs multiple Node HTTP servers in parallel: one for its own REST API, one for proxying, and one for metrics.
It would be good to determine whether the growing TCP memory / socket counts are associated with a specific one of these.
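One way to check (a sketch only; it assumes the three server instances are reachable from a patched configproxy.js, and the variable names in the usage comment are hypothetical):

```js
// Poll Node's server.getConnections() for each server CHP creates (REST API,
// proxying, metrics) to see which one is accumulating sockets.
function watchConnections(servers, intervalMs = 60000) {
  setInterval(() => {
    for (const [name, server] of Object.entries(servers)) {
      server.getConnections((err, count) => {
        if (!err) console.log(`${name}: ${count} open connections`);
      });
    }
  }, intervalMs);
}

// usage (hypothetical names for the instances created in configproxy.js):
// watchConnections({ api: apiServer, proxy: proxyServer, metrics: metricsServer });
```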
Looked at one CHP process and saw for example...
# cd /proc/<pid of chp process>
cat net/sockstat
sockets: used 556
TCP: inuse 174 orphan 2 tw 36 alloc 1572 mem 40536
UDP: inuse 0 mem 6
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
This process wasn't expected to have anything close to 500 open connections, so I think it's safe to say that this reproduces. This is from the latest CHP running with Node 20 on Linux kernel 5.15 nodes.
I'm not sure when I should expect a socket to be closed. When it times out based on a "timeout" option? I think the timeout currently defaults to infinite.
Is the issue simply that nothing ever destroys sockets once they are created, because we default to an infinite timeout?
@minrk and others, is it safe to change the default timeout here to something quite extreme, like 24 hours? I don't want to disrupt users who are semi-active and run into issues at the 24th hour - but they wouldn't, as long as they remain semi-active, right?
We have two timeout args matching the node-http-proxy options at https://github.com/http-party/node-http-proxy?tab=readme-ov-file#options, timeout and proxyTimeout - should we try setting both of them to 24 hours, or just one?
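For illustration, a sketch of how those two options map onto node-http-proxy (the target URL and the 24-hour value are placeholders):

```js
const httpProxy = require("http-proxy");

// node-http-proxy's two timeout options (both in milliseconds):
//   timeout:      applies to the incoming (client -> proxy) socket
//   proxyTimeout: applies to the outgoing (proxy -> target) request
const DAY_MS = 24 * 60 * 60 * 1000;

const proxy = httpProxy.createProxyServer({
  target: "http://127.0.0.1:8888", // placeholder target
  timeout: DAY_MS,
  proxyTimeout: DAY_MS,
});
```

If I read the node-http-proxy README right, timeout governs the client-side socket and proxyTimeout the target-side request, so which one matters may depend on which side the sockets are leaking on.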
There is also a related issue reported in node-http-proxy - https://github.com/http-party/node-http-proxy/issues/1510.
node-http-proxy was forked, and that issue was fixed there with a one-line commit at https://github.com/Jimbly/http-proxy-node16/commit/56283e33edfc7aad8c2605dd493da8a196b4371d, using .destroy instead of .end. I figure the difference may be that .end leaves the socket (and its file descriptor) around for potential re-use, perhaps?
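A minimal illustration of that difference on a plain Node socket (a sketch, not the fork's actual change): end() only half-closes the connection and waits for the peer, while destroy() tears the socket down immediately and releases its file descriptor.

```js
const net = require("net");

const socket = net.connect(80, "example.com", () => {
  // graceful: sends FIN and waits for the peer; the fd can linger until the
  // other side also closes
  socket.end();

  // forceful: everything associated with the socket is released right away
  // socket.destroy();
});

socket.on("error", () => {
  // ignore errors in this illustration
});
```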
Looking closer at that fork, there is also another memory leak fixed in another commit according to the commit message: https://github.com/Jimbly/http-proxy-node16/commit/ba0c414cd03799e357c5d867c11dea06a9c34ec8 . This is detailed in a PR as well: https://github.com/http-party/node-http-proxy/pull/1559
Those two memory-fix commits are the only actual fixes in the fork; the rest is just docs etc.
Maybe we should do a build of CHP based on the forked node-http-proxy project and push a tag that users can opt into, like a jupyterhub/configurable-http-proxy:4.6.1-fork image?
I pushed jupyterhub/configurable-http-proxy:4.6.1-fork and quay.io/jupyterhub/configurable-http-proxy:4.6.1-fork (amd64 only, no aarch64), where I just updated package.json to reference http-proxy-node16 instead and updated the import in the .js file (commits here: https://github.com/consideRatio/configurable-http-proxy/commits/main/).
If someone wants to try whether this fork helps, just reference 4.6.1-fork as the image tag instead.
Nice research! I don't think we should have a separate jupyterhub/configurable-http-proxy:4.6.1-fork image, it adds confusion.
If we think the fork is production ready then we should either switch CHP to use it if it's fully trusted, or vendor it if it's not (as previously discussed in https://github.com/jupyterhub/configurable-http-proxy/issues/413#issuecomment-1418900415)
i'll be testing out jupyterhub/configurable-http-proxy:4.6.1-fork in some of our smaller hubs this week and will report back!
edit: here are my changes in case anyone wants to see them!
https://github.com/berkeley-dsep-infra/datahub/pull/5501
If it works, 👍 to just switching the dependency and publishing a new release, without any -fork or other unusual tags.
I think the sustainable longer-term solution is to vendor http2-proxy, which I started here but haven't had time to finish.
It would be great to be able to have some actual tests to exercise these things, since it's been hard to control or verify.
ok, i identified the most problematic hub... the chp pod has been getting OOMKilled and stack tracing at least every 2-3 days. i just deployed the test fork of the chp pod to it and will keep an eye on things over the rest of the week.
in other news, this fix seems (read: seems) to use less tcp memory than before. it's tough to say for certain, but at the very least w/my latest deployment on the problematic hub i'll have something that's mildly on fire to watch, vs the others i've deployed on much less trafficked hubs.
🤞
womp womp. that pod has restarted three times in the past hour after deploying the -fork chp:
[Wed Feb 7 20:59:45 2024] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
[Wed Feb 7 20:59:45 2024] Call Trace:
[Wed Feb 7 20:59:45 2024] <TASK>
[Wed Feb 7 20:59:45 2024] dump_stack_lvl+0x4c/0x67
[Wed Feb 7 20:59:45 2024] dump_header+0x53/0x240
[Wed Feb 7 20:59:45 2024] oom_kill_process+0x10e/0x1d0
[Wed Feb 7 20:59:45 2024] out_of_memory+0x496/0x5a0
[Wed Feb 7 20:59:45 2024] ? mem_cgroup_iter+0x213/0x280
[Wed Feb 7 20:59:45 2024] try_charge_memcg+0x7dd/0x9f0
[Wed Feb 7 20:59:45 2024] charge_memcg+0x42/0x1a0
[Wed Feb 7 20:59:45 2024] __mem_cgroup_charge+0x2d/0x80
[Wed Feb 7 20:59:45 2024] handle_mm_fault+0x1088/0x15d0
[Wed Feb 7 20:59:45 2024] do_user_addr_fault+0x279/0x4f0
[Wed Feb 7 20:59:45 2024] exc_page_fault+0x78/0xf0
[Wed Feb 7 20:59:45 2024] asm_exc_page_fault+0x22/0x30
[Wed Feb 7 20:59:45 2024] RIP: 0033:0x7e707bb07081
[Wed Feb 7 20:59:45 2024] Code: 48 ff c1 eb f3 c3 48 89 f8 48 83 fa 08 72 14 f7 c7 07 00 00 00 74 0c a4 48 ff ca f7 c7 07 00 00 00 75 f4 48 89 d1 48 c1 e9 03 <f3> 48 a5 83 e2 07 74 05 a4 ff ca 75 fb c3 48 89 f8 48 29 f0 48 39
[Wed Feb 7 20:59:45 2024] RSP: 002b:00007e707af865a8 EFLAGS: 00010202
[Wed Feb 7 20:59:45 2024] RAX: 000000fbf7993c48 RBX: 0000000000000044 RCX: 0000000000000050
[Wed Feb 7 20:59:45 2024] RDX: 0000000000000638 RSI: 000027b131f9cc38 RDI: 000000fbf7994000
[Wed Feb 7 20:59:45 2024] RBP: 00007e707af86640 R08: 000000fbf7993c48 R09: 000000fbf7993c40
[Wed Feb 7 20:59:45 2024] R10: 000000fbf7993c41 R11: 000027b131f9c879 R12: 00001fe7f33411c9
[Wed Feb 7 20:59:45 2024] R13: 000000fbf798dd50 R14: 00007e70760e2ed0 R15: 0000000000000640
[Wed Feb 7 20:59:45 2024] </TASK>
[Wed Feb 7 20:59:45 2024] memory: usage 1048484kB, limit 1048576kB, failcnt 209
[Wed Feb 7 20:59:45 2024] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[Wed Feb 7 20:59:45 2024] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6efce583_ef72_49b2_a3a4_af239a6c15f1.slice:
[Wed Feb 7 20:59:45 2024] anon 982282240
file 0
kernel_stack 245760
pagetables 28696576
percpu 46240
sock 442368
shmem 0
file_mapped 0
file_dirty 0
file_writeback 0
swapcached 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 982249472
active_anon 8192
inactive_file 0
active_file 0
unevictable 0
slab_reclaimable 16552048
slab_unreclaimable 44657080
slab 61209128
workingset_refault_anon 0
workingset_refault_file 0
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgfault 4469590
pgmajfault 0
pgrefill 0
pgscan 0
pgsteal 0
pgactivate 1
pgdeactivate 0
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 0
thp_collapse_alloc 0
[Wed Feb 7 20:59:45 2024] Tasks state (memory values in pages):
[Wed Feb 7 20:59:45 2024] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Wed Feb 7 20:59:45 2024] [3783484] 65535 3783484 257 1 28672 0 -998 pause
[Wed Feb 7 20:59:45 2024] [3825947] 65534 3825947 410957 250184 29212672 0 999 node
[Wed Feb 7 20:59:45 2024] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-5f8f41d80b0d3993bdb2aafc8f480cd15cc623f5cb512212d187f99bbe550a13.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6efce583_ef72_49b2_a3a4_af239a6c15f1.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6efce583_ef72_49b2_a3a4_af239a6c15f1.slice/cri-containerd-5f8f41d80b0d3993bdb2aafc8f480cd15cc623f5cb512212d187f99bbe550a13.scope,task=node,pid=3825947,uid=65534
[Wed Feb 7 20:59:45 2024] Memory cgroup out of memory: Killed process 3825947 (node) total-vm:1643828kB, anon-rss:958732kB, file-rss:42004kB, shmem-rss:0kB, UID:65534 pgtables:28528kB oom_score_adj:999
i'm also seeing a lot of these martian source messages in dmesg with this hub's IP:
[Wed Feb 7 21:17:15 2024] IPv4: martian source 10.20.3.65 from 10.20.3.8, on dev eth0
[Wed Feb 7 21:17:15 2024] ll header: 00000000: 42 01 0a 00 00 1d 42 01 0a 00 00 01 08 00
10.20.3.8 is an nginx-ingress-controller.
and sometimes after it's killed (but not every time) we get the following in dmesg:
[Wed Feb 7 21:36:18 2024] TCP: request_sock_TCP: Possible SYN flooding on port 8000. Sending cookies. Check SNMP counters.
are we DOSing ourselves somehow?
i killed that pod and the martian source messages have stopped... let's see if this helps things.
so. confusing.
nope. that pod's chp is still getting OOMKilled. time for a break. :)
What is its normal memory use, and what is its k8s memory request/limit?
the chp pod is set to have a 1Gi RAM limit and a 64Mi request. it's definitely hitting the limit pretty regularly. edit: we expect normal usage to be ~500Mi.
there's also a corresponding cpu spike when it OOMs:
there are ~280 users logged in to the hub currently.
here's what overall tcp mem usage for the core node (as reported from /proc/net/sockstat) looks like:
you can see the overall memory used drop significantly each time i deploy the 4.6.1-fork chp to an existing hub.
the hub and proxy pods for every hub all run on this node. it's a GCP n2-highmem-8.
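For anyone who wants to track that same number without a dashboard, here is a rough sketch (Linux only; it assumes the graph plots the mem field, in pages, from the TCP line of /proc/net/sockstat):

```js
const fs = require("fs");

// Read the TCP "mem" field (in pages) from /proc/net/sockstat.
function tcpMemPages() {
  const text = fs.readFileSync("/proc/net/sockstat", "utf8");
  const tcpLine = text.split("\n").find((line) => line.startsWith("TCP:")) || "";
  const match = tcpLine.match(/\bmem (\d+)/);
  return match ? Number(match[1]) : null;
}

setInterval(() => {
  console.log(new Date().toISOString(), "tcp mem (pages):", tcpMemPages());
}, 60 * 1000);
```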
i've reverted 4.6.1-fork on that node. it was causing many users to get 503s, which happen while chp is restarting. a quick reload brings their notebooks back, but it was seriously disruptive.
that node is still regularly hitting the max memory, but just not as quickly(?) as with 4.6.1-fork. i think.
~we don't see this behavior on other chp deployments~
i looked at chp pod restarts across our hubs. we're seeing this intermittently across our biggest deployments (both 4.6.1 and 4.6.1-fork... which all run on the same core node).
the smaller-usage hubs' chps usually run between ~200-300MiB (~200 users max at any time, no massive spikes in logins, etc).
the larger hubs run at ~400+ MiB, but depending on user count, the chp pods eventually run out of memory and are OOMKilled.
today was spent looking into our largest class, but the other two had definitely been experiencing the same issue with slightly lesser impact. now that i know what to look for, i'll keep an eye on these individual pods' memory usage and know better what to look for in the logs. the biggest takeaway so far is that tcp_mem is king, and if it weren't for that we'd have had at least one big outage over the past few weeks. throwing ram at the problem isn't always the right solution, but at least we've been able to winnow out some of the impacts of heavy hub usage (and hopefully get some bugs fixed!). :)
i also suspect the martian source and SYN_FLOOD errors are excellent breadcrumbs leading to bad behavior somewhere(s).
disclaimer: i have a few tabs open and suspect that the former might be something that isn't getting cleaned up when something in the proxy stack goes away. the latter might just be a symptom of this, but i can neither confirm nor deny.
https://en.wikipedia.org/wiki/Martian_packet
so martians are involved somehow, causing network flooding and a DOS?
current state of the most problematic hub (~175+ users)...
it's been an afternoon, that's for sure. ;)
@consideRatio i've rolled back the 11 hubs that had 4.6.1-fork deployed. it seems like when we had about 200 or more users logged in, the proxy pod's memory would spike until it was OOMKilled, restarting the configurable-http-proxy process in the pod.
after rolling back to 4.6.1 i can see that the proxy pod's memory usage has smoothed out a lot compared to before.
le sigh. :(
fwiw, here's a similar memory graph from one of the hubs I help run now, which sees about similar usage to some of the big berkeley hubs. This is over the last 30 days.
This is just regular 4.6.1
i looked at our historical data: over the past 30 days the big hub in question typically used ~750MiB at peak-usage times, but with 4.6.1-fork it spiked to over 1GiB almost immediately: