linkerd2-proxy-init icon indicating copy to clipboard operation
linkerd2-proxy-init copied to clipboard

BUG: fix race condition

Open sdickhoven opened this issue 11 months ago β€’ 8 comments

this pr fixes a race condition that occurs when new cni config files are created while the install-cni.sh script is executing anywhere here:

https://github.com/linkerd/linkerd2-proxy-init/blob/cni-plugin/v1.6.0/cni-plugin/deployment/scripts/install-cni.sh#L323-L348

we have observed this race condition several times in our eks clusters over the past few days where we are chaining cilium to the aws vpc cni.

the install-cni.sh script simply fails to patch the cilium cni config sometimes. i.e. cilium and linkerd-cni must be starting up and manipulating /etc/cni/net.d at just about the same time.

i have a temporary workaround for this race condition in the form of the following kustomize patch:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: linkerd-cni
resources:
- manifest.yaml
patches:
- patch: |
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: linkerd-cni
      namespace: linkerd-cni
    spec:
      template:
        spec:
          containers:
          - name: install-cni
            lifecycle:
              postStart:
                exec:
                  command:
                  - /bin/sh
                  - -c
                  - |
                    sleep 10 # wait for `install-cni.sh` to enter into `inotifywait` loop
                    exec &> /tmp/cni_config_repair.out
                    set -x
                    HOST_CNI_NET="${CONTAINER_MOUNT_PREFIX:-/host}${DEST_CNI_NET_DIR:-/etc/cni/net.d}"
                    find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) -print0 |
                    while read -r -d $'\0' file
                    do
                      sleep 2 # try to avoid interfering with non-atomic filesystem operations
                      grep -qw linkerd "${file}" && continue || echo "Found unpatched CNI config file: ${file}"
                      tmp_file="$(mktemp -ut linkerd-cni.XXXXXX)"
                      cp -fp "${file}" "${tmp_file}"
                      echo >> "${tmp_file}" # force hash diff
                      mv -f "${tmp_file}" "${file}" # IMPORTANT: use atomic filesystem operation!
                    done

sdickhoven avatar Feb 15 '25 03:02 sdickhoven

hi @alpeb πŸ‘‹

i have taken out the unrelated changes. βœ…

happy to submit another pr for those unrelated changes but they are 100% cosmetic and 0% functional. so there's really no need.

sdickhoven avatar Feb 20 '25 15:02 sdickhoven

by the way, i just thought of a way in which install-cni.sh can go into an infinite patching loop.

specifically, the current sha256sum logic assumes that there is only ever going to be one cni config file.

picture this scenario: two new cni config files are created at very nearly the same time.

let's call them

  • cni1.conflist
  • cni2.conflist

sync() updates cni1.conflist and sets cni_conf_sha to the sha sum for that file.

moving the updated cni1.conflist file into place triggers another inotifywait event.

but before we can deal with that inotifywait event, we first have to deal with the one that is already waiting in the queue for cni2.conflist...

sync() updates cni2.conflist and sets cni_conf_sha to the sha sum for that file.

moving the updated cni2.conflist file into place triggers another inotifywait event.

but before we can deal with that inotifywait event, we first have to deal with the one that is already waiting in the queue for cni1.conflist...

...and since cni_conf_sha now contains the sha sum of cni2.conflist, sync() will not detect that it has already updated cni1.conflist...

so the cycle repeats... forever.

should i fix this behavior in this pr or create a new pr?

it is technically a different error case (not a race condition but an infinite loop).

i would address this by updating the sha sum logic to store all sha sums in a variable so that the sha sum for the correct file can be looked up.

if you want me to submit a separate pr, i'm going to wait until this pr is merged... because the required changes would be different for pre-/post-merge install-cni.sh scripts.

sdickhoven avatar Feb 21 '25 20:02 sdickhoven

this is what my fix for the infinite patching loop would look like (using this pr as base):

diff --git a/cni-plugin/deployment/scripts/install-cni.sh b/cni-plugin/deployment/scripts/install-cni.sh
index cdf166c..b6156d7 100755
--- a/cni-plugin/deployment/scripts/install-cni.sh
+++ b/cni-plugin/deployment/scripts/install-cni.sh
@@ -264,14 +264,16 @@ sync() {
 
 # monitor_cni_config starts a watch on the host's CNI config directory
 monitor_cni_config() {
+  local new_sha
   inotifywait -m "${HOST_CNI_NET}" -e create,moved_to,modify |
     while read -r directory action filename; do
       if [[ "$filename" =~ .*.(conflist|conf)$ ]]; then 
         log "Detected change in $directory: $action $filename"
-        sync "$filename" "$action" "$cni_conf_sha"
+        sync "$filename" "$action" "$(jq -r --arg file "$filename" '.[$file] | select(.)' <<< "$cni_conf_sha")"
         # calculate file SHA to use in the next iteration
         if [[ -e "$directory/$filename" ]]; then
-          cni_conf_sha="$(sha256sum "$directory/$filename" | while read -r s _; do echo "$s"; done)"
+          new_sha="$(sha256sum "$directory/$filename" | while read -r s _; do echo "$s"; done)"
+          cni_conf_sha="$(jq -c --arg file "$filename" --arg sha "$new_sha" '. * {$file: $sha}' <<< "$cni_conf_sha")"
         fi
       fi
     done
@@ -315,7 +317,7 @@ install_cni_bin
 # Otherwise, new CNI config files can be created just _after_ the initial round
 # of patching and just _before_ we set up the `inotifywait` loop to detect new
 # CNI config files.
-cni_conf_sha="__init__"
+cni_conf_sha='{}'
 monitor_cni_config &
 
 # Append our config to any existing config file (*.conflist or *.conf)

this will maintain cni config sha hashes in a json object like this:

{
  "05-cilium.conflist" : "0a08ee0b9360e2ee2c3ed1d83263a3168832101346d0528a2474c3f80b7c73d6",
  "10-aws.conflist"    : "7ed380c9100362003cde9861cc6ef09307245eba3ea963cdba36186a30284acd"
}

the select(.) rejects null as output.

since the string null is not a valid sha sum, it's not really necessary to do this but seems like the right thing to do. 🀷

$ filename=abc
$ jq -r --arg file "$filename" '.[$file]' <<< '{}'
null
$ jq -r --arg file "$filename" '.[$file] | select(.)' <<< '{}'
$ jq -r --arg file "$filename" '.[$file] | select(.)' <<< '{"abc":"def"}'
def

also <<< only works with bash, not sh. if you prefer we can change

... <<< "$cni_conf_sha"

to

... < <(echo "$cni_conf_sha")

or

echo "$cni_conf_sha" | ...

sdickhoven avatar Feb 22 '25 15:02 sdickhoven

Sorry for the late reply... As you correctly realized, this script was designed supposing the existence of a single cni config file. I'd like to spend more time investigating how linkerd-cni should behave in the presence of multiple such files before evaluating your solution. So let's for now leave your last diff posted as a comment out of the current PR. I'll give another look and round of tests to the current commits in the following days, so we can move forward.

alpeb avatar Mar 05 '25 23:03 alpeb

Testing this results in the following output:

$ k -n linkerd-cni logs -f linkerd-cni-sq6cf
[2025-03-07 15:29:23] Wrote linkerd CNI binaries to /host/opt/cni/bin
Setting up watches.
Watches established.
[2025-03-07 15:29:23] Trigger CNI config detection for /host/etc/cni/net.d/10-calico.conflist
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: CREATE 10-calico.conflist
Setting up watches.
Watches established.
[2025-03-07 15:29:23] New/changed file [10-calico.conflist] detected; re-installing
[2025-03-07 15:29:23] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-03-07 15:29:23] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "debug",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190"],
    "simulate": false,
    "use-wait-flag": true,
    "iptables-mode": "legacy",
    "ipv6": false
  }
}
[2025-03-07 15:29:23] Created CNI config /host/etc/cni/net.d/10-calico.conflist
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: MODIFY 10-calico.conflist
[2025-03-07 15:29:23] Ignoring event: MODIFY /host/etc/cni/net.d/10-calico.conflist; no real changes detected
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: CREATE 10-calico.conflist
[2025-03-07 15:29:23] Ignoring event: CREATE /host/etc/cni/net.d/10-calico.conflist; no real changes detected
[2025-03-07 15:29:23] Detected change in /host/etc/cni/net.d/: MODIFY 10-calico.conflist
[2025-03-07 15:29:23] Ignoring event: MODIFY /host/etc/cni/net.d/10-calico.conflist; no real changes detected

Those extra events at the bottom are new with this change. Given that we're now calling monitor_cni_config & earlier, the entire block that follows it should no longer be necessary, right?

alpeb avatar Mar 07 '25 15:03 alpeb

Given that we're now calling monitor_cni_config & earlier, the entire block that follows it should no longer be necessary, right?

which block are you referring to?

this?:

# Append our config to any existing config file (*.conflist or *.conf)
config_files=$(find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \))
if [ -z "$config_files" ]; then
    log "No active CNI configuration files found"
else
  config_file_count=$(echo "$config_files" | grep -v linkerd | sort | wc -l)
  if [ "$config_file_count" -eq 0 ]; then
    log "No active CNI configuration files found"
  else
    find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) -print0 |
      while read -r -d $'\0' file; do
        log "Installing CNI configuration for $file"
...

this block is definitely needed!

because cni config files can already exist before install-cni.sh even starts up. and you would never get an event from inotifywait for those files that already exist in /etc/cni/net.d.

the logic is:

  1. set up inotifywait
  2. force any already existing files to trigger inotifywait

i guess that the change in the block that follows monitor_cni_config & is not strictly necessary (because all that really matters is that the script dosn't miss new cni config files).

but i figured that consolidating the patching logic made sense.

does that answer your question? apologies if i did not understand your question correctly.

the log output looks exactly like what i would expect from my changes.

sdickhoven avatar Mar 07 '25 16:03 sdickhoven

oh wait... i think i understand what you're asking...

Those extra events at the bottom are new with this change.

you are right. i would not expect those last two events. however, i thought that this was maybe your ci check making sure that trying to update the same file twice would not lead to double-patching. 🀷

if that's not the case then you are correct: those last two events shouldn't happen and i'm not sure where they would be coming from. πŸ˜•

...but only the last two. the two events before the last two are expected and a result of this patch. yes.

before this change, monitor_cni_config was not already running when install-cni.sh started patching existing cni config files.

so there wouldn't have been an inotifywait event associated with that activity.

now, when install-cni.sh patches an existing cni config file, inotifywait is already watching for new/modified files.

so the act of patching will trigger inotifywait. that part is new.

so:

1st inotifywatch event comes from:

log "Trigger CNI config detection for $file"
...
mv -f "$tmp_file" "$file"

2nd inotifywait event comes from install_cni_conf actually patching that file and moving the patched file into place:

https://github.com/linkerd/linkerd2-proxy-init/blob/proxy-init/v2.4.2/cni-plugin/deployment/scripts/install-cni.sh#L219

both of these events did not show up before because monitor_cni_config was not running yet (and was not used to accomplish the actual patching).

but, as you can see, the second event is ignored (as expected/intended).

but then there are two more CREATE/MODIFY events at the very end that i can't explain... again, i thought this was a ci check making sure that the shasum logic is doing the right thing. 🀷

sdickhoven avatar Mar 07 '25 17:03 sdickhoven

i hope this answers your question. please let me know if i'm not making sense.

sdickhoven avatar Mar 07 '25 17:03 sdickhoven

any progress on this?

cni chaining is kind of a thing and it would be good for linkerd to work correctly in environments where cni chaining is already in place... i.e. where linkerd isn't the only chained cni plugin.

chaining cilium to aws vpc cni is definitely a pretty common setup based on what i see in the cilium slack workspace.

let me know if there's anything else i can do to help get this very real race condition fixed.

sdickhoven avatar May 18 '25 17:05 sdickhoven

Hi, thanks for pinging back; looking again into this... You're right about requiring the logic to be triggered for the initial file. Quick question: is there a more atomic way to trigger that than having the copy followed by the move? Would just calling touch on the file trigger the sync? (that'll likely require adding another event to the list of inotifywait events).

As a side note, I've just realized libcni recently released a way of doing safe subdirectory-based plugin config loading, which would save us from all this config patching nonsense... but that approach will have to wait till it gets picked up by the major cloud providers. Perhaps we could have users opt into that behavior. Not asking you to implement any of that, just wanted to share what I found out :slightly_smiling_face:

alpeb avatar May 26 '25 22:05 alpeb

is there a more atomic way to trigger that than having the copy followed by the move?

yes. good point. touching a file would be cleaner indeed. something like this should work:

root@5a06a8dd2c88:~# inotifywait -m . -e moved_to,close_write &
[1] 211
Setting up watches.
Watches established.

root@5a06a8dd2c88:~# touch ttt
./ CLOSE_WRITE,CLOSE ttt
root@5a06a8dd2c88:~# echo > abc
./ CLOSE_WRITE,CLOSE abc
root@5a06a8dd2c88:~# cat ttt
root@5a06a8dd2c88:~# echo > def
./ CLOSE_WRITE,CLOSE def
root@5a06a8dd2c88:~# touch def
./ CLOSE_WRITE,CLOSE def
root@5a06a8dd2c88:~# touch /tmp/ttt
root@5a06a8dd2c88:~# mv /tmp/ttt .
./ MOVED_TO ttt
root@5a06a8dd2c88:~# rm ttt
root@5a06a8dd2c88:~#

I've just realized libcni recently released a way of doing safe subdirectory-based plugin config loading, which would save us from all this config patching nonsense

😍 that's great! i hope it becomes available soon. that would certainly make things easier.

do you want me to update my pr with the touch logic (which requires an update the inotifywait events we want to watch for)?

also, do you want me to open another pr for the other race condition i pointed out about the infinite patching loop due to the flawed shasum logic?

if so, then i'd want to wait until this pr is merged before i submit the second pr.

sdickhoven avatar May 27 '25 02:05 sdickhoven

Yes, please update the PR with the touch logic. As for the looping issue, I think we're gonna hold on further logical changes into this script, as doing concurrency handling in bash is getting a little too complex. We'll be considering rewriting this in the future in a proper language, or perhaps start relying on the new libcni feature I mentioned, after getting a clearer picture about its availability.

alpeb avatar May 27 '25 14:05 alpeb

As for the looping issue, I think we're gonna hold on further logical changes into this script, as doing concurrency handling in bash is getting a little too complex.

i personally think that this is a mistake and here's why:

handling the concurrency (whether in bash or in another language) is actually not very complex.

the inotifywait serializes all the filesystem events which makes them very easy to deal with... i.e. the bash code in question executes entirely sequentially.

the only thing you have to do is make sure that your logic can deal with filesystem events for multiple files in any order.

and i can trivially provoke an infinite patching loop (given the current shasum logic) by running the following two commands in quick succession:

echo '{"cniVersion": "0.4.0", "name": "aws-cni", "disableCheck": true, "plugins": [{"name": "foo"}]}' > /etc/cni/net.d/01-foo.conflist
echo '{"cniVersion": "0.4.0", "name": "aws-cni", "disableCheck": true, "plugins": [{"name": "bar"}]}' > /etc/cni/net.d/02-bar.conflist

...which i just did on one of my worker nodes and it sent linkerd-cni into said infinite patching loop:

[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: CREATE 01-foo.conflist
[2025-05-27 19:32:18] New/changed file [01-foo.conflist] detected; re-installing
[2025-05-27 19:32:18] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-05-27 19:32:18] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "info",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190","25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "outbound-ports-to-ignore": ["25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "simulate": false,
    "use-wait-flag": false,
    "iptables-mode": "plain",
    "ipv6": false
  }
}
[2025-05-27 19:32:18] Created CNI config /host/etc/cni/net.d/01-foo.conflist
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: MODIFY 01-foo.conflist
[2025-05-27 19:32:18] Ignoring event: MODIFY /host/etc/cni/net.d/01-foo.conflist; no real changes detected
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: CREATE 02-bar.conflist
[2025-05-27 19:32:18] New/changed file [02-bar.conflist] detected; re-installing
[2025-05-27 19:32:18] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-05-27 19:32:18] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "info",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190","25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "outbound-ports-to-ignore": ["25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "simulate": false,
    "use-wait-flag": false,
    "iptables-mode": "plain",
    "ipv6": false
  }
}
[2025-05-27 19:32:18] Created CNI config /host/etc/cni/net.d/02-bar.conflist
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: MODIFY 02-bar.conflist
[2025-05-27 19:32:18] Ignoring event: MODIFY /host/etc/cni/net.d/02-bar.conflist; no real changes detected
[2025-05-27 19:32:18] Detected change in /host/etc/cni/net.d/: CREATE 01-foo.conflist
[2025-05-27 19:32:18] New/changed file [01-foo.conflist] detected; re-installing
[2025-05-27 19:32:19] Using CNI config template from CNI_NETWORK_CONFIG environment variable.
[2025-05-27 19:32:19] CNI config: {
  "name": "linkerd-cni",
  "type": "linkerd-cni",
  "log_level": "info",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-linkerd-cni-kubeconfig"
  },
  "linkerd": {
    "incoming-proxy-port": 4143,
    "outgoing-proxy-port": 4140,
    "proxy-uid": 2102,
    "ports-to-redirect": [],
    "inbound-ports-to-ignore": ["4191","4190","25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "outbound-ports-to-ignore": ["25","443","587","3306","4444","4567","4568","5432","6379","9300","11211"],
    "simulate": false,
    "use-wait-flag": false,
    "iptables-mode": "plain",
    "ipv6": false
  }
}
[2025-05-27 19:32:19] Created CNI config /host/etc/cni/net.d/01-foo.conflist
[2025-05-27 19:32:19] Detected change in /host/etc/cni/net.d/: MODIFY 01-foo.conflist
[2025-05-27 19:32:19] Ignoring event: MODIFY /host/etc/cni/net.d/01-foo.conflist; no real changes detected
[2025-05-27 19:32:19] Detected change in /host/etc/cni/net.d/: CREATE 02-bar.conflist
...

...and it's perfectly plausible that two cni plugin files are created in short succession.

the race condition that is fixed by the current version of my pr may be "masking" the infinite patch loop race condition to a large extent.

so fixing this race condition will probably make the other race condition more likely to occur.

anyway, my patch to address the infinite patch loop is quite trivial imo. and i would personally include it in this pr.

but... i don't want to talk you into doing something that you're not comfortable with. so... your call.

sdickhoven avatar May 27 '25 19:05 sdickhoven

but... i don't want to talk you into doing something that you're not comfortable with. so... your call.

also... no rush. ☺️

we currently have the workaround outlined in the pr description deployed in all of our clusters. so we are fine. βœ…

if you'd rather rewrite the cni patch logic in golang or wait for the new subdirectory-based plugin config, that's fine by us.

but other linkerd-cni users may run into this race condition and won't be able to effectively troubleshoot / mitigate it. 🀷

sdickhoven avatar May 27 '25 20:05 sdickhoven

Thanks for the detailed repro for the infinite loop issue. I wasn't entirely clear on how multiple config files would coexist under /etc/cni/net.d (leaving aside the new libcni drop-in config stuff, as if IIUC the separate config files would be under separate directories, so it shouldn't affect our existing inotifywait setup). So here's a summary of what I tested out, just for reference.

I set up a test EKS cluster with aws-cni and cilium-cni. When installing cilium, I see it copies aws-cni's conflist file and appends its own config into its plugins array, naming the new file with a higher precedence so it overrides the original conflist:

root@ip-172-31-61-213:/host/etc/cni/net.d# ll
total 8
drwx------. 2 root root   55 May 28 14:44 ./
drwxr-xr-x. 3 root root   19 May 19 17:13 ../
-rw-r--r--. 1 root root 1034 May 28 14:44 05-cilium.conflist
-rw-r--r--. 1 root root  906 May 28 14:37 10-aws.conflist

The original conflist (10-aws.conflist) should no longer be modified, so I guess this is why we haven't encountered the condition you describe (btw this is probably a better approach than mutating files, but that's another discussion).

That being said, we have no guarantees about what touches those files and I think you made a good case about your infinite loop remediation which should provide improved guardrails.

As for the mtime resolution, that's a good find too :+1: We don't wanna open a hole for another edge case. Having to interact with the file system like that though still feels a little hacky to me. What do you think about calling a function instead? Such function would get reused inside monitor_cni_config(), whose only concern would end up being setting up inotifywait.

alpeb avatar May 28 '25 16:05 alpeb

btw this is probably a better approach than mutating files

it's not bad but it comes with its own set of challenges. πŸ™

but, yeah, instead of using shasum to keep track of which files have already been patched, we could just check if a newly created file in /etc/cni/net.d is the alphabetically "lowest" one in that folder (based on whatever criteria the kubernetes cni uses); and if not, simply ignore it.

Having to interact with the file system like that though still feels a little hacky to me. What do you think about calling a function instead? Such function would get reused inside monitor_cni_config(),

yes. we basically already have that: sync(). but monitor_cni_config() does implement some logic that should probably just go into sync() and then we could just call sync() from inside the find ... | while read file; do ... loop instead of triggering the inotifywait in monitor_cni_config() with a mv or a touch.

there are several things about this script that are a bit kludgy and inconsistent that i would be happy to clean up but i didn't want to make such a big change that you would be reluctant to accept my pr.

i.e. i wanted to make as small a change as possible that basically preserves most of the existing logic to make it easier for you to reason about my change.

but i've been writing shell scripts since the early 90s so i feel more than comfortable refactoring (at least some aspects of) this script if you are ok with that.

... whose only concern would end up being setting up inotifywait.

that wouldn't work without some exceptionally ugly kludges (because bash isn't a "real" programming language)... so, no to that.

...or maybe you meant to say "trigger" (instead of "setting up") the inotifywait 🀷

i.e. have a separate function that does

log "Trigger CNI config detection for $file"
tmp_file="$(mktemp -u /tmp/linkerd-cni.patch-candidate.XXXXXX)"
cp -fp "$file" "$tmp_file"
# The following will trigger the `sync()` function via `inotifywait` in
# `monitor_cni_config()`.
mv "$tmp_file" "$file"

sdickhoven avatar May 28 '25 18:05 sdickhoven

so... i found a number of additional race conditions in this script.

none of them have to do with bash but with the fact that we are dealing with a filesystem. so you have to really think about atomicity... which we would have to do in e.g. golang as well.

anyway... below is a diff based on the current pr of all the changes that i would like to submit in order to remove all (except for one very unlikely) race conditions (that i have discovered while refactoring)... plus a few formatting nits:

diff --git a/cni-plugin/deployment/scripts/install-cni.sh b/cni-plugin/deployment/scripts/install-cni.sh
index fe2ffa0..f0ba4f5 100755
--- a/cni-plugin/deployment/scripts/install-cni.sh
+++ b/cni-plugin/deployment/scripts/install-cni.sh
@@ -25,7 +25,7 @@
 # - Expects the desired CNI config in the CNI_NETWORK_CONFIG env variable.
 
 # Ensure all variables are defined, and that the script fails when an error is hit.
-set -u -e -o pipefail
+set -u -e -o pipefail +o noclobber
 
 # Helper function for raising errors
 # Usage:
@@ -77,7 +77,7 @@ cleanup() {
   # Find all conflist files and print them out using a NULL separator instead of
   # writing each file in a new line. We will subsequently read each string and
   # attempt to rm linkerd config from it using jq helper.
-  local cni_data=''
+  local cni_data
   find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' \) -print0 |
     while read -r -d $'\0' file; do
       log "Removing linkerd-cni config from $file"
@@ -176,104 +176,113 @@ create_cni_conf() {
   CNI_NETWORK_CONFIG="${CNI_NETWORK_CONFIG:-}"
 
   # If the CNI Network Config has been overwritten, then use template from file
-  if [ -e "${CNI_NETWORK_CONFIG_FILE}" ]; then
-    log "Using CNI config template from ${CNI_NETWORK_CONFIG_FILE}."
-    cp "${CNI_NETWORK_CONFIG_FILE}" "${TMP_CONF}"
-  elif [ "${CNI_NETWORK_CONFIG}" ]; then
+  if [ -e "$CNI_NETWORK_CONFIG_FILE" ]; then
+    log "Using CNI config template from $CNI_NETWORK_CONFIG_FILE."
+    cp -fp "$CNI_NETWORK_CONFIG_FILE" "$TMP_CONF"
+  elif [ "$CNI_NETWORK_CONFIG" ]; then
     log 'Using CNI config template from CNI_NETWORK_CONFIG environment variable.'
-    cat >"${TMP_CONF}" <<EOF
-${CNI_NETWORK_CONFIG}
+    cat <<EOF > "$TMP_CONF"
+$CNI_NETWORK_CONFIG
 EOF
   fi
 
   # Use alternative command character "~", since these include a "/".
-  sed -i s~__KUBECONFIG_FILEPATH__~"${DEST_CNI_NET_DIR}/${KUBECONFIG_FILE_NAME}"~g ${TMP_CONF}
+  sed -i s~__KUBECONFIG_FILEPATH__~"$DEST_CNI_NET_DIR/$KUBECONFIG_FILE_NAME"~g "$TMP_CONF"
 
-  log "CNI config: $(cat ${TMP_CONF})"
+  log "CNI config: $(cat "$TMP_CONF")"
 }
 
 install_cni_conf() {
   local cni_conf_path=$1
 
-  local tmp_data=''
-  local conf_data=''
-  if [ -e "${cni_conf_path}" ]; then
-   # Add the linkerd-cni plugin to the existing list
-   tmp_data=$(cat "${TMP_CONF}")
-   conf_data=$(jq --argjson CNI_TMP_CONF_DATA "${tmp_data}" -f /linkerd/filter.jq "${cni_conf_path}")
-   echo "${conf_data}" > ${TMP_CONF}
-  fi
+  local tmp_data=$(cat "$TMP_CONF")
+  local conf_data=$(jq --argjson CNI_TMP_CONF_DATA "$tmp_data" -f /linkerd/filter.jq "$cni_conf_path" || true)
+
+  # Ensure that CNI config file did not disappear during processing.
+  [ -n "$conf_data" ] || return 0
 
-  # If the old config filename ends with .conf, rename it to .conflist, because it has changed to be a list
-  filename=${cni_conf_path##*/}
-  extension=${filename##*.}
+  # Add the linkerd-cni plugin to the existing list.
+  echo "$conf_data" > "$TMP_CONF"
+
+  # If the old config filename ends with .conf, rename it to .conflist because
+  # it has changed to be a list.
+  local filename=${cni_conf_path##*/}
+  local extension=${filename##*.}
   # When this variable has a file, we must delete it later.
   old_file_path=
-  if [ "${filename}" != '01-linkerd-cni.conf' ] && [ "${extension}" = 'conf' ]; then
-   old_file_path=${cni_conf_path}
-   log "Renaming ${cni_conf_path} extension to .conflist"
-   cni_conf_path="${cni_conf_path}list"
+  if [ "$filename" != '01-linkerd-cni.conf' ] && [ "$extension" = 'conf' ]; then
+    old_file_path=$cni_conf_path
+    log "Renaming $cni_conf_path extension to .conflist"
+    cni_conf_path="${cni_conf_path}list"
   fi
 
+  # Store SHA of each patched file in global `CNI_CONF_SHA` variable.
+  #
+  # This must happen in a non-concurrent access context!
+  #
+  # The below logic assumes that the `CNI_CONF_SHA` variable is already a
+  # valid JSON object. So this variable must be initialized with '{}'!
+  #
+  # E.g. (pretty-printed; actual variable stores compact JSON object)
+  #
+  # {
+  #   "/etc/cni/net.d/05-foo.conflist": "b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c",
+  #   "/etc/cni/net.d/10-bar.conflist": "7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730"
+  # }
+  local new_sha=$( (sha256sum "$TMP_CONF" || true) | awk '{print $1}' )
+  CNI_CONF_SHA=$(jq -c --arg f "$cni_conf_path" --arg sha "$new_sha" '. * {$f: $sha}' <<< "$CNI_CONF_SHA")
+
   # Move the temporary CNI config into place.
-  mv "${TMP_CONF}" "${cni_conf_path}" || exit_with_error 'Failed to mv files.'
-  [ -n "$old_file_path" ] && rm -f "${old_file_path}" && log "Removing unwanted .conf file"
+  mv "$TMP_CONF" "$cni_conf_path" || exit_with_error 'Failed to mv files.'
+  [ -n "$old_file_path" ] && rm -f "$old_file_path" && log "Removing unwanted .conf file"
 
-  log "Created CNI config ${cni_conf_path}"
+  log "Created CNI config $cni_conf_path"
 }
 
-# Sync() is responsible for reacting to file system changes. It is used in
-# conjunction with inotify events; sync() is called with the name of the file
-# that has changed, the event type (which can be either 'CREATE', 'DELETE',
-# 'MOVED_TO' or 'MODIFY', and the previously observed SHA of the configuration
-# file.
+# `sync()` is responsible for reacting to file system changes. It is used in
+# conjunction with inotify events; `sync()` is called with the event type (which
+# can be either 'CREATE', 'DELETE', 'MOVED_TO', 'MODIFY' or 'DELETE') and the
+# name of the file that has changed 
 #
-# Based on the changed file and event type, sync() might re-install the CNI
+# Based on the changed file and event type, `sync()` might re-install the CNI
 # plugin's configuration file.
 sync() {
-  local filename=$1
-  local ev=$2
-  local filepath="${HOST_CNI_NET}/$filename"
-
-  local prev_sha=$3
-
-  local config_file_count
-  local new_sha
-  if [ "$ev" = 'CREATE' ] || [ "$ev" = 'MOVED_TO' ] || [ "$ev" = 'MODIFY' ] || [ "$ev" = 'ATTRIB' ]; then
-    # When the event type is 'CREATE', 'MOVED_TO' or 'MODIFY', we check the
-    # previously observed SHA (updated with each file watch) and compare it
-    # against the new file's SHA. If they differ, it means something has
-    # changed.
-    new_sha=$(sha256sum "${filepath}" | while read -r s _; do echo "$s"; done)
-    if [ "$new_sha" != "$prev_sha" ]; then
-      # Create but don't rm old one since we don't know if this will be configured
-      # to run as _the_ cni plugin.
-      log "New/changed file [$filename] detected; re-installing"
-      create_kubeconfig
-      create_cni_conf
-      install_cni_conf "$filepath"
-    else
-      # If the SHA hasn't changed or we get an unrecognised event, ignore it.
-      # When the SHA is the same, we can get into infinite loops whereby a file has
-      # been created and after re-install the watch keeps triggering CREATE events
-      # that never end.
-      log "Ignoring event: $ev $filepath; no real changes detected"
-    fi
+  local ev=$1
+  local file=${2//\/\//\/} # replace "//" with "/"
+
+  [[ "$file" =~ .*.(conflist|conf)$ ]] || return 0
+
+  log "Detected event: $ev $file"
+
+  # Retrieve previous SHA of detected file (if any) and compute current SHA.
+  local previous_sha=$(jq -r --arg f "$file" '.[$f] | select(.)' <<< "$CNI_CONF_SHA")
+  local current_sha=$( (sha256sum "$file" || true) | awk '{print $1}' )
+
+  # If the SHA hasn't changed or the detected file has disappeared, ignore it.
+  # When the SHA is the same, we can get into infinite loops whereby a file
+  # has been created and after re-install the watch keeps triggering MOVED_TO
+  # events that never end.
+  # If the `current_sha` variable is blank then the detected CNI config file has
+  # disappeared and no further action is required.
+  # There exists an unhandled (highly improbable) edge case where a CNI plugin
+  # creates a config file and then _immediately_ removes it again _while_ we are
+  # in the process of patching it. If this happens, we may create a patched CNI
+  # config file that should *not* exist.
+  if [ -n "$current_sha" ] && [ "$current_sha" != "$previous_sha" ]; then
+    log "New/changed file [$file] detected; re-installing"
+    create_kubeconfig
+    create_cni_conf
+    install_cni_conf "$file"
+  else
+    log "Ignoring event: $ev $file; no real changes detected or file disappeared"
   fi
 }
 
 # monitor_cni_config starts a watch on the host's CNI config directory
 monitor_cni_config() {
-  inotifywait -m "${HOST_CNI_NET}" -e create,moved_to,modify,attrib |
+  inotifywait -m "$HOST_CNI_NET" -e create,moved_to,modify |
     while read -r directory action filename; do
-      if [[ "$filename" =~ .*.(conflist|conf)$ ]]; then 
-        log "Detected change in $directory: $action $filename"
-        sync "$filename" "$action" "$cni_conf_sha"
-        # calculate file SHA to use in the next iteration
-        if [[ -e "$directory/$filename" ]]; then
-          cni_conf_sha="$(sha256sum "$directory/$filename" | while read -r s _; do echo "$s"; done)"
-        fi
-      fi
+      sync "$action" "$directory/$filename"
     done
 }
 
@@ -284,16 +293,16 @@ monitor_cni_config() {
 # only reacting to direct creation of a "token" file, or creation of
 # directories containing a "token" file.
 monitor_service_account_token() {
-    inotifywait -m "${SERVICEACCOUNT_PATH}" -e create |
-      while read -r directory _ filename; do
-        target=$(realpath "$directory/$filename")
-        if [[ (-f "$target" && "${target##*/}" == "token") || (-d "$target" && -e "$target/token") ]]; then
-          log "Detected creation of file in $directory: $filename; recreating kubeconfig file"
-          create_kubeconfig
-        else
-          log "Detected creation of file in $directory: $filename; ignoring"
-        fi
-      done
+  inotifywait -m "$SERVICEACCOUNT_PATH" -e create |
+    while read -r directory _ filename; do
+      target=$(realpath "$directory/$filename")
+      if [[ (-f "$target" && "${target##*/}" == "token") || (-d "$target" && -e "$target/token") ]]; then
+        log "Detected creation of file in $directory: $filename; recreating kubeconfig file"
+        create_kubeconfig
+      else
+        log "Detected creation of file in $directory: $filename; ignoring"
+      fi
+    done
 }
 
 log() {
@@ -306,35 +315,32 @@ log() {
 
 # Delete old "interface mode" file, possibly left over from previous versions
 # TODO(alpeb): remove this on stable-2.15
-rm -f "${DEFAULT_CNI_CONF_PATH}"
+rm -f "$DEFAULT_CNI_CONF_PATH"
 
 install_cni_bin
 
-# The CNI config monitor must be set up _before_ we start patching CNI config
-# files!
+# The CNI config monitor must be set up _before_ we start patching existing CNI
+# config files!
 # Otherwise, new CNI config files can be created just _after_ the initial round
 # of patching and just _before_ we set up the `inotifywait` loop to detect new
 # CNI config files.
-cni_conf_sha="__init__"
+CNI_CONF_SHA='{}'
 monitor_cni_config &
 
 # Append our config to any existing config file (*.conflist or *.conf)
-config_files=$(find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \))
+config_files=$(find "$HOST_CNI_NET" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) | grep -v linkerd || true)
 if [ -z "$config_files" ]; then
-    log "No active CNI configuration files found"
+  log "No active CNI configuration files found"
 else
-  config_file_count=$(echo "$config_files" | grep -v linkerd | sort | wc -l)
-  if [ "$config_file_count" -eq 0 ]; then
-    log "No active CNI configuration files found"
-  else
-    find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) -print0 |
-      while read -r -d $'\0' file; do
-        log "Trigger CNI config detection for $file"
-        # The following will trigger the `sync()` function via `inotifywait` in
-        # `monitor_cni_config()`.
-        touch "$file"
-      done
-  fi
+  find "${HOST_CNI_NET}" -maxdepth 1 -type f \( -iname '*conflist' -o -iname '*conf' \) -print0 |
+    while read -r -d $'\0' file; do
+      log "Trigger CNI config detection for $file"
+      tmp_file="$(mktemp -u /tmp/linkerd-cni.patch-candidate.XXXXXX)"
+      cp -fp "$file" "$tmp_file"
+      # The following will trigger the `sync()` function via filesystem event.
+      # This requires `monitor_cni_config()` to be up and running!
+      mv "$tmp_file" "$file" || exit_with_error 'Failed to mv files.'
+    done
 fi
 
 # Watch in bg so we can receive interrupt signals through 'trap'. From 'man

anyway, as you can see, it's quite a comprehensive change.

[...] then we could just call sync() from inside the find ... | while read file; do ... loop instead of triggering the inotifywait in monitor_cni_config() with a mv or a touch.

...actually, ☝️ this would make the bash code execute concurrently and cause lots of problems with global variables. so we definitely shouldn't do this.

we want to serialize all CNI config file changes in this script. and that would NOT be the case if we put monitor_cni_config() into the background (with &) before the find ... | while read file; do ... loop and then call sync() directly from the loop.

sdickhoven avatar May 29 '25 00:05 sdickhoven

i found a number of additional race conditions in this script.

e.g. this is a race condition.

if the cni config file is removed between

if [ -e "${cni_conf_path}" ]; then

and

conf_data=$(jq --argjson CNI_TMP_CONF_DATA "${tmp_data}" -f /linkerd/filter.jq "${cni_conf_path}")

the script will exit and linkerd-cni's ability to patch cni config files will be permanently disabled.

as you can see, i am doing the following instead which is robust against this edge case:

conf_data=$(jq --argjson CNI_TMP_CONF_DATA "$tmp_data" -f /linkerd/filter.jq "$cni_conf_path" || true)
[ -n "$conf_data" ] || return 0

sdickhoven avatar May 29 '25 00:05 sdickhoven

anyway, most people are familiar with using locks/mutexes when dealing with concurrent access to mutable data... but if you cannot do that (as is the case here), most engineers will probably struggle to write "correct" code... because you have to use atomic operations in that context which is probably a less familiar concept to most.

so libcni's recently released mechanism for subdirectory-based plugin config loading definitely addresses a real challenge.

sdickhoven avatar May 29 '25 11:05 sdickhoven

i just did a bit of testing and i found even more race conditions. i'll update the above diff once i'm happy with what i have.

sdickhoven avatar May 29 '25 21:05 sdickhoven

ok. i have pushed all of the changes that i strongly believe should be made to eliminate all of the race conditions that i have discovered... except for one highly improbably one (that i have documented) that i was unable to address.

i CANNOT make the updated script break even when i torture it with an absolute avalanche of filesystem events.

e.g.

(
  while :
  do
    echo '{"cniVersion": "0.4.0", "name": "aws-cni", "disableCheck": true, "plugins": [{"name": "foo"}]}' > tmp/01-foo.conflist
    echo '{"cniVersion": "0.4.0", "name": "aws-cni", "disableCheck": true, "plugins": [{"name": "bar"}]}' > tmp/02-bar.conflist
    echo '{"cniVersion": "0.4.0", "name": "aws-cni", "disableCheck": true, "plugins": [{"name": "baz"}]}' > tmp/03-baz.conflist
    echo '{"cniVersion": "0.4.0", "name": "aws-cni", "disableCheck": true, "plugins": [{"name": "bam"}]}' > tmp/04-bam.conflist
    mv tmp/01-foo.conflist etc/cni/net.d/01-foo.conf
    mv tmp/02-bar.conflist etc/cni/net.d/02-bar.conflist
    mv tmp/03-baz.conflist etc/cni/net.d/03-baz.conflist
    mv tmp/04-bam.conflist etc/cni/net.d/04-bam.conflist
    rm etc/cni/net.d/04-bam.conflist
  done
) &

sleep 5
kill $!

(you have to give the script a few seconds to finish processing all the queued up filesystem events... but it will finish because the infinite patching loop race condition - as well as all of the race conditions that will crash the existing script - are gone!)

if you do the above with the existing script, it crashes VERY QUICKLY!

the test environment for my tests was:

docker run -ti --rm \
  -v $(pwd)/serviceaccount:/var/run/secrets/kubernetes.io/serviceaccount \
  -v $(pwd)/etc/cni/net.d:/host/etc/cni/net.d \
  -v $(pwd)/opt/cni/bin:/host/opt/cni/bin \
  -v $(pwd)/tmp:/tmp \
  --mount type=bind,src=$HOME/git/contrib/linkerd2-proxy-init/cni-plugin/deployment/scripts/install-cni.sh,dst=/linkerd/install-cni.sh,ro \
  -e KUBERNETES_SERVICE_HOST=::1 \
  -e KUBERNETES_SERVICE_PORT=443 \
  -e SKIP_TLS_VERIFY=true \
  -e CNI_NETWORK_CONFIG='{"type":"linkerd-cni","kubernetes":{"kubeconfig":"__KUBECONFIG_FILEPATH__"}}' \
  -e SLEEP=true \
  cr.l5d.io/linkerd/cni-plugin:v1.6.2

sdickhoven avatar May 30 '25 13:05 sdickhoven

Hey @sdickhoven , I'm back at looking into this. Do you mind moving all the changes not strictly related to the matter at hand (race conditions) into a separate PR?

alpeb avatar Jun 09 '25 20:06 alpeb

Hey @sdickhoven , I'm back at looking into this. Do you mind moving all the changes not strictly related to the matter at hand (race conditions) into a separate PR?

done βœ…

i put my non-functional changes into https://github.com/linkerd/linkerd2-proxy-init/pull/528

sdickhoven avatar Jun 09 '25 21:06 sdickhoven

Actually you might be forced to force-push to fix the DCO :wink:

alpeb avatar Jun 10 '25 23:06 alpeb

Can you please address that, and merge main into your branch? I was testing in EKS and hit some issues because this doesn't contain the recent cleanup logic fix.

πŸ‘ will do.

And please don't force push so it's easier for me to review πŸ™

of course. sorry about that. 😊

that force-push was for a minor change to a comment. nothing functional. but i realize that you can't know that. all you see is a force-push. i will refrain from force-pushing.

sdickhoven avatar Jun 11 '25 00:06 sdickhoven

sorry i just saw

Commit sha: 812d8b5, Author: Simon Dickhoven, Committer: Simon Dickhoven; The sign-off is missing. Commit sha: defda27, Author: Simon Dickhoven, Committer: Simon Dickhoven; The sign-off is missing. Commit sha: f6d3844, Author: Simon Dickhoven, Committer: Simon Dickhoven; The sign-off is missing. Commit sha: 4112de0, Author: Simon Dickhoven, Committer: Simon Dickhoven; The sign-off is missing. Commit sha: 335bb3b, Author: Simon Dickhoven, Committer: Simon Dickhoven; The sign-off is missing.

😊 do you want me to go back and fix those commits? i've never messed with git history but i'm sure i can figure it out.

sdickhoven avatar Jun 11 '25 01:06 sdickhoven

😊 do you want me to go back and fix those commits? i've never messed with git history but i'm sure i can figure it out.

Yes please, you'll have to do an interactive rebase I think. So the force-commit will be required after all.

alpeb avatar Jun 11 '25 12:06 alpeb

ok. that seems to have worked. 😌

now that i know how to edit past commits, would you like me to also prepend my commit messages with e.g.

fix(linkerd-cni): ...

?

...though that would require another force-push.

sdickhoven avatar Jun 11 '25 13:06 sdickhoven

hi @alpeb πŸ‘‹

any idea when you are going to create a new release / docker image for the cni plugin with the updated install-cni.sh?

sdickhoven avatar Jun 29 '25 12:06 sdickhoven

hi @alpeb πŸ‘‹

any idea when you are going to create a new release / docker image for the cni plugin with the updated install-cni.sh?

hey @sdickhoven just a heads up that @alpeb is out of the office this week, but we can likely create a new release once he is back.

cratelyn avatar Jul 02 '25 16:07 cratelyn