tetragon icon indicating copy to clipboard operation
tetragon copied to clipboard

WIP: IMA hashes in LSM events

Open anfedotoff opened this issue 1 year ago • 1 comments

Adding support for collecting file hashes calculated by Linux integrity subsystem (IMA). IMA hashes will be contained in LSM events. To make it work you need to:

There are limitations on implementation due to the requirement to use bpf_ima_file_hash/bpf_ima_inode_hash helpers. The more information you can get from #2409.

Some design insights:

  • File hash collection will be triggered by Post Action
  • Only limited set of lsm hooks is supported: (mmap_file, bprm_check_security, file_open)
  • Hash collection is started after LSM even is posted (tail call from generic_output). Hashes are stored in separate bpf_map. Event is enriched in user spaces the way stacktrace extraction works.

TODO list:

  • [x] modification of bpf_generic_lsm and do_actions
  • [x] CRD and lsm sensor updates
  • [ ] tests
  • [ ] docs

anfedotoff avatar Aug 16 '24 13:08 anfedotoff

Deploy Preview for tetragon ready!

Name Link
Latest commit d5c5ec2e9e4d12ec8839610aa1b85cde95cd39a8
Latest deploy log https://app.netlify.com/sites/tetragon/deploys/66feb22fa7347900086e433f
Deploy Preview https://deploy-preview-2818--tetragon.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

netlify[bot] avatar Aug 16 '24 15:08 netlify[bot]

@kkourt , hi! May I ask you for a code review? Next Monday, on Tetragon community meeting I'll show a demo. I want to talk about some design decisions I made. Also I'm going to speak about problems that I overcame and those are still exists.

anfedotoff avatar Sep 04 '24 08:09 anfedotoff

@kkourt , hi! May I ask you for a code review? Next Monday, on Tetragon community meeting I'll show a demo. I want to talk about some design decisions I made. Also I'm going to speak about problems that I overcame and those are still exists.

Hello :)

Thanks for the heads up! Will try and have a look before the community meeting. Looking forward to the demo!

kkourt avatar Sep 04 '24 12:09 kkourt

I refreshed on LSM programs and I think we can do this differently

so LSM programs are attached through trampolines on specific bpf_lsm_XXX function which is invoked as part of the lsm modules call chain on particular lsm call back

and trampoline allows to attach and invoke multiple bpf programs to specific function (or lsm hook)

so I think we could have the sleepable lsm hooks invoked first to get the hash and update the map and then we will have generic lsm invoked that would run the hash action, which would get the hash from the map and send the event

  bpf_lsm_XXX
    -> trampoline
         run lsm.s/file_open
               map_update_elem(&ima_hash_map, &e->current, &hash, BPF_ANY);
         run lsm/generic_lsm
               map_lookup_elem(&ima_hash_map, &e->current, ...)

to ensure the generic lsm is invoked AFTER lsm.s/XXX hook we can rely on kernel using standard lists when queuing programs for trampoline, so you should just attach lsm.s hook first and then generic lsm

I guess there might be some hiccups when you'll implement that (there always are) but so far it seems straightforward

WDYT?

olsajiri avatar Sep 10 '24 08:09 olsajiri

I refreshed on LSM programs and I think we can do this differently

so LSM programs are attached through trampolines on specific bpf_lsm_XXX function which is invoked as part of the lsm modules call chain on particular lsm call back

and trampoline allows to attach and invoke multiple bpf programs to specific function (or lsm hook)

so I think we could have the sleepable lsm hooks invoked first to get the hash and update the map and then we will have generic lsm invoked that would run the hash action, which would get the hash from the map and send the event

  bpf_lsm_XXX
    -> trampoline
         run lsm.s/file_open
               map_update_elem(&ima_hash_map, &e->current, &hash, BPF_ANY);
         run lsm/generic_lsm
               map_lookup_elem(&ima_hash_map, &e->current, ...)

to ensure the generic lsm is invoked AFTER lsm.s/XXX hook we can rely on kernel using standard lists when queuing programs for trampoline, so you should just attach lsm.s hook first and then generic lsm

I guess there might be some hiccups when you'll implement that (there always are) but so far it seems straightforward

WDYT?

Thanks for making it clear! IIUC, we will calculate hash on every triggered hook. It might cause some performance issues. Now our calls of bpf_ima_file_hash are filtered by tracing policy. When we move lsm.s/ima_* programe to the top, we will call helper no matter if tracing policy is satisfied or not. I think, it will be OK, if ima_policy is satisfied. Let's consider the following example:

We have a default ima_policy=tcb:

measure func=MMAP_CHECK mask=MAY_EXEC
measure func=BPRM_CHECK mask=MAY_EXEC # binary executed
measure func=FILE_CHECK mask=^MAY_READ euid=0
measure func=FILE_CHECK mask=^MAY_READ uid=0 # root opened r/o, r/w
measure func=MODULE_CHECK
measure func=FIRMWARE_CHECK
measure func=POLICY_CHECK

In particular, this ima_policy will measure hashes when files owned by root are opened for reading. So, in solution that you proposed will call bpf_ima_file_hash on every hook. The question is what helper will return, if ima_policy is not satisfied? If error, than it is OK, I think. But if it calculates hash despite the ima_policy it is not OK. I think, I can check this.

To sum up, I think, you solution is a nice compromise if ima_policy is taking to account during bpf_ima_file_hash call. What do you think about this?

anfedotoff avatar Sep 10 '24 15:09 anfedotoff

When we move lsm.s/ima_* programe to the top, we will call helper no matter if tracing policy is satisfied or not.

ah ok, I did not realized you might want to filter :-\

could we split the lsm sensor into filtering program, then run the program with ima helper and then run a program that sends it out? bit of a stretch but trampolines seem to disable cpu migration so we could use the percpu heap map to store half-done event and send it out in the last program

IIUC trampolines with LSM programs will run registered lsm programs until one of them returns != 0 .. but will need to check on that

also I still need to check all the options for lsm sleepable programs

olsajiri avatar Sep 11 '24 08:09 olsajiri

When we move lsm.s/ima_* programe to the top, we will call helper no matter if tracing policy is satisfied or not. ah ok, I did not realized you might want to filter :-\

could we split the lsm sensor into filtering program, then run the program with ima helper and then run a program that sends it out? bit of a stretch but trampolines seem to disable cpu migration so we could use the percpu heap map to store half-done event and send it out in the last program

IIUC trampolines with LSM programs will run registered lsm programs until one of them returns != 0 .. but will need to check on that

also I still need to check all the options for lsm sleepable programs

Hmm, maybe we can split sensor into three parts. If we can use the same percpu heap map this scheme might work. It seems to me, that your first solution is much easier to implement:). I tried to get hashes for files that are out of scope ima_policy=tcb. Helper returned --EOPNOTSUPP. I think, that exactly what we want! Hash calculation is blocked. I'm looking at the __ima_inode_hash to make sure that hash calculation is block due to policy violation, but didn't succeed yet.

anfedotoff avatar Sep 11 '24 10:09 anfedotoff

I found out that bpf_ima_file_hash must get hash despite ima policy. There is another reason that I can't get hash for a file that doesn't satisfy ima policy. Maybe I'll find time to debug helper to know the reason. But to put ima helper to the top it is not good... Maybe we should split it three part lsm sensor.

anfedotoff avatar Sep 11 '24 15:09 anfedotoff

Hmm, maybe we can split sensor into three parts. If we can use the same percpu heap map this scheme might work. It seems to me, that your first solution is much easier to implement:).

yea, it might be better to do the initial solution first as a base and split to 3 parts lsm sensor as a follow up.. we might learn other things that could shape up the final solution when doing that ;-)

olsajiri avatar Sep 11 '24 16:09 olsajiri

Hmm, maybe we can split sensor into three parts. If we can use the same percpu heap map this scheme might work. It seems to me, that your first solution is much easier to implement:).

yea, it might be better to do the initial solution first as a base and split to 3 parts lsm sensor as a follow up.. we might learn other things that could shape up the final solution when doing that ;-)

I thought, I can implement some example of our idea using ebpf-go, before coding in tetragon. Here is the repo: https://github.com/anfedotoff/ima-ebpfgo-example, please have a look.

I have 3 bpf programs: three parts of LSM sensor. In the first program I emulate filtering and enforcing. Second program is used to get hash and the last one to send event via perf buffer. What I noticed.

  • we need to save information is event filtered or not in test_msg struct (msg_generic_kprobe in case of tetragon). Don't forget about NoPost Action
  • enforcing, filtering works fine.
  • it seems to me, that send can be earlier the hash calculation. (race condition now resides here).

Also, I'm not sure, that it is enough just to attach programs this way to preserve the order when hook is triggered.

So far it seems that approach is not working for us, but I may be wrong.

anfedotoff avatar Sep 12 '24 16:09 anfedotoff

So far it seems that approach is not working for us, but I may be wrong.

nice, but I'm not sure I understand what's wrong, could you please elaborate? I'll try it anyway

olsajiri avatar Sep 15 '24 21:09 olsajiri

So far it seems that approach is not working for us, but I may be wrong.

nice, but I'm not sure I understand what's wrong, could you please elaborate? I'll try it anyway

Yes, sure, I'll try to explain. I attach three bpf programs using standard Atach interface for LSM programs in the following order:

  1. init_bprm_check (create event, store filename of executed binary in the map, do filtering + blocking);
  2. ima_bprm_check (lsm.s program, that calculates hashes and store it in the map);
  3. send_bprm_check (send stored event in map using perfbuffer);

The question is can we rely on the order of execution when bprm_check_security is triggered?

Here is some logs:

024/09/16 10:01:22 Waiting for events..
2024/09/16 10:01:44 Error file: ./tetragon. Code:0
2024/09/16 10:01:51 Error file: ./tetragon. Code:0
2024/09/16 10:01:57 Error file: ./tetragon. Code:0
2024/09/16 10:01:57 File: ./tetragon
2024/09/16 10:01:57 SHA256: 9c20c4a712cd95766b64bcddc1094b619092944b592be43d75c3f4b3fb1434d40000000000000000000000000000000000000000000000000000000000000000
2024/09/16 10:01:57 <=============>
2024/09/16 10:01:58 File: ./tetragon
2024/09/16 10:01:58 SHA256: 9c20c4a712cd95766b64bcddc1094b619092944b592be43d75c3f4b3fb1434d40000000000000000000000000000000000000000000000000000000000000000
2024/09/16 10:01:58 <=============>
2024/09/16 10:01:59 File: ./tetragon
2024/09/16 10:01:59 SHA256: 29aa689f38158d2e8941fa54e436f0260890af31cecad1e8799e5c2df7bc1ecc0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 10:01:59 <=============>
2024/09/16 10:02:20 Error file: ./tetragon. Code:0
2024/09/16 10:02:22 File: ./tetragon
2024/09/16 10:02:22 SHA256: 4f291296e89b784cd35479fca606f228126e3641f5bcaee68dee36583d7c94830000000000000000000000000000000000000000000000000000000000000000
2024/09/16 10:02:22 <=============>
^C2024/09/16 10:02:31 Received signal, exiting..

According this logs we see, that the 1st program always starts and finishes first. Some times the third program starts before second is finished. 2024/09/16 10:02:20 Error file: ./tetragon. Code:0 this line means that there is no hash in event. It seems to me this because sleepable program sleeps and another bpf program is being executed (this is a send program in our case).

anfedotoff avatar Sep 16 '24 07:09 anfedotoff

The question is can we rely on the order of execution when bprm_check_security is triggered?

I think we can, it's hardcoded in kernel.. and if it changes we could actually detect the way it's added and use proper order attach

however it's the other way round (check https://pastebin.com/Y8EGQ0fg for the trampoline dump), so when I reverse attach in your example I get proper order of executed programs

could you plz try the change below?

thanks, jirka

diff --git a/main.go b/main.go
index 1d1fa138660d..0be26120a55f 100644
--- a/main.go
+++ b/main.go
@@ -40,11 +40,11 @@ func main() {
        }
        defer objs.Close()
 
-       initBprmHook, err := link.AttachLSM(link.LSMOptions{Program: objs.imaPrograms.InitBprmCheck})
+       sendBprmHook, err := link.AttachLSM(link.LSMOptions{Program: objs.imaPrograms.SendBprmCheck})
        if err != nil {
                log.Fatalf("linking LSM failed: %s", err)
        }
-       defer initBprmHook.Close()
+       defer sendBprmHook.Close()
 
        imaBprmHook, err := link.AttachLSM(link.LSMOptions{Program: objs.imaPrograms.ImaBprmCheck})
        if err != nil {
@@ -52,11 +52,11 @@ func main() {
        }
        defer imaBprmHook.Close()
 
-       sendBprmHook, err := link.AttachLSM(link.LSMOptions{Program: objs.imaPrograms.SendBprmCheck})
+       initBprmHook, err := link.AttachLSM(link.LSMOptions{Program: objs.imaPrograms.InitBprmCheck})
        if err != nil {
                log.Fatalf("linking LSM failed: %s", err)
        }
-       defer sendBprmHook.Close()
+       defer initBprmHook.Close()
 
        rd, err := perf.NewReader(objs.TcpmonMap, 65535)
        if err != nil {

olsajiri avatar Sep 16 '24 11:09 olsajiri

The question is can we rely on the order of execution when bprm_check_security is triggered?

I think we can, it's hardcoded in kernel.. and if it changes we could actually detect the way it's added and use proper order attach

however it's the other way round (check https://pastebin.com/Y8EGQ0fg for the trampoline dump), so when I reverse attach in your example I get proper order of executed programs

could you plz try the change below?

thanks, jirka

diff --git a/main.go b/main.go
index 1d1fa138660d..0be26120a55f 100644
--- a/main.go
+++ b/main.go
@@ -40,11 +40,11 @@ func main() {
        }
        defer objs.Close()
 
-       initBprmHook, err := link.AttachLSM(link.LSMOptions{Program: objs.imaPrograms.InitBprmCheck})
+       sendBprmHook, err := link.AttachLSM(link.LSMOptions{Program: objs.imaPrograms.SendBprmCheck})
        if err != nil {
                log.Fatalf("linking LSM failed: %s", err)
        }
-       defer initBprmHook.Close()
+       defer sendBprmHook.Close()
 
        imaBprmHook, err := link.AttachLSM(link.LSMOptions{Program: objs.imaPrograms.ImaBprmCheck})
        if err != nil {
@@ -52,11 +52,11 @@ func main() {
        }
        defer imaBprmHook.Close()
 
-       sendBprmHook, err := link.AttachLSM(link.LSMOptions{Program: objs.imaPrograms.SendBprmCheck})
+       initBprmHook, err := link.AttachLSM(link.LSMOptions{Program: objs.imaPrograms.InitBprmCheck})
        if err != nil {
                log.Fatalf("linking LSM failed: %s", err)
        }
-       defer sendBprmHook.Close()
+       defer initBprmHook.Close()
 
        rd, err := perf.NewReader(objs.TcpmonMap, 65535)
        if err != nil {

Hmm, I changed the order (swap init and send programs) as you suggest. I got no events collected, but execution is blocked. Are you sure that the order is really proper? It seems to me, that send program is now executed before init program. (send -> ima -> init). Anyway, the good news is that order is preserved. But other question is that sometimes we miss the hash in event. Does it mean that send program is started before ima program is finished?

anfedotoff avatar Sep 16 '24 11:09 anfedotoff

also I think we need to use hash for the data shared by the program, it looks like cpu migration is not actually disabled, so we can't use per-cpu heap map in here

olsajiri avatar Sep 16 '24 11:09 olsajiri

hum, when I remove the 'filter' from init function, I get all executed files output: (I'm still using the per-cpu heap, but I don't think we should rely on that)

root@amd:/home/jolsa/ima-ebpfgo-example# ./ima-test 
2024/09/16 12:04:42 Waiting for events..
2024/09/16 12:04:43 File: /usr/bin/ls
2024/09/16 12:04:43 SHA256: 12a6d908a68ccf6f9f3d799705577c28763f5deef6eddcff7643d6d8a6de543d0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:04:43 <=============>
2024/09/16 12:04:45 File: /usr/bin/ls
2024/09/16 12:04:45 SHA256: 12a6d908a68ccf6f9f3d799705577c28763f5deef6eddcff7643d6d8a6de543d0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:04:45 <=============>
2024/09/16 12:04:47 File: /usr/bin/ls
2024/09/16 12:04:47 SHA256: 12a6d908a68ccf6f9f3d799705577c28763f5deef6eddcff7643d6d8a6de543d0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:04:47 <=============>
2024/09/16 12:04:49 File: /usr/bin/ls
2024/09/16 12:04:49 SHA256: 12a6d908a68ccf6f9f3d799705577c28763f5deef6eddcff7643d6d8a6de543d0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:04:49 <=============>
2024/09/16 12:04:50 File: /usr/bin/w
2024/09/16 12:04:50 SHA256: ae49096b42cebab2f3a2557e32b0f3e3e5139fd7e22201c9dc5bcfab9701946c0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:04:50 <=============>
2024/09/16 12:05:33 File: /usr/bin/git
2024/09/16 12:05:33 SHA256: 29aa689f38158d2e8941fa54e436f0260890af31cecad1e8799e5c2df7bc1ecc0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:05:33 <=============>
2024/09/16 12:05:33 File: /usr/bin/pager
2024/09/16 12:05:33 SHA256: f95851f468afe5c6e287d741153bde008ef26844916f5dfcb916e849ef3ae8bb0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:05:33 <=============>
2024/09/16 12:05:47 File: /usr/bin/vim
2024/09/16 12:05:47 SHA256: dfc4b48f9b7976f0b64f6e99702ad8b1e4fb1929958fe07da98dfeb88f0a23fb0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:05:47 <=============>

olsajiri avatar Sep 16 '24 12:09 olsajiri

hum, when I remove the 'filter' from init function, I get all executed files output: (I'm still using the per-cpu heap, but I don't think we should rely on that)

root@amd:/home/jolsa/ima-ebpfgo-example# ./ima-test 
2024/09/16 12:04:42 Waiting for events..
2024/09/16 12:04:43 File: /usr/bin/ls
2024/09/16 12:04:43 SHA256: 12a6d908a68ccf6f9f3d799705577c28763f5deef6eddcff7643d6d8a6de543d0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:04:43 <=============>
2024/09/16 12:04:45 File: /usr/bin/ls
2024/09/16 12:04:45 SHA256: 12a6d908a68ccf6f9f3d799705577c28763f5deef6eddcff7643d6d8a6de543d0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:04:45 <=============>
2024/09/16 12:04:47 File: /usr/bin/ls
2024/09/16 12:04:47 SHA256: 12a6d908a68ccf6f9f3d799705577c28763f5deef6eddcff7643d6d8a6de543d0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:04:47 <=============>
2024/09/16 12:04:49 File: /usr/bin/ls
2024/09/16 12:04:49 SHA256: 12a6d908a68ccf6f9f3d799705577c28763f5deef6eddcff7643d6d8a6de543d0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:04:49 <=============>
2024/09/16 12:04:50 File: /usr/bin/w
2024/09/16 12:04:50 SHA256: ae49096b42cebab2f3a2557e32b0f3e3e5139fd7e22201c9dc5bcfab9701946c0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:04:50 <=============>
2024/09/16 12:05:33 File: /usr/bin/git
2024/09/16 12:05:33 SHA256: 29aa689f38158d2e8941fa54e436f0260890af31cecad1e8799e5c2df7bc1ecc0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:05:33 <=============>
2024/09/16 12:05:33 File: /usr/bin/pager
2024/09/16 12:05:33 SHA256: f95851f468afe5c6e287d741153bde008ef26844916f5dfcb916e849ef3ae8bb0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:05:33 <=============>
2024/09/16 12:05:47 File: /usr/bin/vim
2024/09/16 12:05:47 SHA256: dfc4b48f9b7976f0b64f6e99702ad8b1e4fb1929958fe07da98dfeb88f0a23fb0000000000000000000000000000000000000000000000000000000000000000
2024/09/16 12:05:47 <=============>

Yes, that works for me too, no hashes are missed. I remove filter with out changing order of execution, some hashes were missed.

Now I'm looking at __bpf_prog_enter_sleepable and __bpf_prog_enter

I think we need to use hash for the data shared by the program, it looks like cpu migration is not actually disabled, so we can't use per-cpu heap map in here

It looks so... To use hash map is a good idea, but for now I don't know how to share key for this map. If we can't use per-cpu heap.

anfedotoff avatar Sep 16 '24 12:09 anfedotoff

It looks so... To use hash map is a good idea, but for now I don't know how to share key for this map. If we can't use per-cpu heap.

I think just bpf_get_current_pid_tgid() should be ok? it's always under same task (but maybe not same cpu)

olsajiri avatar Sep 16 '24 12:09 olsajiri

It looks so... To use hash map is a good idea, but for now I don't know how to share key for this map. If we can't use per-cpu heap.

I think just bpf_get_current_pid_tgid() should be ok? it's always under same task (but maybe not same cpu)

This Is the first thing that came up to my mind. But what if we have file_open hook and the same process opens several files will it be OK? I think I can try to check this, nevertheless.

anfedotoff avatar Sep 16 '24 13:09 anfedotoff

This Is the first thing that came up to my mind. But what if we have file_open hook and the same process opens several files will it be OK? I think I can try to check this, nevertheless.

hum, so if the key is thread id then single thread could have just one active file_open call and we should be fine, right?

olsajiri avatar Sep 16 '24 13:09 olsajiri

This Is the first thing that came up to my mind. But what if we have file_open hook and the same process opens several files will it be OK? I think I can try to check this, nevertheless.

hum, so if the key is thread id then single thread could have just one active file_open call and we should be fine, right?

Ahh, yes, we have tid. You are right! It seems to be fine than. I'll try to fix my example. If every thing is fine, I'll start to refactor code in th PR. Thank you for helping me to find the solution!

anfedotoff avatar Sep 16 '24 13:09 anfedotoff

This Is the first thing that came up to my mind. But what if we have file_open hook and the same process opens several files will it be OK? I think I can try to check this, nevertheless.

hum, so if the key is thread id then single thread could have just one active file_open call and we should be fine, right?

Ahh, yes, we have tid. You are right! It seems to be fine than. I'll try to fix my example. If every thing is fine, I'll start to refactor code in th PR. Thank you for helping me to find the solution!

Everything is fine! I can start refactoring :fireworks:

anfedotoff avatar Sep 16 '24 14:09 anfedotoff

It works! No race condition! :sparkles:

anfedotoff avatar Sep 19 '24 19:09 anfedotoff

Thanks, this looks good to me overall!

I have some requests for changes. Please let me know if they do not make sense to you. I have not followed the discussion with Jiri. Can you please add a summary of the discussion in the PR header. Having a discussion of what the previous approach was, why it did not work, and what this PR does that works would be very helpful!

Thanks!

Sure! I updated the PR description. I hope, now it became more clear about what we've done to avoid the race condition.

anfedotoff avatar Sep 30 '24 10:09 anfedotoff

Thanks, this looks good to me overall! I have some requests for changes. Please let me know if they do not make sense to you. I have not followed the discussion with Jiri. Can you please add a summary of the discussion in the PR header. Having a discussion of what the previous approach was, why it did not work, and what this PR does that works would be very helpful! Thanks!

Sure! I updated the PR description. I hope, now it became more clear about what we've done to avoid the race condition.

Great Thanks! Can you please split the change that changes the return value into a separate commit with a short explanation of why it OK to do as discussed in another comment? Other than that, LGTM!

kkourt avatar Oct 03 '24 11:10 kkourt

Thanks, this looks good to me overall! I have some requests for changes. Please let me know if they do not make sense to you. I have not followed the discussion with Jiri. Can you please add a summary of the discussion in the PR header. Having a discussion of what the previous approach was, why it did not work, and what this PR does that works would be very helpful! Thanks!

Sure! I updated the PR description. I hope, now it became more clear about what we've done to avoid the race condition.

Great Thanks! Can you please split the change that changes the return value into a separate commit with a short explanation of why it OK to do as discussed in another comment? Other than that, LGTM!

@kkourt, can you look one more time plz)?

anfedotoff avatar Oct 03 '24 15:10 anfedotoff

All tests ✅, merging! Thanks!

kkourt avatar Oct 04 '24 07:10 kkourt