Daemonize --stale refreshes

Open: rraval opened this issue 9 months ago · 6 comments

Hi there, I'm not 100% sure if the problem is on the bkt side or somewhere else.

My use case is as follows:

  • direnv loads the .envrc for my project
  • .envrc invokes a bunch of bkt processes with bkt --ttl=90d --stale 1h --discard-failures -- <very slow command>
  • Every hour, loading the environment variables gets stuck with a message like:
direnv: ([<.envrc path>]) is taking a while to execute. Use CTRL-C to give up.

I've verified that the direct bkt invocations return quickly by adding set -x to my .envrc.

If I'm understanding https://github.com/direnv/direnv/issues/626#issuecomment-619574344 correctly, the bash process that direnv is launching is still waiting for all sub-sub-processes to exit, which somehow includes the child process that bkt launches to refresh the --stale cache.

Would it be possible to truly daemonize the refresh, allowing direnv to run quickly?

rraval, Mar 23 '25 14:03

Thanks for the report! Yes, we should be able to get --stale working with this tool. You can see the current implementation here; do you happen to know what's needed to make direnv happy? Alternatively, are you able to share an MCVE that reproduces the issue (with or without direnv specifically)? No worries if not, it would just help me investigate.

dimo414, Mar 26 '25 06:03

I'm far from an expert, but my vague understanding is that a double-fork is necessary.

  • The parent bkt invocation needs to fork a child, which needs to immediately fork a grandchild.
  • The grandchild can invoke the logic in force_update_async.
  • The child should exit immediately, which causes the grandchild to be re-parented to the top-level init process.
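The three steps above can be sketched in bash, the language the .envrc itself uses. This is a minimal illustration, not bkt's code: double_fork is a hypothetical helper name, the outer ( ... ) subshell plays the role of the child, and the backgrounded command inside it is the grandchild that gets re-parented once the subshell exits.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bkt): run "$@" as a double-forked
# background process.
#   ( ... )  forks the "child" subshell
#   ... &    inside it forks the "grandchild" that runs the command
# The child subshell exits as soon as the grandchild has been started, so
# the grandchild is re-parented to init (or the nearest subreaper), and the
# caller only waits for the short-lived child.
double_fork() {
  ( "$@" & )
}

start=$SECONDS
double_fork sleep 1
echo "double_fork returned after $((SECONDS - start))s"
```

Note that the grandchild still inherits the caller's open file descriptors (stdin, stdout, stderr and any others); re-parenting changes which process is its parent, not which pipes it holds open.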

Googling turned up this Discourse thread, which links to the daemonize crate; that crate might handle this for you.

I will put together an MCVE when I get a moment.

rraval, Mar 26 '25 17:03

I believe the existing behavior already creates the grandchild relationship we need (direnv creates a bkt subprocess, which starts a grandchild subprocess and then exits, causing the grandchild to be re-parented). But going off this comment, I suspect the issue might be that force_update_async isn't closing stdin as well. Any chance you're able to test #61 (see the build artifacts) to see if it resolves your issue?
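For context, here is one general way a background process can still delay its caller after being re-parented: if it holds the write end of a pipe the caller is reading, the caller sees no end-of-file until that process exits. The minimal bash sketch below assumes a caller that reads output until end-of-file, as $( ... ) command substitution does; it illustrates the mechanism only and is not a claim about what bkt or direnv do internally.

```shell
#!/usr/bin/env bash
# Illustrative sketch: $( ... ) reads until end-of-file on its pipe, so any
# process still holding the pipe's write end delays it, even after that
# process has been re-parented by the ( ... & ) double fork.

# Case 1: the background sleep inherits the pipe as its stdout.
start=$SECONDS
out=$( (sleep 2 &); echo cached )
inherited=$((SECONDS - start))

# Case 2: the background sleep gets /dev/null for stdin, stdout and stderr,
# so only the short-lived subshells ever hold the pipe.
start=$SECONDS
out=$( (sleep 2 </dev/null >/dev/null 2>&1 &); echo cached )
detached=$((SECONDS - start))

echo "output: $out; inherited pipe: ${inherited}s; detached: ${detached}s"
```

Both captures return the same text; only the time spent waiting for end-of-file differs.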

I'm not sure how best to add test coverage of this case; I played around briefly with modifying cli::cache_refreshes_in_background to use spawn() but wasn't able to trigger the issue.

dimo414, Mar 30 '25 10:03

Hi, just a quick update. I haven't forgotten about this bug report, just been juggling many things. Will try to get back to it and verify this week.

rraval, Apr 15 '25 12:04

Sadly the fix in #61 does not seem to work.

Here's an MCVE: https://gist.github.com/rraval/e74b7d0426b8187c8967e088c0f141a3

To reproduce:

$ direnv --version
2.35.0

$ git clone git@gist.github.com:e74b7d0426b8187c8967e088c0f141a3.git direnv-60-mcve
Cloning into 'direnv-60-mcve'...

$ direnv allow direnv-60-mcve

Then, with either bkt 0.8.0 or a custom build from #61 installed, running the following for the first time takes 5 seconds:

$ time direnv exec direnv-60-mcve echo
direnv: loading ~/direnv-60-mcve/.envrc
/etc/profiles/per-user/rraval/bin/bkt

________________________________________________________
Executed in    5.03 secs      fish           external
   usr time   14.99 millis    0.00 micros   14.99 millis
   sys time   14.06 millis  785.00 micros   13.28 millis

Running it immediately after is fast:

$ time direnv exec direnv-60-mcve echo
direnv: loading ~/direnv-60-mcve/.envrc
/etc/profiles/per-user/rraval/bin/bkt


________________________________________________________
Executed in   23.77 millis    fish           external
   usr time   11.98 millis    0.00 micros   11.98 millis
   sys time   11.91 millis  571.00 micros   11.34 millis

Running it again after 10 seconds (the --stale threshold) is slow again, even though the cached value is still within --ttl 10m:

$ time direnv exec direnv-60-mcve echo
direnv: loading ~/direnv-60-mcve/.envrc
/etc/profiles/per-user/rraval/bin/bkt


________________________________________________________
Executed in    5.03 secs      fish           external
   usr time   13.92 millis  953.00 micros   12.96 millis
   sys time   10.90 millis    0.00 micros   10.90 millis

rraval, Apr 18 '25 03:04

Thanks for the reproduction steps! I can reproduce what you're seeing, though interestingly, the background process does appear to be re-parented if I add ps -ef --forest to slow.sh:

# cold cache, 5s expected
$ time direnv exec direnv-60-mcve echo
direnv: loading /direnv-60-mcve/.envrc
/bin/bkt
This is a slow script
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Apr22 pts/0    00:00:00 /bin/bash
root      4085     1  0 01:11 pts/0    00:00:00 direnv exec direnv-60-mcve echo
root      4092  4085  0 01:11 pts/0    00:00:00  \_ /usr/bin/bash -c eval "$("/usr/bin/direnv" stdlib)" && __main__ source_env "/direnv-60-mcve/.envrc"
root      4111  4092  0 01:11 pts/0    00:00:00      \_ bkt --ttl 10m --stale 10s -- ./slow.sh
root      4113  4111  0 01:11 pts/0    00:00:00          \_ bash ./slow.sh
root      4117  4113  0 01:11 pts/0    00:00:00              \_ ps -ef --forest


# wait 10+s, output should be cached but refreshed in the background
# output is fast+cached but we still wait 5s before exiting
$ time direnv exec direnv-60-mcve echo
direnv: loading /direnv-60-mcve/.envrc
/bin/bkt
This is a slow script
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Apr22 pts/0    00:00:00 /bin/bash
root      4085     1  0 01:11 pts/0    00:00:00 direnv exec direnv-60-mcve echo
root      4092  4085  0 01:11 pts/0    00:00:00  \_ /usr/bin/bash -c eval "$("/usr/bin/direnv" stdlib)" && __main__ source_env "/direnv-60-mcve/.envrc"
root      4111  4092  0 01:11 pts/0    00:00:00      \_ bkt --ttl 10m --stale 10s -- ./slow.sh
root      4113  4111  0 01:11 pts/0    00:00:00          \_ bash ./slow.sh
root      4117  4113  0 01:11 pts/0    00:00:00              \_ ps -ef --forest


# output has been refreshed and cached asynchronously
# notice the call to `bkt --force` with PPID 1
$ time direnv exec direnv-60-mcve echo
direnv: loading /direnv-60-mcve/.envrc
/bin/bkt
This is a slow script
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Apr22 pts/0    00:00:00 /bin/bash
root      4124     1  0 01:11 pts/0    00:00:00 direnv exec direnv-60-mcve echo
root      4151     1  0 01:11 pts/0    00:00:00 /bkt --force --ttl 10m --stale 10s -- ./slow.sh
root      4154  4151  0 01:11 pts/0    00:00:00  \_ bash ./slow.sh
root      4163  4154  0 01:11 pts/0    00:00:00      \_ ps -ef --forest


real    0m0.037s
user    0m0.026s
sys     0m0.010s

Running the same command directly (bkt --ttl 10m --stale 10s -- direnv-60-mcve/slow.sh) doesn't reproduce the hang.

Here is where direnv invokes the .envrc via source_env. Nothing jumps out as particularly unusual in this function. I tried bkt --ttl 10m --stale 10s -- bash -c '. direnv-60-mcve/slow.sh' as well to see if it's related to sourcing the script in some way, but that similarly returned quickly.

dimo414, Apr 23 '25 08:04