Daemonize --stale refreshes
Hi there, I'm not 100% sure whether the problem is on the bkt side or somewhere else.
My use case is as follows:
- direnv loads the .envrc for my project
- .envrc invokes a bunch of bkt processes with bkt --ttl=90d --stale 1h --discard-failures -- <very slow command>
- Every hour, loading the environment variables gets stuck with a message like:
direnv: ([<.envrc path>]) is taking a while to execute. Use CTRL-C to give up.
I've verified that the direct bkt invocations return quickly by adding set -x to my .envrc.
If I'm understanding https://github.com/direnv/direnv/issues/626#issuecomment-619574344 correctly, the bash process that direnv is launching is still waiting for all sub-sub-processes to exit, which somehow includes the child process that bkt launches to refresh the --stale cache.
Would it be possible to truly daemonize the refresh, allowing direnv to run quickly?
Thanks for the report! Yes, we should be able to get --stale working with this tool. You can see the current implementation here. Do you happen to know what's needed to make direnv happy? Alternatively, are you able to share an MCVE that reproduces the issue (with or without direnv specifically)? No worries if not, it would just help me investigate.
I'm far from an expert, but my vague understanding is that a double-fork is necessary.
- The parent bkt invocation needs to fork a child, which needs to immediately fork a grandchild.
- The grandchild can invoke the logic in force_update_async.
- The child should exit immediately, which causes the grandchild to be re-parented to the top level init process.
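The double-fork sequence described in the list above can be sketched as follows. This is a minimal Python illustration of the general Unix technique, not bkt's implementation (bkt is written in Rust); spawn_detached and its work argument are hypothetical names introduced only for this example.

```python
import os
import time


def spawn_detached(work) -> None:
    """Run work() in a grandchild process that is not a child of the caller.

    Hypothetical helper: illustrates the classic double fork, nothing more.
    """
    pid = os.fork()
    if pid > 0:
        # Caller: reap the short-lived intermediate child, then carry on.
        os.waitpid(pid, 0)
        return

    # Intermediate child: start a new session, fork again, and exit at once
    # so that the grandchild is orphaned and adopted by init (or a subreaper).
    os.setsid()
    if os.fork() > 0:
        os._exit(0)

    # Grandchild: point stdin/stdout/stderr at /dev/null so it does not keep
    # any pipe or terminal inherited from the caller open.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    try:
        work()
    finally:
        os._exit(0)


if __name__ == "__main__":
    # Stand-in for a slow cache refresh; the caller returns immediately.
    spawn_detached(lambda: time.sleep(1))
    print("caller continues without waiting")
```

Note that the sketch also redirects the standard streams in the grandchild. Re-parenting alone does not release file descriptors inherited from the caller.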
My googling found this discourse thread that links to the daemonize crate that might handle this for you.
I will put together an MCVE when I get a moment.
I believe the existing behavior creates the grandchild relationship we need (direnv creates a bkt subprocess which starts a grandchild subprocess and then exits, causing the grandchild to be re-parented). But going off this comment I suspect the issue might be that force_update_async isn't closing stdin as well. Any chance you're able to test #61 (see the build artifacts) to see if it resolves your issue?
I'm not sure how best to add test coverage of this case; I played around briefly with modifying cli::cache_refreshes_in_background to use spawn() but wasn't able to trigger the issue.
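One possible way to exercise this in a test is to capture the command's output through a pipe. That direnv reads the .envrc output this way is an assumption here, not something confirmed in this thread. A pipe reader only sees end-of-file once every process holding the write end has closed it, so a background process that inherits stdout can keep the caller waiting even after it has been re-parented. The sketch below uses sleep as a stand-in for the slow refresh; timed_capture is a hypothetical helper.

```python
import subprocess
import time


def timed_capture(shell_snippet: str) -> float:
    """Run a snippet with sh, read its stdout through a pipe until EOF,
    and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(["sh", "-c", shell_snippet], stdout=subprocess.PIPE, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # The background job inherits the captured stdout pipe, so the read does
    # not finish until the job exits, even though sh itself returns at once.
    inherited = timed_capture("sleep 1 &")

    # The background job has its standard streams redirected, so the pipe's
    # write end is closed as soon as sh exits.
    detached = timed_capture("sleep 1 </dev/null >/dev/null 2>&1 &")

    print(f"inherited stdout: {inherited:.2f}s")
    print(f"redirected stdio: {detached:.2f}s")
```

If the first timing tracks the sleep duration and the second does not, the same check against a bkt invocation with a stale cache entry might show whether the refresh process still holds an inherited descriptor.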
Hi, just a quick update. I haven't forgotten about this bug report, just been juggling many things. Will try to get back to it and verify this week.
Sadly the fix in #61 does not seem to work.
Here's an MCVE: https://gist.github.com/rraval/e74b7d0426b8187c8967e088c0f141a3
To reproduce:
$ direnv --version
2.35.0
$ git clone git@gist.github.com:e74b7d0426b8187c8967e088c0f141a3.git direnv-60-mcve
Cloning into 'direnv-60-mcve'...
$ direnv allow direnv-60-mcve
Then, with either bkt 0.8.0 or a custom build from #61, running the following for the first time takes 5 seconds:
~$ time direnv exec direnv-60-mcve echo
direnv: loading ~/direnv-60-mcve/.envrc
/etc/profiles/per-user/rraval/bin/bkt
________________________________________________________
Executed in 5.03 secs fish external
usr time 14.99 millis 0.00 micros 14.99 millis
sys time 14.06 millis 785.00 micros 13.28 millis
Running it immediately after is fast:
$ time direnv exec direnv-60-mcve echo
direnv: loading ~/direnv-60-mcve/.envrc
/etc/profiles/per-user/rraval/bin/bkt
________________________________________________________
Executed in 23.77 millis fish external
usr time 11.98 millis 0.00 micros 11.98 millis
sys time 11.91 millis 571.00 micros 11.34 millis
Running it after 10 seconds (the --stale timeout) becomes slow again, even though --ttl is 10m:
$ time direnv exec direnv-60-mcve echo
direnv: loading ~/direnv-60-mcve/.envrc
/etc/profiles/per-user/rraval/bin/bkt
________________________________________________________
Executed in 5.03 secs fish external
usr time 13.92 millis 953.00 micros 12.96 millis
sys time 10.90 millis 0.00 micros 10.90 millis
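The behavior expected across the three runs above can be summarized in a small sketch. It reflects one reading of how --ttl and --stale are meant to interact as described in this thread, not bkt's actual code; Action and decide are hypothetical names, and the exact boundary handling is an assumption.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SERVE_CACHED = "serve cached output"
    SERVE_CACHED_AND_REFRESH = "serve cached output, refresh in background"
    RUN_AND_WAIT = "run the command and wait for it"


def decide(age_s: Optional[float], ttl_s: float, stale_s: float) -> Action:
    """Pick what the caller should experience for a cache entry of a given age.

    age_s is None when nothing is cached yet.
    """
    if age_s is None or age_s >= ttl_s:
        # Cold or expired cache: the caller has to wait for the command.
        return Action.RUN_AND_WAIT
    if age_s >= stale_s:
        # Still valid but stale: the caller should get the cached output
        # promptly while a refresh runs without blocking it.
        return Action.SERVE_CACHED_AND_REFRESH
    return Action.SERVE_CACHED


if __name__ == "__main__":
    ttl, stale = 600.0, 10.0  # --ttl 10m --stale 10s, as in the reproduction
    for age in (None, 1.0, 15.0, 700.0):
        print(age, "->", decide(age, ttl, stale).value)
```

Under this reading, the third run (cache older than 10 seconds but far younger than 10 minutes) should return as quickly as the second, which is not what the timings show.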
Thanks for the reproduction steps! I can reproduce what you're seeing, though interestingly the process does appear to be re-parented if I add ps -ef --forest to slow.sh:
# cold cache, 5s expected
$ time direnv exec direnv-60-mcve echo
direnv: loading /direnv-60-mcve/.envrc
/bin/bkt
This is a slow script
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Apr22 pts/0 00:00:00 /bin/bash
root 4085 1 0 01:11 pts/0 00:00:00 direnv exec direnv-60-mcve echo
root 4092 4085 0 01:11 pts/0 00:00:00 \_ /usr/bin/bash -c eval "$("/usr/bin/direnv" stdlib)" && __main__ source_env "/direnv-60-mcve/.envrc"
root 4111 4092 0 01:11 pts/0 00:00:00 \_ bkt --ttl 10m --stale 10s -- ./slow.sh
root 4113 4111 0 01:11 pts/0 00:00:00 \_ bash ./slow.sh
root 4117 4113 0 01:11 pts/0 00:00:00 \_ ps -ef --forest
# wait 10+s, output should be cached but refreshed in the background
# output is fast+cached but we still wait 5s before exiting
$ time direnv exec direnv-60-mcve echo
direnv: loading /direnv-60-mcve/.envrc
/bin/bkt
This is a slow script
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Apr22 pts/0 00:00:00 /bin/bash
root 4085 1 0 01:11 pts/0 00:00:00 direnv exec direnv-60-mcve echo
root 4092 4085 0 01:11 pts/0 00:00:00 \_ /usr/bin/bash -c eval "$("/usr/bin/direnv" stdlib)" && __main__ source_env "/direnv-60-mcve/.envrc"
root 4111 4092 0 01:11 pts/0 00:00:00 \_ bkt --ttl 10m --stale 10s -- ./slow.sh
root 4113 4111 0 01:11 pts/0 00:00:00 \_ bash ./slow.sh
root 4117 4113 0 01:11 pts/0 00:00:00 \_ ps -ef --forest
# output has been refreshed and cached asynchronously
# notice the call to `bkt --force` with PPID 1
$ time direnv exec direnv-60-mcve echo
direnv: loading /direnv-60-mcve/.envrc
/bin/bkt
This is a slow script
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Apr22 pts/0 00:00:00 /bin/bash
root 4124 1 0 01:11 pts/0 00:00:00 direnv exec direnv-60-mcve echo
root 4151 1 0 01:11 pts/0 00:00:00 /bkt --force --ttl 10m --stale 10s -- ./slow.sh
root 4154 4151 0 01:11 pts/0 00:00:00 \_ bash ./slow.sh
root 4163 4154 0 01:11 pts/0 00:00:00 \_ ps -ef --forest
real 0m0.037s
user 0m0.026s
sys 0m0.010s
Running the same command directly (bkt --ttl 10m --stale 10s -- direnv-60-mcve/slow.sh) doesn't reproduce the hang.
Here is where direnv invokes the .envrc via source_env. Nothing jumps out as particularly unusual in this function. I tried bkt --ttl 10m --stale 10s -- bash -c '. direnv-60-mcve/slow.sh' as well to see if it's related to sourcing the script in some way, but that similarly returned quickly.