`app_dirs2` startup crash on Android in Termux
Dear All,
I installed the latest repo version of Tectonic on Termux (v0.8.0), but it always returns with the following error message:
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /home/builder/.cargo/registry/src/github.com-1ecc6299db9ec823/ndk-glue-0.3.0/src/lib.rs:54:39
I believe that this is related to this issue: https://github.com/rust-windowing/android-ndk-rs/issues/126
Are there any workarounds?
Unfortunately I haven't seen any discussion of this issue beyond the other report that you found.
You're attempting to run Tectonic on the Android OS, correct? Are you able to run it with the environment variable RUST_BACKTRACE=1 and get a full backtrace of where the crash is occurring? That would be very helpful for understanding the source of the problem.
Also, if you're able to build Tectonic on/for Android and understand where the dependency on the ndk-glue crate is even arising, that might be helpful.
ndk-glue hosts an Android Activity just like Termux does, and those can't be nested. As shown in https://github.com/rust-windowing/android-ndk-rs/issues/126 the caller is trying to get at NativeActivity via https://github.com/rust-windowing/android-ndk-rs/blob/30d032aad1808fc88145906df3862402192ccea5/ndk-glue/src/lib.rs#L53-L55 which won't be initialized for "regular executables" that aren't running as an "Android App" through the Android Framework.
Having a stacktrace is vital in understanding what crate is responsible for poking at this (it's probably an ancient one as we migrated to ndk-context since the beginning of this year, but it'll suffer the same issue). This could be app_dirs2 (a dependency of tectonic) which is hardcoded to look up Android paths using a jni context pointer despite running in a "sort of" Linux "desktop" environment inside Termux.
Thank you for the pointer! It looks like cargo tree --target=armv7-linux-androideabi will print out a dependency tree for the specified target without having to have its build environment set up, and that lets me see that ndk-context shows up as a dep of app_dirs2 and nowhere else. That's good enough for me to consider it the presumptive culprit. I'll look into potential solutions.
@pkgw Except that ndk-context shouldn't invoke ndk-glue, it's ndk-glue that pulls out the activity and feeds it to ndk-context (on newer versions). While ndk-context will also cause similar panics, this issue clearly indicates ndk-glue 0.3 (which is quite old at this point).
@MarijnS95 I think it's just that Termux is building an older release. 0.8.0 was released in October 2021.
@pkgw That's quite likely then, app_dirs2 only migrated to ndk-context in February this year: https://github.com/app-dirs-rs/app_dirs2/commit/35a9b19487cf9173549ef5c246d60fcd65858a9f
@MarijnS95 It seems like you're familiar with the specific app_dirs2 situation and have contributed to the repo. Assuming that it's the culprit, do you understand what the solution would be? Perhaps a Cargo feature added to app_dirs2 would help? Or is Termux doing something sufficiently weird that it's basically their responsibility to patch things somehow?
@pkgw I'd pin this on app_dirs2 (or any other crate like it) assuming that it runs in an Android app context whenever target_os = "android", and on ndk-glue/ndk-context providing global statics with the JNI env and Android Context pointer.
However, I don't have a clear-cut solution for this, as it's effectively up to the application to determine where to source its directories from. In that sense a Cargo feature in app_dirs2 stating that the caller wishes to use Android for its paths, or "regular" environment variables (the "Linux desktop" case in app_dirs2) seems most fitting for now. Not sure what the right default would be though.
We invented ndk-context to deal with versioning problems where apps were accessing static globals across duplicated ndk-glue versions in their dependency tree (each with a copy of the static globals, but only one initialized), but my clear preference is to remove (publicly accessible) static globals entirely and instead pass down instances of these variables - which would lead to compiler errors on a mismatch.
That same mechanism can be used to optionally provide app_dirs2 with a JNI pointer and Android Context: if it has it, it'll go the Android route for sourcing directories. If it doesn't, it'll search for mandatory env vars and otherwise panic (at least I think that's what it'd do now, or return None/Err), together with a useful message pointing towards default-optional Android support.
As a third option we could - collectively as a community - coin a --cfg flag (which doesn't require [feature] flags to be set and passed through every transitive dependency) that distinguishes Android app support from something that runs on a more desktop-like setup yet builds as target_os = "android". That said, the userbase is small enough (running binaries in Termux and adb shell) that I'd rather stop sneaking static globals behind people's backs, which would implicitly solve this issue too.
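To make the "pass the context down instead of reading a static global" idea concrete, here is a minimal sketch. Everything in it is hypothetical: `AndroidHandles`, `DirError` and `app_root` are illustrative names, not part of app_dirs2, ndk-glue or ndk-context, and the Android side is reduced to plain data rather than real JNI calls.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the JNI env + Android Context pair.
/// A real implementation would hold pointers and query the Context over JNI;
/// here the per-app files directory is carried as plain data.
#[derive(Debug, Clone)]
pub struct AndroidHandles {
    pub files_dir: PathBuf,
}

#[derive(Debug, PartialEq)]
pub enum DirError {
    /// Neither an Android context nor the required environment value was supplied.
    MissingSource(&'static str),
}

/// Pick the root directory from whatever the caller explicitly handed over.
/// `home` is assumed to be the value of a mandatory env var such as $HOME,
/// read by the caller.
pub fn app_root(
    android: Option<&AndroidHandles>,
    home: Option<&str>,
) -> Result<PathBuf, DirError> {
    match (android, home) {
        // The caller opted into Android support by passing handles.
        (Some(handles), _) => Ok(handles.files_dir.clone()),
        // No handles: behave like a Linux desktop/cli program.
        (None, Some(home)) if !home.is_empty() => Ok(PathBuf::from(home)),
        // Nothing usable: report an error with a hint instead of panicking.
        _ => Err(DirError::MissingSource(
            "no Android context was passed and HOME is unset; \
             Android support is opt-in and needs a context",
        )),
    }
}
```

Because the handles are an ordinary argument, a binary running in Termux simply never passes them and gets the env-var route, and a type mismatch between duplicated crate versions would show up at compile time rather than as an uninitialized global.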
Thank you very much for providing such detailed expertise for all of this!
I think I get the contours of the broader solution that you're envisioning, and it sounds like something that will take at least a little while to roll out into the ecosystem. Is there a shorter-term workaround that you'd feel comfortable recommending? If I'm understanding you correctly, is it true that on Termux/Android there are some magic environment variables that we could potentially use?
If something like that would be too hacky to propose for app_dirs2 itself, we could potentially add some compile-time logic at the Tectonic layer to implement a workaround. It would be nice to squash this.
If I'm understanding you correctly, is it true that on Termux/Android there are some magic environment variables that we could potentially use?
Nothing magic, just that on a typical Linux desktop (which is what Termux provides within its app) there are standardized environment variables like $XDG_CONFIG_HOME which don't exist on Android. Instead, on Android app_dirs2 returns a per-app special location where apps can store their files, because the entire image layout is different (and there are strict rules for where an app can and cannot read/write).
For the short term I guess we can make app_dirs2 look for "Linux desktop" environment variables first (those won't be set on Android unless in very) before deferring to the Android JNI + Context; or check if the pointers are set and fall back to env vars (would have to extend ndk-context for that iirc).
It'd be good to pull some other stakeholders into this: @dvc94ch @msiglreith @kornelski before embarking on any of these paths.
I think it'd be ideal if ndk-context could have non-panicking methods and support querying what context it's running in.
Could we use catch_unwind to catch the panic and fall back?
Checking for Linux desktop env vars first seems like an OK workaround, so PR for this is welcome.
A Cargo feature flag isn't a good solution IMHO, because feature flags are cumbersome to set for dependencies of dependencies (which app_dirs is going to be), and end users need to know about such a flag in the first place. Non-crashy default is better.
Am I understanding this correctly: when a cli app is run, no android context exists, causing the panic? That sounds like a fun edge case. What is the correct way of getting a per app directory in this case? Just write to /data/local/tmp? I don't think you get persistent storage on a non rooted android.
I think it'd be ideal if ndk-context could have non-panicking methods and support querying what context it's running in.
I carefully tried to avoid this case as it may (and is currently used to) indicate that the context pointer may not (yet) have been initialized, otherwise this would have been a neat option.
Could we use catch_unwind to catch the panic and fall back?
Better to add a fallible function in that case, we can easily update ndk-context as long as we don't do anything breaking (and the dependent crate sets the right minimum patch version).
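As a rough illustration of what "add a fallible function" could look like next to the existing panicking accessor, here is a self-contained sketch. It does not reproduce ndk-context's actual API: `ContextSlot`, `ContextHandles`, `try_get` and the other names are made up for this example, and the raw pointers are replaced with plain integers.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the JavaVM and Context pointers, as plain integers.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ContextHandles {
    pub vm: usize,
    pub context: usize,
}

/// A write-once slot, meant to be filled by whatever glue code starts the app.
pub struct ContextSlot(OnceLock<ContextHandles>);

impl ContextSlot {
    pub const fn new() -> Self {
        Self(OnceLock::new())
    }

    /// Store the handles; returns false if the slot was already filled.
    pub fn initialize(&self, handles: ContextHandles) -> bool {
        self.0.set(handles).is_ok()
    }

    /// Fallible accessor: None means no context has been provided (so far).
    pub fn try_get(&self) -> Option<ContextHandles> {
        self.0.get().copied()
    }

    /// Panicking accessor, kept so existing callers are not broken.
    pub fn get(&self) -> ContextHandles {
        self.try_get().expect("android context was not initialized")
    }
}
```

Adding `try_get` alongside `get` is a non-breaking change, so a dependent crate would only need to raise its minimum patch version. The caveat raised above still applies: `None` cannot distinguish "not an Android app" from "not initialized yet".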
Checking for Linux desktop env vars first seems like an OK workaround, so PR for this is welcome.
This has my preference as well, I doubt any Android app with Rust will have these XDG variables set.
A Cargo feature flag isn't a good solution IMHO, because feature flags are cumbersome to set for dependencies of dependencies (which app_dirs is going to be), and end users need to know about such a flag in the first place. Non-crashy default is better.
Exactly, feature flags are supposed to be additive and are very hard to control when more than one crate (somewhere in the dependency tree) depends on app_dirs2.
Am I understanding this correctly: when a cli app is run, no android context exists, causing the panic?
Correct. It exists for Termux but that should intentionally be shadowed from this Rust app, which IMO should pretend like it's running in a Linux desktop/cli environment. Only a few cli binaries/scripts inside Termux, like termux-url-opener, understand that they're running within an app that runs on Android.
That sounds like a fun edge case. What is the correct way of getting a per app directory in this case? Just write to /data/local/tmp? I don't think you get persistent storage on a non rooted android.
Environment variables. Since Termux provides a sort-of Linux environment, you'd expect to use that and put config values inside $HOME/.config/<my app> for example (since many different applications run under com.termux, we can't just hijack /data/data/com.termux/files, and it'd be very counter-intuitive at that).
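For reference, the usual XDG-style resolution of a config directory can be sketched as below. This is only an illustration of the rule ($XDG_CONFIG_HOME if set to an absolute path, otherwise $HOME/.config), not the code of app_dirs2 or the xdg crate; `config_dir` is a hypothetical helper, and the environment is passed in as a closure so the logic can be exercised without touching the real process environment.

```rust
use std::path::{Path, PathBuf};

/// Resolve `<config base>/<app>` from an environment lookup function.
/// Returns None when neither variable yields a usable base directory.
pub fn config_dir(app: &str, env: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    let base = env("XDG_CONFIG_HOME")
        .map(PathBuf::from)
        // The XDG spec says relative (and therefore empty) values are ignored.
        .filter(|p| p.is_absolute())
        // Fall back to $HOME/.config when XDG_CONFIG_HOME is not usable.
        .or_else(|| {
            env("HOME")
                .filter(|h| !h.is_empty())
                .map(|h| Path::new(&h).join(".config"))
        })?;
    Some(base.join(app))
}
```

In a real program the lookup would be `config_dir("my-app", |k| std::env::var(k).ok())`. Under Termux, where HOME points inside the app's files directory, this yields a per-user `.config/my-app` path without ever touching an Android Context.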
As another data point, in the Termux issue that I filed (auto-linked above), I was pointed to https://github.com/termux/termux-packages/commit/ff9aaad94b27feafc5d0639c78bec3103d7a5850, where the Termux folks solved a similar problem by patching the package to downgrade from app_dirs2 to plain app_dirs. I think that that would cause it to basically use the Linux mode and locate dirs using XDG environment variables.
I'm willing to take a stab at preparing a PR for app_dirs2 to check for Linux XDG env vars in the Android mode, but wouldn't be able to test it myself.
Checking for Linux desktop env vars first seems like an OK workaround, so PR for this is welcome.
This has my preference as well, I doubt any Android app with Rust will have these XDG variables set.
Seems reasonable to put this workaround in the app_dirs2 crate.
I think it'd be ideal if ndk-context could have non-panicking methods and support querying what context it's running in.
Looks like adding a method to optionally get the context would work. But that doesn't allow detecting that it's a termux environment. Does termux set the TERM variable? In that case app_dirs2 should just check if it's a termux environment before trying to access the ndk-context.
I'm willing to take a stab at preparing a PR for app_dirs2 to check for Linux XDG env vars in the Android mode, but wouldn't be able to test it myself.
There's fortunately only a single get_app_dir() for Unix so I'd call that, and ok_or_else() into the Android code.
Looking at:
[target.'cfg(all(unix, not(target_os = "macos")))'.dependencies]
xdg = "2.2.0"
At least the xdg crate safely compiles under Android; so it should just return an Err there.
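The suggested shape (try the Unix lookup, and only on failure go to the Android code) might look roughly like the sketch below. It is written against hypothetical closures rather than the real app_dirs2 internals, and `AppDirError` is a placeholder error type. Since the lookups here return a `Result`, the chaining uses `Result::or_else`.

```rust
use std::path::PathBuf;

/// Placeholder error type for this sketch; not the app_dirs2 error enum.
#[derive(Debug, PartialEq)]
pub enum AppDirError {
    NotSupported,
}

/// Try the env-var/XDG lookup first and defer to the Android lookup only if it fails.
/// The Android closure is never invoked when the Unix lookup succeeds, so the
/// (possibly uninitialized) Android context is not touched in a Termux-like setup.
pub fn app_dir_with_fallback<U, A>(
    unix_lookup: U,
    android_lookup: A,
) -> Result<PathBuf, AppDirError>
where
    U: FnOnce() -> Result<PathBuf, AppDirError>,
    A: FnOnce() -> Result<PathBuf, AppDirError>,
{
    unix_lookup().or_else(|_| android_lookup())
}
```

The ordering is the whole point of the workaround: in a regular Android app the XDG variables are not expected to be set, so the first lookup fails and behavior stays as it is today.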
app_dirs2 pull request — untested besides compiling! — filed: https://github.com/app-dirs-rs/app_dirs2/pull/34.