thin-edge.io
thin-edge.io copied to clipboard
`tedge connect` does not detect mapper/agent via symlinks
Discussed in https://github.com/thin-edge/thin-edge.io/discussions/880
Originally posted by toewsar February 15, 2022
When using tedge connect c8y
it tries to find out if the mapper and agent are installed. This is done by the RUST-which().
In my case the binaries aren't installed at /usr/bin or similar. Therefore the tedge cli does not find the installed mapper and agent.
I tried to create symlinks to /usr/bin to tell tedge that the mapper and agent are installed, but it still failed. The shell command which
found the binaries.
$ sudo which tedge_mapper /usr/bin/tedge_mapper
$ sudo tedge connect c8y Checking if sysv is available.
Checking if configuration for requested bridge already exists.
Validating the bridge certificates.
Create the device.
Saving configuration for requested bridge.
Restarting mosquitto service.
Awaiting mosquitto to start. This may take up to 5 seconds.
Enabling mosquitto service on reboots.
Successfully created bridge connection!
Sending packets to check connection. This may take up to 2 seconds.
Connection check is successfull.
Checking if tedge-mapper is installed.
Warning: tedge_mapper is not installed.
Enabling software management.
Checking if tedge-agent is installed.
Info: Software management is not installed. So, skipping enabling related components.
It would be nice to also accept symlinks.
The issue has been observed by @toewsar.
The first point to note is that the mapper is not detected through the symlink while things are okay for the agent where there is no indirection.
$ ls -lh /usr/bin/tedge*
-rwxr-xr-x 1 root root 7.0M Feb 16 10:31 /usr/bin/tedge
-rwxr-xr-x 1 root root 7.0M Feb 16 10:31 /usr/bin/tedge_agent
lrwxrwxrwx 1 root root 40 Feb 16 10:36 /usr/bin/tedge_mapper -> /mnt/data/hmi/tedge/usr/bin/tedge_mapper
$ sudo tedge connect c8y
...
Checking if tedge-mapper is installed.
Warning: tedge_mapper is not installed.
...
Checking if tedge-agent is installed.
Starting tedge-agent service.
...
I've written a small Rust program which only calls which
:
use std::env;
use which::which;
fn main() {
let args: Vec<String> = env::args().collect();
if args.len() < 2 {
println!("executable as parameter needed");
} else {
if which(&args[1]).is_err() {
println!("Nope: `{}` is not found by `which`.", &args[1]);
} else {
println!("Yes: `{}` is found by `which`.", &args[1]);
}
}
}
With both which
4.2.2 and 4.2.4
, it behaves correctly:
$ ls -lh /usr/bin/tedge*
-rwxr-xr-x 1 ... /usr/bin/tedge
-rwxr-xr-x 1 ... /usr/bin/tedge_agent
lrwxrwxrwx 1 ... /usr/bin/tedge_mapper -> /mnt/data/hmi/tedge/usr/bin/tedge_mapper # working link
lrwxrwxrwx 1 ... /usr/bin/tedge_mapper2 -> /mnt/data/hmi/tedge/usr/bin/tedge_mapper2 # non-existing target
Output:
admin@HMI-1723:~$ ./whichtest tedge_agent
Yes: `tedge_agent` is found by `which`.
admin@HMI-1723:~$ ./whichtest tedge_mapper
Yes: `tedge_mapper` is found by `which`.
admin@HMI-1723:~$ ./whichtest tedge_mapper2
Nope: `tedge_mapper2` is not found by `which`.
$PATH
has nothing to do with this, I've added a hardcoded check for /usr/bin/tedge_mapper
and neither is this found by tedge
After some further analyses, we found out that a symlink alone is not the problem. It has to do with permissions.
We're running tedge
as root confirmed by:
use users::get_current_username;
let uname = get_current_username().unwrap();
println!("Running as user {:?}", &uname);
The link is defined as follows:
$ ls -lh /usr/bin/tedge_mapper
lrwxrwxrwx 1 root root 22 Feb 16 14:38 /usr/bin/tedge_mapper -> /usr/blub/tedge_mapper
The directory, in which the actual tedge_mapper
resides, has these permissions:
$ ls -lha /usr/blub/
drwxr-xr-- 2 root data 4.0K Feb 16 14:37 .
drwxr-xr-x 11 root root 4.0K Feb 16 14:37 ..
-rwxr-xr-x 1 root root 7.5M Feb 16 14:37 tedge_mapper
We're root
, owner is root
with execute
permissions, things should be fine. But they are not!
root
is not in group data
, but on its own in a shell, this is not a problem.
When I run this debug code on its own, it works!
use std::env;
use std::fs;
use which::which;
fn main() {
let args: Vec<String> = env::args().collect();
if args.len() < 2 {
println!("executable as parameter needed");
} else {
match which(&args[1]) {
Err(e) => println!("🏀Nope: `{}` is not found by `which`. {}", &args[1], e),
Ok(p) => {
println!("🏀Yes: `{}` is found by `which`. {:?}", &args[1], p);
match fs::metadata(&p) {
Ok(m) => println!("🏀Trying {:?} filetype {:?} symlink {:?}", &p, &m.file_type(), &m.is_symlink()),
Err(e) => println!("🏀Failed {:?} {:?}", &p, e),
}
}
}
}
}
According to the output, which
finds the target tedge_mapper
file (metadata
follows symlinks, therefore this attribute is false (I can't see the sense behind this, but anyway))!
$ ./whichtest tedge_mapper
🏀Yes: `tedge_mapper` is found by `which`. "/usr/bin/tedge_mapper"
🏀Trying "/usr/bin/tedge_mapper" filetype FileType(FileType { mode: 33261 }) symlink false
$ sudo ./whichtest tedge_mapper
🏀Yes: `tedge_mapper` is found by `which`. "/usr/bin/tedge_mapper"
🏀Trying "/usr/bin/tedge_mapper" filetype FileType(FileType { mode: 33261 }) symlink false
which
is not the problem's originator.
std::fs::metadata
is.
I've added the following to ConnectCommand::Command
(file command.rs
, after the line
println!("Checking if tedge-mapper is installed.\n");
match fs::metadata("/usr/blub/tedge_mapper") {
Ok(m) => println!("Trying /usr/blub/tedge_mapper :{:?} 🍌", m.file_type()),
Err(e) => println!("Failed /usr/blub/tedge_mapper :{:?} 🍌", e),
}
When run via sudo tedge connect c8y
, I get:
Failed /usr/blub/tedge_mapper :Os { code: 13, kind: PermissionDenied, message: "Permission denied" } 🍌
What is different when running this code on its own compared to as part of tedge
❓
I think I've found the reason. It's in
let _user_guard = user_manager.become_user(tedge_users::TEDGE_USER)?;
This boils down to
pub fn switch_user_group(uid: uid_t, gid: gid_t) -> io::Result<SwitchUserGuard> {
...
set_effective_gid(gid)?;
set_effective_uid(uid)?;
In other words, the euid of root
is replaced by tedge
's uid, the egid is replaced by tedge
's gid.
Other supplementary groups of root are left untouched and stay.
Again, we're accessing `/mnt/data with its permissions as this:
$ ls -lhd /mnt/data/
drwxrwsr-- 8 root data 4.0K Feb 17 07:49 /mnt/data/
root as the owner is allowed to access this directory, also the group data
.
root is not a member of data
(no need for that), while user tedge
is member of data
.
But in switch_user_group
, only the egid tedge
is set, not the user tedge
's other groups like data
.
Thus the process will run as tedge
with only the group tedge
and is disallowed to access /mnt/data
.
For testing purposes, I've put root
into the group data
. Then things work flawlessly.
🎯❓ So, how can we not only set the egid, but also the target user's supplementary groups?
Thank you for this detailed analysis. However, I have to understand what is the motivation to put the tedge_mapper
binary in a directory with restricted access?
I'm not keen on the idea to add this complexity to the tedge
command line tool. There is already too much in it! Rather than adding group handling features, I would prefer to remove hardcoded user handling features. I mean to have tedge connect
prepare configuration files used by external tools taking the responsibility to run the services using the locally-defined settings.
Do I understand your reduction wish correctly if I rephrase it as follows?
tedge
should write configuration files of itself and the various cloud connectors and mappers, but not start other processes
You've mentioned "external tools". Do you think of systemd and alike or of helper scripts which continue what tedge no longer does?
Regarding the user&group handling:
To my eyes, the current way is both too lax and too restrictive at the same time.
Too lax, because root
doesn't drop all capabilities (supplementary groups won't be dropped) and thus leaks permissions.
Too restrictive, because the user tedge
's supplementary groups are not added.
But yes, not messing with effective users and leaving that to "the outside" sounds plausible (as long as "the outside" knows, what to do ...)
Hi @hansdaniels,
I am interested to understand the reason why you move the executables to '/mnt/data/hmi/tedge/usr/bin/'. That would help to align about a good solution. Can you explain the motivation behind that?
To unblock you for the moment, a workaround might be to add user root
to group data
. Since switch_user_group()
seems not to drop supplementary groups, group data
should remain.
@hansdaniels What is the status of this issue on your side?
We no responses within 5 days, I will close this issue.
Hi @didier-wenzek,
did you change anything in this direction? If not, we still have the problem.
@cstoidner: The reason is, because our device has base system (/usr/
belongs to the base system) which should no modified.
Customer Apps have to be installed in /mnt/data
.
Your reactivity makes things clear: this is important for you!
Since this ticket is opened, many things have changed but the issue is still here. As a summary:
- If the
tedge
user can run thetedge_mapper
command only thanks to having thetedge
user added to some group (here thetedge
user be added to thedata
group to be able the access the directory containing thetedge_mapper
directory); - Then the
sudo tedge connect c8y
command correctly connects the device to Cumulocity, but doesn't restart the mapper due to a failing check (tedge-mapper
being not accessible is erroneously interpreted astedge-mapper
is not installed). - The root cause is that the
tedge connect
changes the effective group id - loosing any access granted via groups to thetedge user
. - There are several workarounds.
- Granting traversal access of
/mnt/data/
to all users. - Adding
root
to thedata
group (The fact that this works highlights the correctness of one of the comment: "This is to lax because root doesn't drop all capabilities"). - Restarting manually the mapper on reconnect.
- Granting traversal access of
- There are some quick but no convincing fixes.
- Do the
which
check asroot
and nottedge
. - Stop to change the group.
- Do the
- Long-term fixes are more involved.
- Make useless the need to restart the mapper on connect. This is my preferred solution and a work in that direction is on going see https://github.com/thin-edge/thin-edge.io/issues/1201. A follow up task will be to remove the culprit mapper restart call in the tedge command.
- Stop playing with effective user and group in the
tedge
command. As highlighted in this thread, this is not really effective from a user perspective. The main idea was to create resources for different users (tedge
,mosquitto
, ... and others that have been deprecated meantime). The simpler would be to create these resources with explicit user, group and mod: see https://github.com/thin-edge/thin-edge.io/issues/1207. - Stop to have a
tedge
command that try to do things from the inside (restarting mosquitto and the mapper) with extra complexity to configure the init system. This would be so simple to script this from the outside. However, this needs to be done after a better understanding of the base commands that thin-edge must provide.
As a conclusion. We will address this issue but with no rush. 1) by removing the necessity to restart the mapper on connect and 2) by removing the dance with effective users and groups.
@toewsar We have finally implement one of the 3 fixes: the second one labelled ii) The tedge
command does no more switch back and forth from root
and the tedge
user. Your main issue should be fixed. Can you double check that? Thank you.
@didier-wenzek I check the version 0.7.3 and it worked. Thanks,