thin-edge.io icon indicating copy to clipboard operation
thin-edge.io copied to clipboard

`tedge connect` does not detect mapper/agent via symlinks

Open didier-wenzek opened this issue 3 years ago • 15 comments

Discussed in https://github.com/thin-edge/thin-edge.io/discussions/880

Originally posted by toewsar February 15, 2022 When using tedge connect c8y it tries to find out if the mapper and agent are installed. This is done by the RUST-which(). In my case the binaries aren't installed at /usr/bin or similar. Therefore the tedge cli does not find the installed mapper and agent.

I tried to create symlinks to /usr/bin to tell tedge that the mapper and agent are installed, but it still failed. The shell command which found the binaries.

$ sudo which tedge_mapper /usr/bin/tedge_mapper

$ sudo tedge connect c8y Checking if sysv is available.

Checking if configuration for requested bridge already exists.

Validating the bridge certificates.

Create the device.

Saving configuration for requested bridge.

Restarting mosquitto service.

Awaiting mosquitto to start. This may take up to 5 seconds.

Enabling mosquitto service on reboots.

Successfully created bridge connection!

Sending packets to check connection. This may take up to 2 seconds.

Connection check is successfull.

Checking if tedge-mapper is installed.

Warning: tedge_mapper is not installed.

Enabling software management.

Checking if tedge-agent is installed.

Info: Software management is not installed. So, skipping enabling related components.

It would be nice to also accept symlinks.

didier-wenzek avatar Feb 16 '22 10:02 didier-wenzek

The issue has been observed by @toewsar.

The first point to note is that the mapper is not detected through the symlink while things are okay for the agent where there is no indirection.

$ ls -lh /usr/bin/tedge*
-rwxr-xr-x    1 root     root        7.0M Feb 16 10:31 /usr/bin/tedge
-rwxr-xr-x    1 root     root        7.0M Feb 16 10:31 /usr/bin/tedge_agent
lrwxrwxrwx    1 root     root          40 Feb 16 10:36 /usr/bin/tedge_mapper -> /mnt/data/hmi/tedge/usr/bin/tedge_mapper
$ sudo tedge connect c8y
...
Checking if tedge-mapper is installed.
Warning: tedge_mapper is not installed.
...
Checking if tedge-agent is installed.
Starting tedge-agent service.
...

didier-wenzek avatar Feb 16 '22 10:02 didier-wenzek

I've written a small Rust program which only calls which:

use std::env;
use which::which;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() < 2 {
        println!("executable as parameter needed");
    } else {
        if which(&args[1]).is_err() {
            println!("Nope: `{}` is not found by `which`.", &args[1]);
        } else {
            println!("Yes:  `{}` is     found by `which`.", &args[1]);
        }
    }
}

With both which 4.2.2 and 4.2.4, it behaves correctly:

$ ls -lh /usr/bin/tedge*
-rwxr-xr-x    1 ... /usr/bin/tedge
-rwxr-xr-x    1 ... /usr/bin/tedge_agent
lrwxrwxrwx    1 ... /usr/bin/tedge_mapper -> /mnt/data/hmi/tedge/usr/bin/tedge_mapper # working link
lrwxrwxrwx    1 ... /usr/bin/tedge_mapper2 -> /mnt/data/hmi/tedge/usr/bin/tedge_mapper2 # non-existing target

Output:

admin@HMI-1723:~$ ./whichtest tedge_agent
Yes:  `tedge_agent` is     found by `which`.
admin@HMI-1723:~$ ./whichtest tedge_mapper
Yes:  `tedge_mapper` is     found by `which`.
admin@HMI-1723:~$ ./whichtest tedge_mapper2
Nope: `tedge_mapper2` is not found by `which`.

hansdaniels avatar Feb 16 '22 10:02 hansdaniels

$PATH has nothing to do with this, I've added a hardcoded check for /usr/bin/tedge_mapper and neither is this found by tedge

hansdaniels avatar Feb 16 '22 11:02 hansdaniels

After some further analyses, we found out that a symlink alone is not the problem. It has to do with permissions.

We're running tedge as root confirmed by:

                use users::get_current_username;
                let uname = get_current_username().unwrap();
                println!("Running as user {:?}", &uname);

The link is defined as follows:

$ ls -lh /usr/bin/tedge_mapper
lrwxrwxrwx    1 root     root          22 Feb 16 14:38 /usr/bin/tedge_mapper -> /usr/blub/tedge_mapper

The directory, in which the actual tedge_mapper resides, has these permissions:

$ ls -lha /usr/blub/
drwxr-xr--    2 root     data        4.0K Feb 16 14:37 .
drwxr-xr-x   11 root     root        4.0K Feb 16 14:37 ..
-rwxr-xr-x    1 root     root        7.5M Feb 16 14:37 tedge_mapper

We're root, owner is root with execute permissions, things should be fine. But they are not!

root is not in group data, but on its own in a shell, this is not a problem.

When I run this debug code on its own, it works!

use std::env;
use std::fs;
use which::which;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() < 2 {
        println!("executable as parameter needed");
    } else {
        match which(&args[1]) {
            Err(e) => println!("🏀Nope: `{}` is not found by `which`. {}", &args[1], e),
            Ok(p) => {
                println!("🏀Yes:  `{}` is     found by `which`. {:?}", &args[1], p);
                match fs::metadata(&p) {
                    Ok(m) => println!("🏀Trying {:?} filetype {:?} symlink {:?}", &p, &m.file_type(), &m.is_symlink()),
                    Err(e) => println!("🏀Failed {:?} {:?}", &p, e),
                }
            }
        }
    }
}

According to the output, which finds the target tedge_mapper file (metadata follows symlinks, therefore this attribute is false (I can't see the sense behind this, but anyway))!

$ ./whichtest tedge_mapper
🏀Yes:  `tedge_mapper` is     found by `which`. "/usr/bin/tedge_mapper"
🏀Trying "/usr/bin/tedge_mapper" filetype FileType(FileType { mode: 33261 }) symlink false
$ sudo ./whichtest tedge_mapper
🏀Yes:  `tedge_mapper` is     found by `which`. "/usr/bin/tedge_mapper"
🏀Trying "/usr/bin/tedge_mapper" filetype FileType(FileType { mode: 33261 }) symlink false

which is not the problem's originator. std::fs::metadata is.

I've added the following to ConnectCommand::Command (file command.rs , after the line

println!("Checking if tedge-mapper is installed.\n");
                match fs::metadata("/usr/blub/tedge_mapper") {
                    Ok(m) => println!("Trying /usr/blub/tedge_mapper :{:?} 🍌", m.file_type()),
                    Err(e) => println!("Failed /usr/blub/tedge_mapper :{:?} 🍌", e),
                }

When run via sudo tedge connect c8y, I get:

Failed /usr/blub/tedge_mapper :Os { code: 13, kind: PermissionDenied, message: "Permission denied" } 🍌

What is different when running this code on its own compared to as part of tedge

hansdaniels avatar Feb 16 '22 13:02 hansdaniels

I think I've found the reason. It's in

let _user_guard = user_manager.become_user(tedge_users::TEDGE_USER)?;

This boils down to

pub fn switch_user_group(uid: uid_t, gid: gid_t) -> io::Result<SwitchUserGuard> {
...
    set_effective_gid(gid)?;
    set_effective_uid(uid)?;

In other words, the euid of root is replaced by tedge's uid, the egid is replaced by tedge's gid. Other supplementary groups of root are left untouched and stay.

Again, we're accessing `/mnt/data with its permissions as this:

$ ls -lhd /mnt/data/
drwxrwsr--    8 root     data        4.0K Feb 17 07:49 /mnt/data/

root as the owner is allowed to access this directory, also the group data. root is not a member of data (no need for that), while user tedge is member of data.

But in switch_user_group, only the egid tedge is set, not the user tedge's other groups like data. Thus the process will run as tedge with only the group tedge and is disallowed to access /mnt/data.

For testing purposes, I've put root into the group data. Then things work flawlessly.

🎯❓ So, how can we not only set the egid, but also the target user's supplementary groups?

hansdaniels avatar Feb 17 '22 13:02 hansdaniels

Thank you for this detailed analysis. However, I have to understand what is the motivation to put the tedge_mapper binary in a directory with restricted access?

I'm not keen on the idea to add this complexity to the tedge command line tool. There is already too much in it! Rather than adding group handling features, I would prefer to remove hardcoded user handling features. I mean to have tedge connect prepare configuration files used by external tools taking the responsibility to run the services using the locally-defined settings.

didier-wenzek avatar Feb 18 '22 15:02 didier-wenzek

Do I understand your reduction wish correctly if I rephrase it as follows?

tedge should write configuration files of itself and the various cloud connectors and mappers, but not start other processes

You've mentioned "external tools". Do you think of systemd and alike or of helper scripts which continue what tedge no longer does?

hansdaniels avatar Feb 18 '22 15:02 hansdaniels

Regarding the user&group handling: To my eyes, the current way is both too lax and too restrictive at the same time. Too lax, because root doesn't drop all capabilities (supplementary groups won't be dropped) and thus leaks permissions. Too restrictive, because the user tedge's supplementary groups are not added.

But yes, not messing with effective users and leaving that to "the outside" sounds plausible (as long as "the outside" knows, what to do ...)

hansdaniels avatar Feb 18 '22 15:02 hansdaniels

Hi @hansdaniels,

I am interested to understand the reason why you move the executables to '/mnt/data/hmi/tedge/usr/bin/'. That would help to align about a good solution. Can you explain the motivation behind that?

To unblock you for the moment, a workaround might be to add user root to group data. Since switch_user_group() seems not to drop supplementary groups, group data should remain.

cstoidner avatar Feb 25 '22 18:02 cstoidner

@hansdaniels What is the status of this issue on your side?

We no responses within 5 days, I will close this issue.

didier-wenzek avatar Jun 09 '22 10:06 didier-wenzek

Hi @didier-wenzek,

did you change anything in this direction? If not, we still have the problem.

@cstoidner: The reason is, because our device has base system (/usr/ belongs to the base system) which should no modified. Customer Apps have to be installed in /mnt/data.

toewsar avatar Jun 09 '22 10:06 toewsar

Your reactivity makes things clear: this is important for you!

didier-wenzek avatar Jun 09 '22 10:06 didier-wenzek

Since this ticket is opened, many things have changed but the issue is still here. As a summary:

  • If the tedge user can run the tedge_mapper command only thanks to having the tedge user added to some group (here the tedge user be added to the data group to be able the access the directory containing the tedge_mapper directory);
  • Then the sudo tedge connect c8y command correctly connects the device to Cumulocity, but doesn't restart the mapper due to a failing check (tedge-mapper being not accessible is erroneously interpreted as tedge-mapper is not installed).
  • The root cause is that the tedge connect changes the effective group id - loosing any access granted via groups to the tedge user.
  • There are several workarounds.
    1. Granting traversal access of /mnt/data/ to all users.
    2. Adding root to the data group (The fact that this works highlights the correctness of one of the comment: "This is to lax because root doesn't drop all capabilities").
    3. Restarting manually the mapper on reconnect.
  • There are some quick but no convincing fixes.
    1. Do the which check as root and not tedge.
    2. Stop to change the group.
  • Long-term fixes are more involved.
    1. Make useless the need to restart the mapper on connect. This is my preferred solution and a work in that direction is on going see https://github.com/thin-edge/thin-edge.io/issues/1201. A follow up task will be to remove the culprit mapper restart call in the tedge command.
    2. Stop playing with effective user and group in the tedge command. As highlighted in this thread, this is not really effective from a user perspective. The main idea was to create resources for different users (tedge, mosquitto, ... and others that have been deprecated meantime). The simpler would be to create these resources with explicit user, group and mod: see https://github.com/thin-edge/thin-edge.io/issues/1207.
    3. Stop to have a tedge command that try to do things from the inside (restarting mosquitto and the mapper) with extra complexity to configure the init system. This would be so simple to script this from the outside. However, this needs to be done after a better understanding of the base commands that thin-edge must provide.

As a conclusion. We will address this issue but with no rush. 1) by removing the necessity to restart the mapper on connect and 2) by removing the dance with effective users and groups.

didier-wenzek avatar Jun 20 '22 17:06 didier-wenzek

@toewsar We have finally implement one of the 3 fixes: the second one labelled ii) The tedge command does no more switch back and forth from root and the tedge user. Your main issue should be fixed. Can you double check that? Thank you.

didier-wenzek avatar Jul 28 '22 14:07 didier-wenzek

@didier-wenzek I check the version 0.7.3 and it worked. Thanks,

toewsar avatar Aug 01 '22 12:08 toewsar