engine icon indicating copy to clipboard operation
engine copied to clipboard

[Enchancement] Consider a simple interpreter to reduce scope of shell exec

Open erebe opened this issue 3 years ago • 4 comments

After reading the code, there is a lot of files that are doing command wrapping by doing shell exec behind the scene. While doing shell exec is OK as reimplementing everything of what binaries are doing is

  • Tedious and error prone
  • Does not bring much value
  • Risk of breaking with different version

A big chunk of the code is riddled with system exec with its own had hoc way of parsing output, calling convention and doing the same kind of operation.

To centralize, standardize and isolate those system-exec, consider moving from raw Command::new to a simple interpreter. It will have the benefit of :

  • Output is a standardized json
  • Command only need to return a result per line and will be automatically turned into an array of string
  • Every system exec will happen behind a standard code interface
  • System exec will appear only once in the code, inside exec_shell
  • Avoid Command::new to creep more into the code
  • Ease the test of those function, only need to check script output

Please find below an example of working interpreter. It can still be improved (timeout can be added, be safer) but provide enough flexibility IMHO.

// Operations available for our shell support
pub enum Shell<'a> {
    ExecCmd(&'a str, &'a str),
    ExecCmdWithRetry(&'a str, &'a str, u32),
    EnsureCmd(&'a str),
    SetEnv(&'a str, &'a str),
    SetEnvs(&'a Vec<(&'a str, &'a str)>),

}

// Standardized output for all our shell command, a script output is just a Vec<ShellOutput>
#[derive(Deserialize, Debug)]
pub struct ShellOutput {
    name: String,
    command: String,
    error: String,
    values: Vec<String>
}

// Snippet of shell script that exec a command and format it's output to a json file "ret.json"
pub fn cmd_output_to_json(name: &str, cmd: &str) -> String {
    format!(r#"
##################
# {0}
##################
{1} > ret.stdout 2> ret.stderr
cmd_exit_code=$?
echo '{{ "name": "{0}",' >> ret.json
echo '   "command": "{1}",' >> ret.json
echo '   "error": "'$(cat ret.stderr | tr "\"" "'")'",' >> ret.json
echo '   "values": ['$(cat ret.stdout | xargs -n1 -I R echo '"R"' | paste -sd ',')']' >> ret.json
echo '}}' >> ret.json
echo ',' >> ret.json

[[ $cmd_exit_code -eq 0 ]] || exit 1
"#, name, cmd)
}

// Exec a command with nb max retry
pub fn retry(name: &str, cmd: &str, max_retry: u32) -> String {
    format!(r#"
##################
# RETRY {2} - {0}
##################
COUNTER=0
cmd_exit_code=0
while [  $COUNTER -lt {2} ]; do
    let COUNTER=COUNTER+1
    {1} > ret.stdout 2> ret.stderr
    cmd_exit_code=$?
    [[ $cmd_exit_code -eq 0 ]] && break
    sleep 2
done

echo '{{ "name": "{0}",' >> ret.json
echo '   "command": "{1}",' >> ret.json
echo '   "error": "'$(cat ret.stderr | tr "\"" "'")'",' >> ret.json
echo '   "values": ['$(cat ret.stdout | xargs -n1 -I R echo '"R"' | paste -sd ',')']' >> ret.json
echo '}}' >> ret.json
echo ',' >> ret.json

[[ $cmd_exit_code -eq 0 ]] || exit 1
"#, name, cmd, max_retry)
}

// Transform operation/opcode of our shell into real sh script
fn eval_opcode(action: &Shell, builder: &mut String) {
    match action {
        Shell::ExecCmd(name, args)=> {
            builder.push_str(cmd_output_to_json(name, args).as_str());
        },
        Shell::SetEnv(name, value) => {
            builder.push_str(format!("export {}=\"{}\"\n", name, value).as_str());
        }
        Shell::SetEnvs(envs ) => {
            for (name, value) in envs.iter() {
                builder.push_str(format!("export {}=\"{}\"\n", name, value).as_str());
            }
        }
        Shell::ExecCmdWithRetry(name, cmd, max_retry) => {
            builder.push_str(retry(name, cmd, *max_retry).as_str());
        }
        Shell::EnsureCmd(cmd) => {
            eval_opcode(&Shell::ExecCmd(format!("ensure command {}", cmd).as_str(),
                                        format!("command -v {}", cmd).as_str()),
                        builder)
        }
    };
}

// Generate a complete/executable shell script from our Shell Opcode
pub fn generate_shell(actions: &Vec<Shell>) -> String {

    let mut builder = String::with_capacity(1024);
    builder.push_str("#!/bin/sh
echo '[' >> ret.json
trap \"rm ret.stdout; rm ret.stderr; sed -i '$ d' ret.json ; echo ']' >> ret.json\" EXIT
    ");

    for cmd in vec![EnsureCmd("cat"), EnsureCmd("sed"), EnsureCmd("paste")] {
        eval_opcode(&cmd, &mut builder);
    }
    for cmd in actions {
        eval_opcode(&cmd, &mut builder);
    }

    builder
}

// Execute our shell script
// 1. Generate the shell script and write it into a temporary folder
// 2. Execute the script with a Command/system exec
// 3. Use serde to parse standard json and extract its value
pub fn exec_shell<P: FromStr>(actions: &Vec<Shell>) -> Result<P, SimpleError> 
{
    use std::fs::File;
    use tempfile::tempdir;

    // Generate and write our shell script into a tmp dir (cleaned when variable is droped)
    let dir = tempdir()?;
    let mut script_path = dir.path().join("run.sh");
    File::create(&script_path)?
        .write_all(generate_shell(&actions).into_bytes().as_slice());

    // Exec our script
    let exit = Command::new("sh")
        .arg(&script_path)
        .current_dir(dir.path())
        .output()?;

    // Parse ret.json to extract value of the commands
    let return_file = File::open(dir.path().join("ret.json"))?;
    let results: Vec<ShellOutput> = serde_json::from_reader(BufReader::new(return_file))?;

  // Check run success when empty json
  // Can be turned into None/Option to simplify
    if exit.status.success() && results.is_empty() {
        return P::from_str("").or_else(|err| {
            Err(SimpleError{
                kind: SimpleErrorKind::Other,
                message: Some(format!("Cannot parse string to correct format"))
            })
        });
    }

    if !exit.status.success() && results.is_empty() {
        return Err(SimpleError{
            kind: SimpleErrorKind::Command(exit.status),
            message: Some(format!("Error executing command: {:?}", exit.stderr))
        });
    }

    // Standard case
    for result in results.iter().rev() {
        if !result.error.is_empty() {
            return Err(SimpleError{
                kind: SimpleErrorKind::Command(exit.status),
                message: Some(format!("Error executing command: {:?}", result))
            });
        }
    }

    P::from_str(results.last().unwrap().values.join("").as_str()).or_else( |err|
        Err(SimpleError{
            kind: SimpleErrorKind::Other,
            message: Some(format!("Cannot parse string to correct format"))
        })
    )

}

#[cfg(test)]
mod tests {
    use crate::cmd::kubectl::Shell::{ExecCmd, ExecCmdWithRetry, EnsureCmd, SetEnv};
    use crate::cmd::kubectl::{generate_shell, exec_shell};

    #[test]
    fn test_generate_shell() {
        let actions = vec![
            EnsureCmd("kubectl"), EnsureCmd("docker"),
            SetEnv("KUBECONFIG", "/home/erebe/.kube/config"),
            ExecCmd("get pod number of restart", "kubectl get pods -o=custom-columns=:.status.containerStatuses..restartCount --no-headers=true"),
            ExecCmdWithRetry("list lol", "kubectl get lol", 3),
        ];

        //let script = generate_shell(&actions);
        //println!("{}", script);

        let ret = exec_shell::<String>(&actions);
        println!("{:?}", ret);
    }

}

It generates script like below

#!/bin/sh
echo '[' >> ret.json
trap "rm ret.stdout; rm ret.stderr; sed -i '$ d' ret.json ; echo ']' >> ret.json" EXIT

##################
# ensure command cat
##################
command -v cat > ret.stdout 2> ret.stderr
cmd_exit_code=$?
echo '{ "name": "ensure command cat",' >> ret.json
echo '   "command": "command -v cat",' >> ret.json
echo '   "error": "'$(cat ret.stderr | tr "\"" "'")'",' >> ret.json
echo '   "values": ['$(cat ret.stdout | xargs -n1 -I R echo '"R"' | paste -sd ',')']' >> ret.json
echo '}' >> ret.json
echo ',' >> ret.json

[[ $cmd_exit_code -eq 0 ]] || exit 1

...

With ret.json file looking like

[
{ "name": "ensure command cat",
   "command": "command -v cat",
   "error": "",
   "values": ["/usr/bin/cat"]
}
,
{ "name": "ensure command sed",
   "command": "command -v sed",
   "error": "",
   "values": ["/usr/bin/sed"]
}
,
{ "name": "ensure command paste",
   "command": "command -v paste",
   "error": "",
   "values": ["/usr/bin/paste"]
}
,
{ "name": "ensure command kubectl",
   "command": "command -v kubectl",
   "error": "",
   "values": ["/usr/bin/kubectl"]
}
,
{ "name": "ensure command docker",
   "command": "command -v docker",
   "error": "",
   "values": ["/usr/bin/docker"]
}
,
{ "name": "get pod number of restart",
   "command": "kubectl get pods -o=custom-columns=:.status.containerStatuses..restartCount --no-headers=true",
   "error": "",
   "values": ["0","0","0","0","0","1","0","0"]
}
,
{ "name": "list lol",
   "command": "kubectl get lol",
   "error": "error: the server doesn't have a resource type 'lol'",
   "values": []
}
]

As an example, 2 kubectl commands rewrote with this simple interpreter

pub fn kubectl_exec_get_number_of_restart(
    kubernetes_config: &str,
    namespace: &str,
    podname: &str,
    envs: &Vec<(&str, &str)>,
) -> Result<String, SimpleError> {
    let cmd_str = format!("get pods {} -n {} -o=custom-columns=:.status.containerStatuses..restartCount --no-headers=true", podname, namespace);
    let cmd = vec![
        EnsureCmd("kubectl"),
        SetEnv(KUBECONFIG, kubernetes_config),
        SetEnvs(&envs),
        ExecCmd("pod nb of restart", cmd_str.as_str())
    ];

   exec_shell(&cmd)
}


pub fn kubectl_exec_get_external_ingress_hostnamel(
    kubernetes_config: &str,
    namespace: &str,
    selector: &str,
    envs: &Vec<(&str, &str)>,
) -> Result<String, SimpleError>
{
    let cmd_str = format!("get svc -o json -n {} -l {}", namespace, selector);
    let cmd = vec![
        EnsureCmd("kubectl"),
        SetEnv(KUBECONFIG, kubernetes_config),
        SetEnvs(&envs),
        ExecCmd("get external ingress hostname", cmd_str.as_str())
    ];

    exec_shell(&cmd)
}

// Do the same for terraform, docker, etc...

erebe avatar Nov 10 '20 08:11 erebe

Hi @erebe, thanks for the suggestion. Your enhancement proposal looks interesting. From what I see, it could even be an external lib/crate that can be used by similar projects that need to shell out.

One question:

  • How does it solve the risk of breaking on different version?

Terraform, Docker, and Helm are written in Go. What do you think of using an adapter interface written in Go to exposes FFI functions and use them directly in Rust?

evoxmusic avatar Nov 13 '20 08:11 evoxmusic

Hello,

Hi @erebe, thanks for the suggestion. Your enhancement proposal looks interesting. From what I see, it could even be an external lib/crate that can be used by similar projects that need to shell out

In my opinion the way of managing the shell and how the final script behave is pretty specific to the engine, where we want to exit at the first error, and only use a limited subset of the full power of a shell.

It can be done a lib/crate but would be limited to qovery usage I would say.

One question: How does it solve the risk of breaking on a different version?

It was more about doing shell exec VS. reimplementing the api behind the binaries. I don't think all binaries have a stable/public api behind them, so using the CLI would avoid version upgrade to break the engine. As in my opinion there is less chance that binary CLI break their command than their internal API changing without prior notice.

Terraform, Docker, and Helm are written in Go. What do you think of using an adapter interface written in Go to exposes FFI functions and use them directly in Rust?

From https://blog.arranfrance.com/post/cgo-sqip-rust/ and https://users.rust-lang.org/t/can-rust-interface-with-go-libraries/4292, it seems a lot of plumbing for not much added value, especially if we take into account the time to maintain in the long run the build of those forks. I am not even sure if we can bundle in the same binary multiple .so from different go version without conflict ¯\_(ツ)_/¯

In the long run why not, but as performance is not very critical here (most of the time is waiting), I would delay this work IMHO

erebe avatar Nov 14 '20 10:11 erebe

I've only glanced at the proposed code, but maybe one of the following crates would be useful when implementing this:

Then missing functionality (that'd be useful to other users) could be contributed to the chosen crate.

What do you think of using an adapter interface written in Go to exposes FFI functions and use them directly in Rust?

Something like this would be great. I'm pretty worried about using an Infrastructure as Code tool that internally uses the command line interface of other tools (I most likely wouldn't use such a tool for anything important).

I just created issue https://github.com/Qovery/engine/issues/50 to collect relevant information and track progress for this topic.

d4h0 avatar Nov 28 '20 12:11 d4h0

Interesting suggestion @d4h0 , we'll take a look at it, thanks a lot

deimosfr avatar Nov 29 '20 14:11 deimosfr