
Add support for Google Workload Identity to Cube Store

brhelwig opened this issue 3 years ago · 6 comments

Is your feature request related to a problem? Please describe. Cube Store currently supports only using a JSON service account credential for authenticating to Google Cloud Storage. Google recommends against using this method when possible for security hardening purposes.

Describe the solution you'd like Implement Workload Identity as described in https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity

Describe alternatives you've considered The alternative is to continue using the existing hard-coded secrets method.

brhelwig avatar Apr 22 '22 19:04 brhelwig

If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you. If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.

github-actions[bot] avatar Jun 26 '22 02:06 github-actions[bot]

Hi, I looked into this a bit.

It seems that if you don't provide credentials, Cube.js (the JS BigQuery/Storage client side) works correctly. Cube Store, however, uses a Rust package that doesn't have this option; there is an open PR for adding it.

The package uses this to handle authentication to GCP. Maybe Cube Store could use it directly instead of waiting for a fix in the package, or maybe it could switch to a package that already handles it?

The changes needed:

  1. cubejs: make CUBEJS_DB_BQ_CREDENTIALS optional
  2. cubestore: make CUBESTORE_GCP_CREDENTIALS optional and handle the fallback, either via gcp-auth or, if possible, by switching to a package that already handles it

Other alternative: introduce another environment variable that makes working with Workload Identity explicit.
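The two changes above boil down to a fallback: use the explicit JSON credential when the environment variable is set, otherwise fall back to Application Default Credentials, which is what Workload Identity provides on GKE. A minimal sketch of that selection logic, assuming this shape; the `CredentialSource` type and `credential_source` function are hypothetical names, not from the Cube Store codebase:

```rust
/// Where Cube Store should get its GCS credentials from.
/// Hypothetical type for illustration; not part of Cube Store.
#[derive(Debug, PartialEq)]
enum CredentialSource {
    /// Explicit JSON service-account key from CUBESTORE_GCP_CREDENTIALS.
    ExplicitJson(String),
    /// Env var unset or empty: fall back to Application Default
    /// Credentials, which Workload Identity supplies on GKE.
    ApplicationDefault,
}

/// Decide the credential source from the (now optional) env var value.
fn credential_source(env_value: Option<String>) -> CredentialSource {
    match env_value {
        Some(json) if !json.trim().is_empty() => CredentialSource::ExplicitJson(json),
        _ => CredentialSource::ApplicationDefault,
    }
}

fn main() {
    // In the real change this would feed into the GCS client setup.
    let source = credential_source(std::env::var("CUBESTORE_GCP_CREDENTIALS").ok());
    println!("{:?}", source);
}
```

With this in place, an unset CUBESTORE_GCP_CREDENTIALS no longer errors out; the GCS client is left to resolve credentials from the environment.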

omrihaber avatar Aug 27 '23 13:08 omrihaber

Hello,

I got Workload Identity working with the following changes; it would be awesome if anyone with Rust experience could run the tests and make a PR. Workload Identity makes a huge difference to us, but we don't have the resources for Rust development, so this is just a hack to get it working with old versions and toolchains.

  1. In cubestore/cubestore/Cargo.toml, upgrade from cloud-storage 0.7.0 to the 0.11.1 fork by @dark0dave. Additionally, make sure to lock gcp_auth to 0.7.4 and time to 0.3.17, otherwise it won't compile with the toolchain channel = "nightly-2022-11-03". The original cloud-storage does not seem to be actively maintained.
cloud-storage = { git = "https://github.com/dark0dave/cloud-storage-rs", branch = "master", features = ["global-client", "sync", "native-tls"] }
gcp_auth = { version = "=0.7.4" }
time = { version = "=0.3.17" }
  2. From remotefs/gcs.rs, remove ensure_credentials_init to drop the requirement on the environment variable.

  3. To adapt remotefs/gcs.rs to cloud-storage 0.11.1, I updated list_with_metadata and its imports.

use cloud_storage::{ListRequest, Object};
...
async fn list_with_metadata(
        &self,
        remote_prefix: String,
    ) -> Result<Vec<RemoteFile>, CubeError> {
        let prefix = self.gcs_path(&remote_prefix);
        let request = ListRequest {
            prefix: Some(prefix.into()),
            ..Default::default()
        };
        let list = Object::list(self.bucket.as_str(), request)
            .await?
            .map_ok(|object_list| object_list.items)
            .try_concat()
            .await?;

        let leading_slash = Regex::new(format!("^{}", self.gcs_path("")).as_str()).unwrap();
        let result: Vec<RemoteFile> = list.into_iter()
            .map(|obj| RemoteFile {
                remote_path: leading_slash.replace(&obj.name, NoExpand("")).to_string(),
                updated: obj.updated.clone(),
                file_size: obj.size,
            })
            .collect::<Vec<RemoteFile>>();

        let mut pages_count = result.len() / 1_000;
        if result.len() % 1_000 > 0 {
            pages_count += 1;
        }
        if pages_count > 100 {
            log::warn!("gcs list returned more than 100 pages: {}", pages_count);
        }
        app_metrics::REMOTE_FS_OPERATION_CORE.add_with_tags(
            pages_count as i64,
            Some(&vec![
                "operation:list".to_string(),
                "driver:gcs".to_string(),
            ]),
        );
        Ok(result)
    }
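As an aside, the leading_slash regex above only strips the GCS path prefix from each object name. With the standard library the same transformation can be sketched with str::strip_prefix; the prefix value below is illustrative (in Cube Store it comes from self.gcs_path("")):

```rust
/// Strip the GCS path prefix from an object name, mirroring what the
/// `leading_slash` regex replace does in `list_with_metadata`.
fn strip_gcs_prefix<'a>(name: &'a str, prefix: &str) -> &'a str {
    // Like the `^prefix` regex, only a match at the start of the
    // string is removed; other names pass through unchanged.
    name.strip_prefix(prefix).unwrap_or(name)
}

fn main() {
    // Hypothetical prefix for illustration.
    let prefix = "cubestore/";
    println!("{}", strip_gcs_prefix("cubestore/data/chunk-1.parquet", prefix));
    println!("{}", strip_gcs_prefix("other/file.parquet", prefix));
}
```

This avoids compiling a regex per call, though the regex version in the snippet matches what the existing code already does.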

Fix any warnings, run cargo build, and hopefully it should work. I tested this on Cube Store v0.35.6.

codingchili avatar Apr 09 '24 09:04 codingchili

Allow me to update the code snippet to save some time spent on the map_ok error: just add the correct imports from futures.

use cloud_storage::{ListRequest, Object};
use futures::{TryStreamExt, StreamExt};
... (the rest of list_with_metadata is unchanged from the snippet above)

BH-ITAY avatar Apr 10 '24 09:04 BH-ITAY

@codingchili Thanks for contributing this! ❤️

Indeed, it would be nice if anyone could make a PR and bring it over the finish line 😄

igorlukanin avatar Apr 10 '24 14:04 igorlukanin

@igorlukanin Opened a PR

I managed to get this to work in my environment; however, I did not write any additional tests for this. Please also note that it uses a forked GitHub repo as a dependency (cloud-storage-rs by @dark0dave).

BH-ITAY avatar Apr 15 '24 12:04 BH-ITAY