feast
feast copied to clipboard
Adding IAM role config to online store
Is your feature request related to a problem? Please describe.
s3_staging_location field
I would like to understand why was associated a deployment environment name in this variable name: s3_staging_location
. It's makes some confusion when we try to understand how the things works behind the scenes. Besides, this field name can lead to fake semantics. For exemple, in our context we have the s3_staging_location
with a production location value.
iam_role field
Why the iam_role field is exclusive to the offline store scope? In this way is not possible to reuse the role to the online store scope.
Describe the solution you'd like
offline_store:
type: redshift
region: us-west-2
cluster_id: ******
database: ******
user: ******
s3_staging_location: ******
iam_role: arn:aws:iam::account_id:role/role_1
online_store:
type: dynamodb
region: us-east-1
iam_role: arn:aws:iam::account_id:role/role_1
Describe alternatives you've considered We created a role that was not possible to reuse in the online store context. So, we increased the permissions to the role (role_2) used by the aws-vault:
aws-vault exec profile (that uses role_2) -- pipenv run feast apply
Additional context The additional permission was restricted to one user group, we can't use the same solution for other user groups due security issues.
@fcas,
The s3_staging_location
is named staging
because it is only used temporarily for staging data that is required for the Redshift offline store's functionality, such as doing point-in-time joins. The staging
here does not mean that it cannot be used for production purposes. Hopefully this makes sense.
I'm not quite sure what you mean regarding the iam_role
field. Why exactly do you want an iam_role
for the online store?
Thanks for the explanation! I need the iam_role
just to assign a role with that permissions: https://docs.feast.dev/reference/online-stores/dynamodb#permissions
@fcas are you asking for us to add support for specifying a role for DynamoDB (which is distinct from the offline store)? I think that should be a pretty easy change.
I think we should maybe move away from specifying the role and instead use profiles (like dbt does). cc @tsotnet @achals for opinions.
@woop the role is the same to online and offline store.
However, during run time we have distinct roles being used:
aws-vault exec profile (that uses role_2) -- pipenv run feast apply
# 1 - offline store -> assumes role 1 specified in the feature_store.yaml,
# 2 - online store assumes role 2 specified in the ~/.aws/config
feature_store.yaml
offline_store:
type: redshift
region: us-west-2
cluster_id: ******
database: ******
user: ******
s3_staging_location: ******
iam_role: arn:aws:iam::account_id:role/role_1
online_store:
type: dynamodb
region: us-east-1
~/.aws/config
...
[profile ***]
sso_start_url = ***
sso_region = us-east-1
sso_account_id = ****
sso_role_name = role_2
region = us-east-1
The scenario described above lead us to the following exception:
botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the CreateTable operation: User: arn:aws:sts::***:assumed-role/AWSReservedSSO_role_2_***/*** is not authorized to perform: dynamodb:CreateTable on resource: arn:aws:dynamodb:us-east-1:***:table/driver_features.driver_hourly_stats
We "solved" this exception adding the permissions related to the dynamodb (https://docs.feast.dev/reference/online-stores/dynamodb#permissions) to the role_2. However, this solution does not scale for the whole company due security issues. Before that problem, we tried to use env vars related to redshift and dynamodb: https://github.com/feast-dev/feast/issues/1854.
@fcas Can you explain why you need to scale apply
and createTable
permissions to the whole company? Normally only one team controls apply
through CI. Readers of DynamoDB don't need to create tables, they only need to read features.
@woop sorry I didn't express myself well. Despite of automations we have in terms of integration and continuous delivery, the data science and machine learning engineering teams have around 50 people and they are teams that continue to grow.
In this scenario, both teams must be able to run apply, createTable, etc in the research and development phases.
@felixwang9817 I was wondering and even considering the comments about s3_staging_location
field in the code:
s3_staging_location: StrictStr
""" S3 path for importing & exporting data to Redshift """
Normally, the community uses the prefix or suffix temp
to specify temp stuffs (see: https://github.com/search?q=temp_&type=code vs https://github.com/search?q=staging_&type=code --- of course it's not a systematic review). What do you think about s3_temp_location
vs s3_staging_location
?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.