Update documentation around credentials management

Open noklam opened this issue 2 years ago • 6 comments

Description

Short description of the problem here.

parquet_dataset:
  type: dask.ParquetDataSet
  filepath: "s3://bucket_name/path/to/folder"
  credentials:
    client_kwargs:
      aws_access_key_id: YOUR_KEY
      aws_secret_access_key: "YOUR SECRET"

This is how Kedro's docs currently show how to provide credentials. However, fsspec updated its API quite a while ago, and if you are using a newer version of fsspec you should use key and secret instead of aws_access_key_id and aws_secret_access_key.

This may only affect s3fs (that is how I ran into the error), but it could potentially affect gcs and others too.
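
A minimal sketch of the same catalog entry with the newer key names, assuming the credentials are forwarded unchanged to s3fs and that `key` and `secret` sit at the top level of `credentials` rather than under `client_kwargs` (the values are placeholders):

```yaml
parquet_dataset:
  type: dask.ParquetDataSet
  filepath: "s3://bucket_name/path/to/folder"
  credentials:
    # Assumption: s3fs-style names, passed at the top level
    key: YOUR_KEY
    secret: "YOUR SECRET"
```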

Context

The docs on credentials are out of date and mention wrong key names. All doc chapters mentioning credentials should be updated to use the correct keys.

noklam avatar Jun 12 '23 15:06 noklam

Today I was helping @ricardopicon-mck and it was not clear how to use Google Cloud credentials. There are excellent examples of how to set up the catalog.yml:

https://docs.kedro.org/en/stable/data/data_catalog_yaml_examples.html#load-an-excel-file-from-google-cloud-storage

But how does credentials.yml look in that case?

For the record, this did the trick for me:

gcp_credentials:
  token: gcp_credentials.json

But this only worked with a flat file structure. In a full-fledged Kedro project with conf/base and conf/local, I had to specify the absolute path:

gcp_credentials:
  token: /Users/juan_cano/Projects/QuantumBlack Labs/tmp/test-credentials/conf/local/gcp_credentials.json

I'm sure there is a better way.
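
For context, a catalog entry would refer to that credentials entry by name. A rough sketch, assuming a hypothetical dataset name, bucket and file path; only `gcp_credentials` comes from the snippet above:

```yaml
# catalog.yml (hypothetical entry, for illustration only)
my_excel_data:
  type: pandas.ExcelDataset
  filepath: gcs://your_bucket/path/to/file.xlsx
  credentials: gcp_credentials  # name of the entry in credentials.yml
```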

In general, the credentials page is not very useful: https://docs.kedro.org/en/stable/configuration/credentials.html

It places a lot of emphasis on how to load them from code, but I'd consider this "advanced" or "programmatic" usage, which is not how most users experience Kedro.

(see also https://github.com/fsspec/gcsfs/issues/583)

astrojuanlu avatar Sep 20 '23 08:09 astrojuanlu

That's a good point, and this page needs a clean-up to bring it up to the same standard as the recent data catalog updates.

stichbury avatar Sep 20 '23 09:09 stichbury

See this for reference https://github.com/kedro-org/kedro/issues/3164

datajoely avatar Oct 11 '23 16:10 datajoely

We might need to document as well how credentials work during development vs in production, see this response by @noklam to a Prefect user https://linen-slack.kedro.org/t/16019525/hi-another-question-is-there-a-way-to-directly-store-the-con#146bb5db-314d-414f-947a-fd9d64f4d223

astrojuanlu avatar Oct 29 '23 17:10 astrojuanlu

There are more problems with the snippet @noklam shared. This is a setup that worked for me:

# catalog.yml
executive_summary:
  type: text.TextDataset
  filepath: s3://social-summarizer/executive-summary.txt
  versioned: true
  credentials: minio_fsspec

# credentials.yml
minio_fsspec:
  endpoint_url: "http://127.0.0.1:9010"
  key: "minioadmin"
  secret: "minioadmin"

This worked fine. But if I put endpoint_url, key and secret inside client_kwargs, then I get:

DatasetError: Failed while loading data from data set TextDataset(filepath=social-summarizer/executive-summary.txt, protocol=s3, version=Version(load=None, save='2023-11-25T10.02.34.586Z')).
AioSession._create_client() got an unexpected keyword argument 'key'
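
One way to read this error: everything under `client_kwargs` appears to be handed to the underlying client constructor, which does not know `key` or `secret`. A sketch of a split that should avoid it, assuming s3fs accepts `endpoint_url` under `client_kwargs` while `key` and `secret` stay at the top level (this variant was not tested in this thread):

```yaml
# credentials.yml
minio_fsspec:
  # Filesystem-level arguments stay at the top level
  key: "minioadmin"
  secret: "minioadmin"
  # Assumption: only client arguments go here
  client_kwargs:
    endpoint_url: "http://127.0.0.1:9010"
```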

The fact that our dataset code is so contrived doesn't help:

https://github.com/kedro-org/kedro/blob/e8f1bfd72992336ec12591b49a5fa2654217472f/kedro/extras/datasets/text/text_dataset.py#L84-L94

(the "copy paste" problems mentioned in #1778)

For the record, I'm using fsspec==2023.10.0.

astrojuanlu avatar Nov 25 '23 10:11 astrojuanlu

I think we should do this after #3811

astrojuanlu avatar Sep 23 '24 08:09 astrojuanlu