
Athena config documentation

Open drernie opened this issue 1 year ago • 2 comments

Description

TODO

  • [ ] Automated tests (pytest-codeblocks)
  • [ ] Documentation
    • [ ] Python: Run build.py for new docstrings

drernie • Aug 11 '22 22:08

Codecov Report

Merging #2981 (c61c56f) into master (393eedb) will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2981   +/-   ##
=======================================
  Coverage   35.72%   35.72%           
=======================================
  Files         627      627           
  Lines       27968    27968           
  Branches     4078     4078           
=======================================
  Hits         9992     9992           
  Misses      16808    16808           
  Partials     1168     1168           
| Flag | Coverage Δ |
|------|------------|
| api-python | 90.72% <ø> (ø) |
| catalog | 8.07% <ø> (ø) |
| lambda | 87.29% <ø> (ø) |

Flags with carried forward coverage won't be shown.


codecov[bot] • Aug 11 '22 22:08

@akarve Is this a good summary?

# Athena Configuration
## Sample scripts for enabling Athena Queries from Quilt

In order to use Athena Queries from inside Quilt, you will need to:

1. For each Account, configure:
   1. An AWS Athena Policy
       1. In CloudFormation, set `ManagedUserRoleExtraPolicies` to include that Policy
       2. Add that AWS Policy to Quilt
       3. Add that Quilt Policy to any Quilt Roles that need Athena Queries
   2. An S3 Output Bucket for Quilt Athena Queries
   3. An Athena Workgroup that writes to that Bucket

2. For each Bucket, create:
   1. A Manifest Table from `.quilt/packages`
   2. A Packages Table from `.quilt/named_packages`
   3. A Packages View from the Packages Table and Manifest Table
   4. An Objects View from the Manifest Table and Packages View
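The account-level steps 1.2 and 1.3 above can be sketched with boto3. This is a minimal sketch, not the notebook's code: it assumes boto3 and credentials allowed to call `athena:CreateWorkGroup`. The workgroup name `quilt-query` is the one used in the test query later in this thread; the output bucket name is a hypothetical placeholder. The per-bucket tables and views (step 2) are not shown, since their DDL is not given here.

```python
from typing import Optional

# "quilt-query" is the workgroup named in this thread's test query;
# the output bucket below is a hypothetical placeholder.
WORKGROUP = "quilt-query"
OUTPUT_BUCKET = "example-quilt-athena-results"


def workgroup_request(name: str, output_bucket: str) -> dict:
    """Keyword arguments for Athena's CreateWorkGroup call."""
    return {
        "Name": name,
        "Description": "Athena queries issued from Quilt",
        "Configuration": {
            # Query results are written under this S3 prefix.
            "ResultConfiguration": {"OutputLocation": f"s3://{output_bucket}/"},
            # Make queries in this workgroup use the output location above.
            "EnforceWorkGroupConfiguration": True,
        },
    }


def create_workgroup(
    name: str = WORKGROUP,
    output_bucket: str = OUTPUT_BUCKET,
    region: Optional[str] = None,
) -> None:
    """Create the workgroup in AWS (requires credentials)."""
    import boto3  # imported lazily so workgroup_request works without AWS

    athena = boto3.client("athena", region_name=region)
    athena.create_work_group(**workgroup_request(name, output_bucket))
```

Keeping the request body in a plain function makes it easy to review (or diff against a policy) before anything is created in the account.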

drernie • Aug 11 '22 22:08

@kevinemoore @fiskus I believe this incorporates the information from the other two Pull Requests. If not, please let me know.

drernie • Aug 16 '22 01:08

Ah, the AthenaAccessRole BREAKS user access, because it ONLY has the AthenaQuiltAccess policy.

@akarve Is there a policy which grants the equivalent of the ReadWriteQuiltBucket Role? Or am I thinking about this the wrong way?
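One way to confirm the diagnosis in this comment is to list the managed policies attached to the role. A minimal sketch, assuming boto3 and `iam:ListAttachedRolePolicies` permission; `attached_policy_names` is a hypothetical helper, and inline policies (returned by the separate `list_role_policies` call) are not checked.

```python
from typing import Any, List


def attached_policy_names(iam: Any, role_name: str) -> List[str]:
    """Names of the managed policies attached to an IAM role, across all pages."""
    names: List[str] = []
    paginator = iam.get_paginator("list_attached_role_policies")
    for page in paginator.paginate(RoleName=role_name):
        names.extend(policy["PolicyName"] for policy in page["AttachedPolicies"])
    return names


# Usage (needs AWS credentials):
#   import boto3
#   names = attached_policy_names(boto3.client("iam"), "AthenaAccessRole")
#   A result of ["AthenaQuiltAccess"] alone is consistent with the broken
#   user access described above: nothing else is attached to the role.
```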

drernie • Aug 18 '22 15:08

@akarve @sir-sigurd @kevinemoore I have verified this works and had k9 review the Policy. If there are no P0 issues, can we approve this by Tuesday morning PDT so we can share it with customers tomorrow afternoon?

drernie • Aug 23 '22 05:08

@akarve I believe I have resolved all outstanding issues. Can you approve so we can merge? Or do we need to defer the customer discussion scheduled later today?

drernie • Aug 23 '22 16:08

Fair enough. Updated to use pprint.

On Aug 23, 2022, at 10:27 AM, Sergey Fedoseev @.***> wrote:

@sir-sigurd commented on this pull request.

In docs/advanced-features/athena.md https://github.com/quiltdata/quilt/pull/2981#discussion_r952928358:

+```

  • Test Athena Query:
  • SELECT * FROM quilt_query.quilt_bio_products_quilt_objects_view
  • WHERE substr(logical_key, -5)='.tiff'
  • -- extract and query package-level metadata
  • AND json_extract_scalar(meta, '$.user_meta.nucmembsegmentationalgorithmversion') LIKE '1.3%'
  • AND json_array_contains(json_extract(meta, '$.user_meta.cellindex'), '5');
  • WorkGroup quilt-query
  • #9 athena_await[7c315a69-19af-4784-9299-bf2b020ad165]=QUEUED
    
  • #8 athena_await[7c315a69-19af-4784-9299-bf2b020ad165]=RUNNING
    
  • #7 athena_await[7c315a69-19af-4784-9299-bf2b020ad165]=FAILED
    
  • {'State': 'FAILED', 'StateChangeReason': 'com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: NWNXJA7Y7TMPFPD6; S3 Extended Request ID: zql9sBQ7JblYs9es43/xLSCMvh7H1WTFyNL4t7i1IfgATVyU4Hnx1Ya3nuvudMMpqEBgzP6DM2M=; Proxy: null), S3 Extended Request ID: zql9sBQ7JblYs9es43/xLSCMvh7H1WTFyNL4t7i1IfgATVyU4Hnx1Ya3nuvudMMpqEBgzP6DM2M= (Path: s3://quilt-bio-products/.quilt/packages)', 'SubmissionDateTime': datetime.datetime(2022, 8, 23, 8, 30, 16, 706000, tzinfo=tzlocal()), 'CompletionDateTime': datetime.datetime(2022, 8, 23, 8, 30, 18, 77000, tzinfo=tzlocal()), 'AthenaError': {'ErrorCategory': 2, 'ErrorType': 1306, 'Retryable': False, 'ErrorMessage': 'com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: NWNXJA7Y7TMPFPD6; S3 Extended Request ID: zql9sBQ7JblYs9es43/xLSCMvh7H1WTFyNL4t7i1IfgATVyU4Hnx1Ya3nuvudMMpqEBgzP6DM2M=; Proxy: null), S3 Extended Request ID: zql9sBQ7JblYs9es43/xLSCMvh7H1WTFyNL4t7i1IfgATVyU4Hnx1Ya3nuvudMMpqEBgzP6DM2M= (Path: s3://quilt-bio-products/.quilt/packages)'}}

Is this supposed to fail? If so, I think it is worth printing with pprint() https://docs.python.org/3/library/pprint.html#pprint.pprint.

— Reply to this email directly, view it on GitHub https://github.com/quiltdata/quilt/pull/2981#pullrequestreview-1082581961, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAE2T45HF6Z6KTRUFURCALV2UCY7ANCNFSM56JWTB6A. You are receiving this because you were assigned.
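The quoted log counts down from #9 while the query moves through QUEUED, RUNNING and FAILED (here with NoSuchBucket for `s3://quilt-bio-products/.quilt/packages`). The notebook's own `athena_await` is not shown in this thread, so the following is a hypothetical reconstruction that mirrors that output and applies the pprint suggestion. It assumes boto3 only for the status getter; the polling logic takes any callable, so it can run without AWS.

```python
import time
from pprint import pprint
from typing import Callable, Optional

# Athena query states that will not change further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def athena_await(
    query_id: str,
    get_status: Callable[[str], dict],
    tries: int = 9,
    delay: float = 1.0,
) -> dict:
    """Poll until the query reaches a terminal state (hypothetical helper).

    get_status(query_id) returns the "Status" dict of a query execution.
    """
    status: dict = {}
    # Count down like the log above: #9, #8, #7, ...
    for remaining in range(tries, 0, -1):
        status = get_status(query_id)
        state = status["State"]
        print(f"#{remaining} athena_await[{query_id}]={state}")
        if state in TERMINAL_STATES:
            if state != "SUCCEEDED":
                # Failure details are long nested dicts; pprint wraps them.
                pprint(status)
            return status
        time.sleep(delay)
    raise TimeoutError(f"{query_id} still {status.get('State')} after {tries} polls")


def boto3_status_getter(region: Optional[str] = None) -> Callable[[str], dict]:
    """Build a get_status function backed by Athena GetQueryExecution."""
    import boto3  # imported lazily so athena_await can be used without AWS

    athena = boto3.client("athena", region_name=region)

    def get_status(query_id: str) -> dict:
        response = athena.get_query_execution(QueryExecutionId=query_id)
        return response["QueryExecution"]["Status"]

    return get_status
```

In use, the ID would come from `start_query_execution(...)["QueryExecutionId"]`, followed by `athena_await(query_id, boto3_status_getter())`.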

drernie • Aug 23 '22 18:08

@akarve @sir-sigurd Okay, I added a section in Admin.md on how to find and import the "base" Quilt policies, so users can easily create a Source=Quilt role to replace the default Source=Custom roles. With that in place, I feel comfortable removing the "drifty" workaround.

Is that enough to finally approve this PR?

drernie • Aug 26 '22 00:08

Okay, this time I really think I'm done, done.

Can someone review the screenshots to ensure I'm not leaking any sensitive information? https://github.com/quiltdata/quilt/blob/athena-config/docs/Catalog/Admin.md

drernie • Aug 26 '22 06:08

Closing in favor of https://github.com/quiltdata/examples/pull/5

drernie • Sep 20 '22 00:09

Sorry, are you suggesting I remove “PutBucketPublicAccessBlock”? I don’t see “#Update*” anymore.

On Aug 23, 2022, at 10:17 AM, Sergey Fedoseev @.***> wrote:

@sir-sigurd commented on this pull request.

In docs/advanced-features/athena.ipynb https://github.com/quiltdata/quilt/pull/2981#discussion_r952915673:

  • " "s3:CreateBucket",\n",
  • " "s3:PutObject",\n",
  • " "s3:PutBucketPublicAccessBlock"\n", Doesn't seem to be actually done.

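If the notebook's policy keeps `s3:PutBucketPublicAccessBlock`, the setup code should actually call it. A minimal sketch of creating the query-results bucket with public access blocked, assuming boto3; `create_output_bucket` and the bucket name are hypothetical, and the client is passed in so the logic can be checked without AWS.

```python
from typing import Any, Optional

# Block every form of public access on the Athena results bucket.
BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def create_output_bucket(s3: Any, bucket: str, region: Optional[str]) -> None:
    """Create the results bucket, then block public access (hypothetical helper)."""
    kwargs: dict = {"Bucket": bucket}
    # us-east-1 rejects an explicit LocationConstraint; other regions need one.
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**kwargs)  # requires s3:CreateBucket
    s3.put_public_access_block(  # requires s3:PutBucketPublicAccessBlock
        Bucket=bucket,
        PublicAccessBlockConfiguration=BLOCK_ALL_PUBLIC_ACCESS,
    )


# Usage (needs AWS credentials):
#   import boto3
#   session = boto3.session.Session()
#   create_output_bucket(session.client("s3"), "example-quilt-athena-results",
#                        session.region_name)
```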

drernie • Oct 11 '22 08:10

Doh! Sorry about that. Pushing now.

Also need to retest with a stricter policy…

On Aug 18, 2022, at 12:11 AM, Sergey Fedoseev @.***> wrote:

@sir-sigurd commented on this pull request.

In docs/advanced-features/athena.ipynb https://github.com/quiltdata/quilt/pull/2981#discussion_r948732170:

  • "import boto3,json,re,time\n",
  • "SESSION = boto3.session.Session()\n",
  • "print(SESSION)\n",
  • "REGION=SESSION.region_name\n", I don't see these changes. Did you forget to push?


drernie • Oct 11 '22 08:10