quilt
Athena config documentation
Description
TODO
- [ ] Automated tests (pytest-codeblocks)
- [ ] Documentation
- [ ] Python: Run `build.py` for new docstrings
Codecov Report
Merging #2981 (c61c56f) into master (393eedb) will not change coverage. The diff coverage is n/a.
Coverage Diff:

| | master | #2981 |
|---|---|---|
| Coverage | 35.72% | 35.72% |
| Files | 627 | 627 |
| Lines | 27968 | 27968 |
| Branches | 4078 | 4078 |
| Hits | 9992 | 9992 |
| Misses | 16808 | 16808 |
| Partials | 1168 | 1168 |
| Flag | Coverage Δ |
|---|---|
| api-python | 90.72% <ø> (ø) |
| catalog | 8.07% <ø> (ø) |
| lambda | 87.29% <ø> (ø) |
Flags with carried forward coverage won't be shown.
@akarve Is this a good summary?
# Athena Configuration
## Sample scripts for enabling Athena Queries from Quilt
To use Athena queries from inside Quilt, you need to:
1. For each Account, configure:
   1. An AWS Athena Policy
      1. Set CloudFormation -> ManagedUserRoleExtraPolicies to include that Policy
      2. Add that AWS Policy to Quilt
      3. Add that Quilt Policy to any Quilt Roles that need Athena Queries
   2. An S3 Output Bucket for Quilt Athena Queries
   3. An Athena Workgroup that writes to that Bucket
2. For each Bucket, create:
   1. A Manifest Table from `.quilt/packages`
   2. A Packages Table from `.quilt/named_packages`
   3. A Packages View from the Packages Table and Manifest Table
   4. An Objects View from the Manifest Table and Packages View
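The output-bucket and workgroup steps in the summary above can be scripted with boto3. The following is a minimal, unofficial sketch, assuming credentials that allow Athena workgroup creation. The bucket name `my-quilt-athena-results` is hypothetical; `quilt-query` and `quilt_query` are the workgroup and database names used by the test query later in this thread. It only creates the workgroup and a database; the per-bucket tables and views are not shown.

```python
# Minimal sketch: an Athena workgroup that writes results to an S3 bucket.
# OUTPUT_BUCKET is a hypothetical name; WORKGROUP and DATABASE match the
# names used by the test query quoted in this thread.
OUTPUT_BUCKET = "my-quilt-athena-results"
WORKGROUP = "quilt-query"
DATABASE = "quilt_query"


def workgroup_config(output_bucket):
    """Build the Configuration argument for athena.create_work_group()."""
    return {
        # Query results for this workgroup are written under this S3 prefix
        "ResultConfiguration": {"OutputLocation": f"s3://{output_bucket}/"},
        # Prevent clients from overriding the output location per query
        "EnforceWorkGroupConfiguration": True,
    }


def create_workgroup_and_database(name=WORKGROUP, output_bucket=OUTPUT_BUCKET,
                                  database=DATABASE):
    """Create the workgroup, then a database for the per-bucket tables and views.

    Not idempotent: create_work_group fails if the workgroup already exists.
    """
    import boto3  # deferred so workgroup_config() can be used without AWS access

    athena = boto3.client("athena")
    athena.create_work_group(Name=name, Configuration=workgroup_config(output_bucket))
    response = athena.start_query_execution(
        QueryString=f"CREATE DATABASE IF NOT EXISTS {database}",
        WorkGroup=name,
    )
    # start_query_execution is asynchronous; poll this ID to see when it finishes
    return response["QueryExecutionId"]
```

Enforcing the workgroup configuration is a design choice here, not a requirement: it keeps every query's results in the one bucket the Athena policy grants access to.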
@kevinemoore @fiskus I believe this incorporates the information from the other two Pull Requests. If not, please let me know.
Ah, the AthenaAccessRole BREAKS user access, because it ONLY has the AthenaQuiltAccess policy.
@akarve Is there a policy which grants the equivalent of the ReadWriteQuiltBucket Role? Or am I thinking about this the wrong way?
@akarve @sir-sigurd @kevinemoore I have verified this works, and had k9 review the Policy. If there are no P0 issues, can we approve this by Tuesday morning PDT so we can share it with customers tomorrow afternoon?
@akarve I believe I have resolved all outstanding issues. Can you approve so we can merge? Or do we need to defer the customer discussion scheduled later today?
Fair enough. Updated to use pprint
On Aug 23, 2022, at 10:27 AM, Sergey Fedoseev @.***> wrote:
@sir-sigurd commented on this pull request.
In docs/advanced-features/athena.md https://github.com/quiltdata/quilt/pull/2981#discussion_r952928358:
```
Test Athena Query:
SELECT * FROM quilt_query.quilt_bio_products_quilt_objects_view
WHERE substr(logical_key, -5)='.tiff'
-- extract and query package-level metadata
AND json_extract_scalar(meta, '$.user_meta.nucmembsegmentationalgorithmversion') LIKE '1.3%'
AND json_array_contains(json_extract(meta, '$.user_meta.cellindex'), '5');
WorkGroup quilt-query
#9 athena_await[7c315a69-19af-4784-9299-bf2b020ad165]=QUEUED
#8 athena_await[7c315a69-19af-4784-9299-bf2b020ad165]=RUNNING
#7 athena_await[7c315a69-19af-4784-9299-bf2b020ad165]=FAILED
{'State': 'FAILED', 'StateChangeReason': 'com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: NWNXJA7Y7TMPFPD6; S3 Extended Request ID: zql9sBQ7JblYs9es43/xLSCMvh7H1WTFyNL4t7i1IfgATVyU4Hnx1Ya3nuvudMMpqEBgzP6DM2M=; Proxy: null), S3 Extended Request ID: zql9sBQ7JblYs9es43/xLSCMvh7H1WTFyNL4t7i1IfgATVyU4Hnx1Ya3nuvudMMpqEBgzP6DM2M= (Path: s3://quilt-bio-products/.quilt/packages)', 'SubmissionDateTime': datetime.datetime(2022, 8, 23, 8, 30, 16, 706000, tzinfo=tzlocal()), 'CompletionDateTime': datetime.datetime(2022, 8, 23, 8, 30, 18, 77000, tzinfo=tzlocal()), 'AthenaError': {'ErrorCategory': 2, 'ErrorType': 1306, 'Retryable': False, 'ErrorMessage': 'com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: NWNXJA7Y7TMPFPD6; S3 Extended Request ID: zql9sBQ7JblYs9es43/xLSCMvh7H1WTFyNL4t7i1IfgATVyU4Hnx1Ya3nuvudMMpqEBgzP6DM2M=; Proxy: null), S3 Extended Request ID: zql9sBQ7JblYs9es43/xLSCMvh7H1WTFyNL4t7i1IfgATVyU4Hnx1Ya3nuvudMMpqEBgzP6DM2M= (Path: s3://quilt-bio-products/.quilt/packages)'}}
```
Is this supposed to fail? If so, I think it's worth printing with pprint() https://docs.python.org/3/library/pprint.html#pprint.pprint.
— Reply to this email directly, view it on GitHub https://github.com/quiltdata/quilt/pull/2981#pullrequestreview-1082581961, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAE2T45HF6Z6KTRUFURCALV2UCY7ANCNFSM56JWTB6A. You are receiving this because you were assigned.
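The review comment above asks for failed query states to be printed with `pprint()`. A minimal sketch of such a polling helper follows, assuming a boto3 Athena client. `await_query` is a hypothetical name; the notebook's own `athena_await` implementation is not shown in this thread, so this is not a copy of it. It counts down the remaining tries the way the `#9`, `#8`, `#7` log lines do and pretty-prints the full `Status` dict whenever the query does not succeed.

```python
import time
from pprint import pprint

# Athena reports one of these states once a query has stopped running
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def await_query(athena, query_id, tries=10, delay=1.0):
    """Poll an Athena query until it finishes or the tries run out.

    `athena` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    Returns the query's Status dict from get_query_execution.
    """
    status = {}
    # Count down the remaining tries, e.g. #9, #8, #7, ...
    for remaining in range(tries - 1, -1, -1):
        response = athena.get_query_execution(QueryExecutionId=query_id)
        status = response["QueryExecution"]["Status"]
        print(f"#{remaining} await_query[{query_id}]={status['State']}")
        if status["State"] in TERMINAL_STATES:
            break
        if remaining:
            time.sleep(delay)  # no point sleeping after the final try
    if status.get("State") != "SUCCEEDED":
        # Pretty-print so long error messages (e.g. NoSuchBucket) stay readable
        pprint(status)
    return status
```

For example, `qid = athena.start_query_execution(QueryString=sql, WorkGroup="quilt-query")["QueryExecutionId"]` followed by `await_query(athena, qid)`. Passing the client in, rather than creating it inside the helper, keeps the function easy to exercise without AWS access.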
@akarve @sir-sigurd Okay, I added a section in Admin.md on how to find and import the "base" Quilt policies, so users can easily create a Source=Quilt role to replace the default Source=Custom roles. With that in place, I feel comfortable removing the "drifty" workaround.
Is that enough to finally approve this PR?
Okay, this time I really think I'm done, done.
Can someone review the screenshots to ensure I'm not leaking any sensitive information? https://github.com/quiltdata/quilt/blob/athena-config/docs/Catalog/Admin.md
Closing in favor of https://github.com/quiltdata/examples/pull/5
Sorry, are you suggesting I remove “PutBucketPublicAccessBlock”? I don’t see “#Update*” anymore.
On Aug 23, 2022, at 10:17 AM, Sergey Fedoseev @.***> wrote:
@sir-sigurd commented on this pull request.
In docs/advanced-features/athena.ipynb https://github.com/quiltdata/quilt/pull/2981#discussion_r952915673:
- `"s3:CreateBucket",`
- `"s3:PutObject",`
- `"s3:PutBucketPublicAccessBlock"`

Doesn't seem to be actually done.
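The permissions discussed above map onto two boto3 S3 calls: `s3:CreateBucket` allows `create_bucket`, and `s3:PutBucketPublicAccessBlock` allows `put_public_access_block`. If the intent of granting them is to create the query-output bucket and lock it down, a minimal sketch might look like this. `create_output_bucket` is a hypothetical helper, and the region is assumed to come from the session, as in the notebook's setup cell (`SESSION.region_name`).

```python
def create_output_bucket(s3, bucket, region):
    """Create an S3 bucket for Athena query results and block all public access.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    Needs the s3:CreateBucket and s3:PutBucketPublicAccessBlock permissions.
    """
    kwargs = {"Bucket": bucket}
    # us-east-1 is the default region and rejects an explicit LocationConstraint
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**kwargs)

    # Turn on all four S3 Block Public Access settings for the new bucket
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
```

Usage, with a hypothetical bucket name: `session = boto3.session.Session()` then `create_output_bucket(session.client("s3"), "my-quilt-athena-results", session.region_name)`.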
Doh! Sorry about that. Pushing now.
Also need to retest with stricter policy…
On Aug 18, 2022, at 12:11 AM, Sergey Fedoseev @.***> wrote:
@sir-sigurd commented on this pull request.
In docs/advanced-features/athena.ipynb https://github.com/quiltdata/quilt/pull/2981#discussion_r948732170:
- `import boto3,json,re,time`
- `SESSION = boto3.session.Session()`
- `print(SESSION)`
- `REGION=SESSION.region_name`

I don't see these changes. Did you forget to push?