AWS Batch setup instructions do not allow access to intermediate artifacts in s3://nextstrain-data/ bucket

Open • sacundim opened this issue 2 years ago • 1 comment

The current version of the AWS Batch setup instructions tells people to create three IAM policies, but none of the three grants s3:ListBucket and s3:GetObject access to the s3://nextstrain-data/ bucket that the ncov Open build uses for intermediate GenBank artifacts. This means that people who run a build on Batch modeled after that one will hit errors like the ones I reported in this ticket:

  • https://github.com/nextstrain/ncov/issues/909

For an example IAM policy that grants access to that bucket, see:

  • https://github.com/sacundim/covid-19-puerto-rico-nextstrain/commit/8de83db75ddeb568a365cfdd0f190c3f5bb0c447
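As a rough illustration of the kind of grant I mean (a minimal sketch, not a copy of the policy in the commit above), the snippet below builds an IAM policy document that allows s3:ListBucket on the bucket itself and s3:GetObject on the objects inside it. The helper name `read_only_bucket_policy` is made up for this example, and how you attach the resulting policy to your Batch job role is left to your own setup.

```python
import json

# Bucket named in this issue; swap in another bucket if your build reads elsewhere.
BUCKET = "nextstrain-data"


def read_only_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document granting read-only access to one S3 bucket.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object reads apply to the keys under the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = read_only_bucket_policy(BUCKET)
print(json.dumps(policy, indent=2))
```

Note that the two actions need different resource ARNs: s3:ListBucket is checked against the bucket, while s3:GetObject is checked against the object keys, which is why there are two statements.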

sacundim • Apr 10 '22 01:04

Agreed, we should adjust the example policy in those instructions to grant access to nextstrain-data and add an explanation of why/when it's useful, noting that it's technically optional. Not all Batch setups will need it, but we will be extending other core pathogen builds to use a similar input data file pattern, so it's good to include it sooner rather than later.

Background context here is that the example policy in these instructions long predates the ncov build and its data files on s3://nextstrain-data. The policy also doesn't assume any particular build is being run, but since the ncov build and its input data are so widely used, it'd still be good to add the grant and a mention of it now.

tsibley • Apr 26 '22 23:04