AWS S3: Can't access data outside my region (Status Code: 301)
Hi!
I'm having some trouble requesting S3 objects that are outside my current region (I get a Status Code: 301).
Backend: AWS Batch
Filesystem: S3
Region: ap-southeast-2
I'm attempting to run a small genomics pipeline that requests some of the broad-references open data set on AWS S3. I can see that the open data set exists in us-east-1.
Specifically, I'm requesting s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta and I'm receiving the same error 5 times:
[2019-03-12 11:27:21,50] [error] WorkflowManagerActor Workflow 434834fb-cb24-4bd2-ba44-8a1c929b11f5 failed (during MaterializingWorkflowDescriptorState): cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anon$1: Workflow input processing failed:
[Attempted 1 time(s)] - S3Exception: null (Service: S3Client; Status Code: 301; Request ID: null)
[Attempted 1 time(s)] - S3Exception: null (Service: S3Client; Status Code: 301; Request ID: null)
[Attempted 1 time(s)] - S3Exception: null (Service: S3Client; Status Code: 301; Request ID: null)
[Attempted 1 time(s)] - S3Exception: null (Service: S3Client; Status Code: 301; Request ID: null)
[Attempted 1 time(s)] - S3Exception: null (Service: S3Client; Status Code: 301; Request ID: null)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.cromwell$engine$workflow$lifecycle$materialization$MaterializeWorkflowDescriptorActor$$workflowInitializationFailed(MaterializeWorkflowDescriptorActor.scala:217)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anonfun$2.applyOrElse(MaterializeWorkflowDescriptorActor.scala:187)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor$$anonfun$2.applyOrElse(MaterializeWorkflowDescriptorActor.scala:182)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
at akka.actor.FSM.processEvent(FSM.scala:684)
at akka.actor.FSM.processEvent$(FSM.scala:681)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.akka$actor$LoggingFSM$$super$processEvent(MaterializeWorkflowDescriptorActor.scala:138)
at akka.actor.LoggingFSM.processEvent(FSM.scala:820)
at akka.actor.LoggingFSM.processEvent$(FSM.scala:802)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.processEvent(MaterializeWorkflowDescriptorActor.scala:138)
at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:678)
at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:672)
at akka.actor.Actor.aroundReceive(Actor.scala:517)
at akka.actor.Actor.aroundReceive$(Actor.scala:515)
at cromwell.engine.workflow.lifecycle.materialization.MaterializeWorkflowDescriptorActor.aroundReceive(MaterializeWorkflowDescriptorActor.scala:138)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
at akka.actor.ActorCell.invoke(ActorCell.scala:557)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
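For what it's worth, the redirect can be reproduced outside Cromwell with a minimal AWS SDK for Java v2 call against the same bucket/key (a sketch only; the object name is just for illustration):

import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.{HeadObjectRequest, S3Exception}

object Reproduce301 extends App {
  // Client pinned to ap-southeast-2, matching Cromwell's aws.region setting.
  val s3 = S3Client.builder().region(Region.AP_SOUTHEAST_2).build()

  try {
    // broad-references lives in us-east-1, so a region-pinned client
    // gets a 301 redirect instead of the object metadata.
    s3.headObject(
      HeadObjectRequest.builder()
        .bucket("broad-references")
        .key("hg38/v0/Homo_sapiens_assembly38.fasta")
        .build())
  } catch {
    case e: S3Exception => println(s"Status Code: ${e.statusCode()}") // prints 301
  }
}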
I'm basically using the standard AWS configuration file for Cromwell:
include required(classpath("application"))

aws {
  application-name = "cromwell"
  auths = [{
    name = "default"
    scheme = "default"
  }]
  region = "ap-southeast-2"
}

engine { filesystems { s3 { auth = "default" } } }

backend {
  default = "AWSBATCH"
  providers {
    AWSBATCH {
      actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
      config {
        numSubmitAttempts = 3
        numCreateDefinitionAttempts = 3
        root = "s3://$bucketName/cromwell-execution"
        auth = "default"
        concurrent-job-limit = 16
        default-runtime-attributes {
          queueArn = "arn:aws:batch:ap-southeast-2:$arn"
        }
        filesystems { s3 { auth = "default" } }
      }
    }
  }
}
I've contacted AWS Support to find out if I could fully (region-)qualify the S3 locator (something like s3://us-east-1.amazonaws.com/broad-references/.../file). AWS basically said no, and directed me towards https://github.com/aws/aws-sdk-java/issues/1366 (their aws-sdk-java), which describes an enableForceGlobalBucketAccess option on an AmazonS3Builder.
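For illustration, this is roughly what that option looks like in SDK v1 (a sketch only; Cromwell's AWS backend appears to use SDK v2, where this builder method doesn't exist under that name):

import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.AmazonS3ClientBuilder

// With force-global bucket access enabled, the v1 client follows the
// 301 redirect to the bucket's actual region instead of failing.
val s3 = AmazonS3ClientBuilder.standard()
  .withRegion(Regions.AP_SOUTHEAST_2)        // home region for everything else
  .withForceGlobalBucketAccessEnabled(true)  // allow cross-region buckets
  .build()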
I've tried searching through Cromwell to work out where this setting could be placed, but I'm a bit lost in the project structure and Scala.
Brain dumping what I learned from Emil: this localization code is in the proxy, and it probably needs to use a force-global flag as in the Java SDK (see the sketch below). Looking forward, I also see an issue with call caching for files outside the compute region, as the filesystem copy is not using the force-global flag.
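For reference, newer releases of the Java SDK v2 expose an equivalent switch on the S3 client builder; a sketch, assuming a recent enough SDK version:

import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3Client

// crossRegionAccessEnabled makes the v2 client chase the 301 redirect
// to the bucket's real region, much like v1's force-global flag.
val s3 = S3Client.builder()
  .region(Region.AP_SOUTHEAST_2)   // Cromwell's configured region
  .crossRegionAccessEnabled(true)  // follow redirects to other regions
  .build()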
Unable to reproduce; if you could post your CWL, it'd be appreciated.
Hi, this might be a little late, but I am having this issue too when running with Batch. I configured my core environment on my own (without using the CloudFormation templates). I have a bucket located in us-west-2, while the instance running Cromwell (v59) and the Job Queue are located in us-east-2. When I run a job, I get the same error that @illusional was getting.