Bucket mount names ("Duplicate action name")

Open carbocation opened this issue 6 years ago • 3 comments

I'm writing a script that calls from two (potentially) different gcsfuse-mounted sources. In my testbed, they both happen to be on the same bucket, but in reality, they won't be. So, I tried to --mount the same bucket twice under different names. However, it seems that the naming is related to the bucket, rather than to the alias, so doing this fails. Maybe this is as intended, but it doesn't seem desirable.

2019-04-29 09:24:17.941173: Exception HttpError: <HttpError 400 when requesting https://genomics.googleapis.com/v2alpha1/pipelines:run?alt=json returned "Error: validating pipeline: duplicate action name "mount-ukbb_v2"">
Traceback (most recent call last):
  File "/home/james/anaconda2/bin/dsub", line 11, in <module>
    load_entry_point('dsub==0.3.1', 'console_scripts', 'dsub')()
  File "/home/james/anaconda2/lib/python2.7/site-packages/dsub-0.3.1-py2.7.egg/dsub/commands/dsub.py", line 956, in main
    dsub_main(prog, argv)
  File "/home/james/anaconda2/lib/python2.7/site-packages/dsub-0.3.1-py2.7.egg/dsub/commands/dsub.py", line 945, in dsub_main
    launched_job = run_main(args)
  File "/home/james/anaconda2/lib/python2.7/site-packages/dsub-0.3.1-py2.7.egg/dsub/commands/dsub.py", line 1028, in run_main
    unique_job_id=args.unique_job_id)
  File "/home/james/anaconda2/lib/python2.7/site-packages/dsub-0.3.1-py2.7.egg/dsub/commands/dsub.py", line 1117, in run
    launched_job = provider.submit_job(job_descriptor, skip)
  File "/home/james/anaconda2/lib/python2.7/site-packages/dsub-0.3.1-py2.7.egg/dsub/providers/google_v2.py", line 915, in submit_job
    task_id = self._submit_pipeline(request)
  File "/home/james/anaconda2/lib/python2.7/site-packages/dsub-0.3.1-py2.7.egg/dsub/providers/google_v2.py", line 866, in _submit_pipeline
    self._service.pipelines().run(body=request))
  File "build/bdist.linux-x86_64/egg/retrying.py", line 49, in wrapped_f
  File "build/bdist.linux-x86_64/egg/retrying.py", line 206, in call
  File "build/bdist.linux-x86_64/egg/retrying.py", line 247, in get
  File "build/bdist.linux-x86_64/egg/retrying.py", line 200, in call
  File "build/bdist.linux-x86_64/egg/retrying.py", line 49, in wrapped_f
  File "build/bdist.linux-x86_64/egg/retrying.py", line 206, in call
  File "build/bdist.linux-x86_64/egg/retrying.py", line 247, in get
  File "build/bdist.linux-x86_64/egg/retrying.py", line 200, in call
  File "/home/james/anaconda2/lib/python2.7/site-packages/dsub-0.3.1-py2.7.egg/dsub/providers/google_base.py", line 593, in execute
    raise exception
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://genomics.googleapis.com/v2alpha1/pipelines:run?alt=json returned "Error: validating pipeline: duplicate action name "mount-ukbb_v2"">

carbocation, Apr 29 '19 13:04

Hi @carbocation !

What is the use case for requesting that the same bucket be mounted twice?

I have concerns that GCSfuse is already a fragile enough solution that having a bucket mounted twice within a single dsub task may be setting yourself up for a bad day.

Is the typical Input and Output File Handling insufficient for your use case?

Thanks.

mbookman, Apr 29 '19 17:04

I tried to describe the use case in the first post, but I can add more color if I did not convey the use case very well. Basically, this is not a "need," it just seems like something that should be possible and it was surprising, as a user, that it didn't work. If it's not possible, or increases risk, then no worries.

carbocation, Apr 29 '19 17:04

Got it. Let's leave this open, and we will document that buckets should only be mounted once and that people should use --env variables to point to specific locations inside of a mount, should bucket mounting be the actual solution they need.

Thanks!

mbookman, Apr 29 '19 17:04
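
For readers landing here with the same error, the workaround suggested above (mount the bucket once, then use --env variables to point at locations inside the mount) might look roughly like the sketch below. It only builds and prints a dsub command line; it does not run dsub. The mount name UKBB, the variables SRC_A and SRC_B, the bucket gs://my-bucket, and the sub-paths are all hypothetical placeholders. The sketch assumes dsub exposes the mount path to the task through an environment variable named after the mount, and it omits the provider, project, and logging flags a real submission would need.

```python
import shlex


def build_dsub_args(mount_name, bucket_url, subdirs, command):
    """Build a dsub argument list that mounts a bucket exactly once and
    passes each location of interest as an --env variable holding a path
    relative to the mount point."""
    args = ["dsub", "--mount", f"{mount_name}={bucket_url}"]
    for env_name, rel_path in subdirs.items():
        args += ["--env", f"{env_name}={rel_path}"]
    args += ["--command", command]
    return args


# Hypothetical values: one mount, two sub-locations inside it.
args = build_dsub_args(
    "UKBB",
    "gs://my-bucket",
    {"SRC_A": "cohort_a", "SRC_B": "cohort_b"},
    # Assumption: ${UKBB} resolves to the mount path inside the task.
    'ls "${UKBB}/${SRC_A}" "${UKBB}/${SRC_B}"',
)

# Print a copy-pasteable command line (nothing is submitted).
print(shlex.join(args))
```

If the two sources later move to genuinely different buckets, each bucket gets its own --mount entry and the script only needs its SRC_A and SRC_B values adjusted, so the single-bucket testbed and the real setup can share the same command body.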