Enable multiple cache and job_work dirs per object store
We would like the possibility to specify multiple cache dirs and job_work dirs per object store.
With support for weights we could do round-robin over those dirs and suspend or replace job_work dirs more easily. Having multiple cache dirs also has a few advantages, imho.
The downside, I think, is that Galaxy needs to "search" for the correct dir for each job, so in the worst case Galaxy does one os.path.exists() call per job per configured cache/job_work dir. But this could probably be cached on the Python side (see the sketch after the example configuration below). The configuration could look like this:
```yaml
type: generic_s3
auth:
  access_key: ...
  secret_key: ...
bucket:
  name: unique_bucket_name_all_lowercase
  use_reduced_redundancy: false
  max_chunk_size: 250
connection:
  host: swift.example.org
  port: 6000
  conn_path: /
  multipart: true
cache:
  - path: database/object_store_cache_01
    size: 1000
    cache_updated_data: true
    weight: 2
  - path: database/object_store_cache_02
    size: 1000
    cache_updated_data: true
    weight: 1
extra_dirs:
  - type: job_work
    path: database/job_working_directory_01
    weight: 2
  - type: job_work
    path: database/job_working_directory_02
    weight: 1
```
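On the "search" concern mentioned above: a weighted choice plus a small lookup cache would keep the cost to at most one os.path.exists() probe per configured dir, and only the first time a job is resolved. Below is a rough Python sketch, not Galaxy code; the one-subdirectory-per-job-id layout, the in-memory dict cache, and all names are assumptions made for illustration.

```python
import os
import random

# Hypothetical dirs mirroring the example config above (not read from Galaxy config).
JOB_WORK_DIRS = [
    {"path": "database/job_working_directory_01", "weight": 2},
    {"path": "database/job_working_directory_02", "weight": 1},
]

# In-memory memo of job id -> resolved directory, so filesystem probes
# happen at most once per job.
_job_dir_cache = {}


def pick_dir(dirs=JOB_WORK_DIRS):
    """Weighted random choice over the configured directories."""
    paths = [d["path"] for d in dirs]
    weights = [d["weight"] for d in dirs]
    return random.choices(paths, weights=weights, k=1)[0]


def resolve_job_dir(job_id, dirs=JOB_WORK_DIRS):
    """Return the directory holding this job, probing each candidate at most once."""
    if job_id in _job_dir_cache:
        return _job_dir_cache[job_id]
    for d in dirs:
        candidate = os.path.join(d["path"], str(job_id))
        if os.path.exists(candidate):  # worst case: one check per configured dir
            _job_dir_cache[job_id] = candidate
            return candidate
    # Not found anywhere: allocate a new location by weight.
    candidate = os.path.join(pick_dir(dirs), str(job_id))
    _job_dir_cache[job_id] = candidate
    return candidate
```

In a scheme like this, setting a dir's weight to 0 would stop new jobs from landing there while existing jobs can still be resolved, which is the suspend/retire flexibility described above.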
We store object_store_id on the job table for the distributed object store, so if the extra dirs were implemented as distributed backends with their own IDs, there would be no need to search. And there should be no performance penalty for the default case anyway, since there is no need to search when there is only one dir of a given type.
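For reference, this is roughly what the existing distributed object store looks like today, with weighted backends that each carry their own ID and job_work dir (the IDs and paths below are invented for illustration). The suggestion above would extend the same per-entry-ID idea down to the cache/extra_dirs level of a single backend, so the ID stored on the job resolves the dir directly instead of probing each path.

```yaml
type: distributed
backends:
  - id: disk01                     # id recorded as object_store_id on the job
    type: disk
    weight: 2
    files_dir: database/objects_01
    extra_dirs:
      - type: job_work
        path: database/job_working_directory_01
  - id: disk02
    type: disk
    weight: 1
    files_dir: database/objects_02
    extra_dirs:
      - type: job_work
        path: database/job_working_directory_02
```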
I assume this is triggered by some limitation you've run into; can you tell us what that is?
There are multiple reasons: space, performance and flexibility. In our setup we run most of the jobs on one system. We had, and still have, a few boxes serving as JWDs, and now some of those also serve as cache for the S3 store. Space is one concern: we would like to distribute jobs over multiple boxes to increase the available space. This is true for the JWDs, and we assume it will also be true for the S3 cache.

Performance is another problem: we have a few nasty tools, and with many nodes we have had good experience separating the load across multiple shares. So we have multiple physical boxes with separate network cards etc., and we used to do round-robin via multiple backends (using weights for the same physical hard drive). However, that configuration was a bit hacky, and it would be easier if we could configure this on the jwd/cache settings rather than on entire object stores. This setup also let us retire or maintain JWDs by simply changing the weights, which gave us some nice extra flexibility.

In addition, I don't see how I can configure an S3 cache that spans multiple mounts.
Galaxy Australia also has a use case for this. It would be useful to put the Galaxy job working directories for jobs that run on Pulsar in a different location from those for jobs that run locally, without using separate object store backends. I have tried overriding job_working_directory for individual jobs using TPV, but this doesn't work because the job_work dir from the object store backend is always used.