quilt
[question] Quilt and min.io?
Hi there. Is it or will it be possible to use quilt with on-premises s3-based solutions like min.io? AWS/cloud may be unavailable in some scenarios.
Thanks!
It is possible, and we have min.io on the roadmap. We have yet to formalize support, so we invite you to try Quilt with a min.io endpoint and file any bugs you encounter. In theory, min.io's S3-compatible API means that it just works; in practice it's not that simple, due to assumptions in our code and/or missing features in min.io.
Glad to hear that. I will definitely try it and let you know the results.
Thanks
One suggestion to encourage more people interested in testing it is to provide a small how-to.
Indeed. Just as a heads up we are still on the bleeding edge here and therefore haven't solicited people to try it, but if you are already using min.io and are willing to try then we welcome those data points.
@matheusmota I know your issue is only from six days ago but have you already given this a shot? Did you by any chance take some notes that you are willing to share if you started on this?
Either way, I will try to set up MinIO on a local node and address it with quilt in the coming days.
Waiting for MinIO support. Great job!
@matheusmota @Midnighter Any updates on MinIO? Maybe you can share your experience? I am currently considering giving Quilt a go, but I only have min.io available.
I briefly tried and was not successful. I haven't been able to give it a more serious attempt since then.
@akarve I am trying to establish Quilt as a core component of the data infrastructure at our research org. AWS is a non-starter for us, so I am attempting to slowly fill in the AWS-dependent gaps with MinIO compatibility, starting with the quilt3 Python package, initially as a standalone that does not rely on a registry server. I quickly hacked together a solution that mainly just involves modifying the S3ClientProvider._build_client method to create a client with endpoint_url specified. Currently I just check an environment variable for the endpoint URL and, if it is set, create the client with that endpoint URL; otherwise the client is built the same old way.
    # quilt3/datatransfer.py
    class S3ClientProvider:
        ...

        def _build_client(self, get_config):
            session = self.get_boto_session()
            endpoint_url = getenv_s3_endpoint_url()
            if endpoint_url:
                return session.client(
                    's3',
                    config=Config(signature_version='s3v4'),
                    endpoint_url=endpoint_url,
                )
            return session.client('s3', config=get_config(session))
As for credentials, I currently edit the CREDENTIALS_PATH file to hold the MinIO user credentials, and it works fine.
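The helper `getenv_s3_endpoint_url()` used in the `_build_client` change is not shown in this comment. A minimal sketch of what such a helper might look like follows. The environment variable name `QUILT_S3_ENDPOINT_URL` and the `s3_client_kwargs` function are hypothetical, chosen only for illustration, and the URL check is an extra safeguard rather than something the original patch is known to do.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Hypothetical variable name: the comment above does not say which
# environment variable the patch actually reads.
ENDPOINT_ENV_VAR = "QUILT_S3_ENDPOINT_URL"


def getenv_s3_endpoint_url(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the custom S3 endpoint URL, or None when it is unset or blank."""
    value = environ.get(ENDPOINT_ENV_VAR, "").strip()
    if not value:
        return None
    parsed = urlparse(value)
    # Reject values without an explicit http(s) scheme and host early,
    # instead of letting the first S3 request fail later.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{ENDPOINT_ENV_VAR} must be an http(s) URL, got {value!r}")
    return value


def s3_client_kwargs(environ: Mapping[str, str] = os.environ) -> dict:
    """Build extra keyword arguments for boto3's session.client('s3', ...)."""
    endpoint_url = getenv_s3_endpoint_url(environ)
    return {"endpoint_url": endpoint_url} if endpoint_url else {}
```

With a helper like this, the patched method could call `session.client('s3', **s3_client_kwargs())`; when the variable is unset, the keyword arguments are empty and boto3 uses its default AWS endpoint.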
Now this is just a starting implementation and far from optimal, but I'm wondering if this standalone MinIO-compatible mode is something that you're interested in supporting in the quilt3 python package and if you have any ideas as far as things to consider in the design.
Thanks!
@marcodlk Nice workaround, and directionally correct (sorry for the slow reply). What we're planning is to abstract the providers a bit in the next-gen client (already in the works, and it will be open source) so that, at first, any compatible object store can be interposed (GCP, Azure, MinIO). That's the long-term solution, and we don't have code just yet. Want to join our Slack so we can discuss further? Thank you.
@marcodlk With boto3>=1.28.0 you can use the AWS_ENDPOINT_URL_S3 environment variable to customize the endpoint URL. See https://docs.aws.amazon.com/sdkref/latest/guide/feature-ss-endpoints.html.
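As a rough illustration of the environment-variable approach, the sketch below sets the variables that boto3 reads before any S3 client is created. The function name `configure_minio_environment` is hypothetical, and the endpoint and credentials in the usage note are placeholders for a local MinIO instance. Whether every quilt3 code path honors the endpoint is an assumption that has not been verified here.

```python
import os
from typing import MutableMapping


def configure_minio_environment(
    endpoint_url: str,
    access_key: str,
    secret_key: str,
    environ: MutableMapping[str, str] = os.environ,
) -> None:
    """Point boto3 at an S3-compatible endpoint through environment variables.

    AWS_ENDPOINT_URL_S3 is read by boto3 >= 1.28.0; the two credential
    variables are part of boto3's standard credential chain.
    """
    environ["AWS_ENDPOINT_URL_S3"] = endpoint_url
    environ["AWS_ACCESS_KEY_ID"] = access_key
    environ["AWS_SECRET_ACCESS_KEY"] = secret_key
```

For example, calling `configure_minio_environment("http://localhost:9000", "my-access-key", "my-secret-key")` at the top of a script, before quilt3 builds its first S3 client, should let a sufficiently recent boto3 pick up the endpoint without patching `_build_client`.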
Hi @marcodlk, can you share the diff of the change you made? It looks like quilt never accesses the credentials.json file.
@link89 I no longer have access to the codebase I was working on, but looking at the code, quilt3.session._load_credentials still uses CREDENTIALS_PATH, so that's odd. Are you sure it is the "credentials.json" in the Quilt app directory as specified by BASE_PATH in the quilt3.util module? Have you tried @sir-sigurd's solution?
For min.io support we hopefully don't need to touch credentials.json, as that is for the special case where users authenticate to a Quilt stack. In the more general case, quilt3 just falls back on the boto3 credential chain (and never touches credentials.json). That applies in more cases, especially for pure open source users.
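One way to feed MinIO credentials into the boto3 credential chain, without editing credentials.json, is a named profile in a shared credentials file. The sketch below assumes a profile called `minio` and uses two hypothetical helper names; `AWS_SHARED_CREDENTIALS_FILE` and `AWS_PROFILE` are the standard variables boto3 consults.

```python
import configparser
import os
from pathlib import Path
from typing import MutableMapping


def write_profile(credentials_file: Path, profile: str, access_key: str, secret_key: str) -> None:
    """Add or replace one profile in an AWS-style shared credentials file."""
    # Interpolation is disabled so that a '%' in a secret is stored verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_file)  # keeps existing profiles; a missing file is ignored
    parser[profile] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    credentials_file.parent.mkdir(parents=True, exist_ok=True)
    with credentials_file.open("w") as handle:
        parser.write(handle)


def select_profile(
    credentials_file: Path,
    profile: str,
    environ: MutableMapping[str, str] = os.environ,
) -> None:
    """Tell boto3 which credentials file and profile to use."""
    environ["AWS_SHARED_CREDENTIALS_FILE"] = str(credentials_file)
    environ["AWS_PROFILE"] = profile
```

This keeps MinIO credentials separate from any real AWS profiles, and the profile can be switched per process rather than globally.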
Here is a draft PR that allows users to create their own S3 clients (including min.io clients) and map them to specific buckets. https://github.com/quiltdata/quilt/pull/3765
We'd appreciate any feedback on the interface. This isn't necessarily the best way for Quilt to find and access min.io servers. Please let us know how you think Quilt should map min.io endpoints and bucket names.