data-repository-service-schemas
Matching access method and scheme of returned access URLs
I was surprised not to find any mention in the DRS specification of constraints on the scheme of the access URLs returned by a DRS server.
Should there be constraints on the contents of the access_url property in each AccessMethod item of a GET /objects/{object_id} response? For example, should the access_url in an AccessMethod with "type": "https" be required to start with https://? Similarly, should the access_url in an AccessMethod with "type": "gs" be required to start with gs://?
Likewise, should there be similar constraints on the url property of an AccessUrl response to GET /objects/{object_id}/access/{access_id}? For example, if a particular access_id was taken from an AccessMethod with "type": "gs", should the url property of the resulting AccessUrl be required to start with gs://?
As currently written, the specification imposes no such constraints. This could allow a server to return a file: URL for an access method of type s3. It significantly complicates client implementations, which, I assume, are written with the specific goal of obtaining the bytes using a particular protocol. No client I can think of would look for an S3 access method and then dynamically switch to using the local file system to access the bytes.
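To make the proposed constraint concrete, here is a minimal client-side sketch of the check. The function name scheme_matches_type is hypothetical, and the code assumes access_url is either a plain string or an object carrying a url field; neither the function nor the rule itself is part of the DRS specification as written.

```python
from urllib.parse import urlsplit


def scheme_matches_type(access_method: dict) -> bool:
    """Return True if the access URL's scheme equals the access method's type.

    Hypothetical helper: it illustrates the constraint discussed in this
    issue and is not something the DRS specification currently requires.
    """
    access_url = access_method.get("access_url")
    if access_url is None:
        # No inline URL (only an access_id), so there is nothing to compare yet.
        return True
    # Assumption: access_url is a string or an object with a "url" field.
    url = access_url["url"] if isinstance(access_url, dict) else access_url
    # urlsplit lower-cases the scheme, so lower-case the type as well.
    return urlsplit(url).scheme == access_method["type"].lower()
```

A client could run this on each AccessMethod it receives and skip, or at least log, any entry where the check fails instead of guessing which protocol to use.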
Hello, side note: we have a similar question regarding access_url. At least one client system requires that the access_url be an HTTP URL.
@hannes-ucsc @ctb I think this is because the current OpenAPIv3 spec doesn't provide an easy way to apply data model constraints.
OpenAPIv3 only extends the JSON-Schema DRAFT-05 spec, so some of the useful DRAFT-07 keywords (if-then-else) are unavailable in OpenAPIv3, which makes constraints like this difficult to model and apply.
With the JSON-Schema DRAFT-07 spec you could apply this contract/constraint using:
"if": {
  "properties": { "type": { "const": "s3" } }
},
"then": {
  "properties": { "access_url": { "pattern": "^s3:\/\/.*$" } }
},
I admit this will get very convoluted with loads of nested if-then-else statements, but it allows you to have schema independence if required. In the AccessMethod and AccessURL case, I feel this independence is not really required, as they are both very much dependent on each other.
So, working within the DRAFT-05 and OpenAPIv3 spec, a clean way to achieve this would be the following:
...
"anyOf": [
  { "properties": { "type": { "type": "string", "pattern": "^s3$" }, "access_url": { "pattern": "^s3:\/\/.*$" }, ...} },
  { "properties": { "type": { "type": "string", "pattern": "^gs$" }, "access_url": { "pattern": "^gs:\/\/.*$" }, ... } },
  ...
]
...
Because const is not part of the OpenAPIv3 spec (it was only introduced in JSON-Schema DRAFT-06), AccessMethod.type is constrained with a pattern instead.
For this to work, both AccessMethod and AccessURL need to be part of the same object.
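As a rough illustration of the anyOf approach, the branches could be generated rather than written by hand. This is only a sketch: the function name any_of_branches is hypothetical, the scheme names are assumed to be plain alphanumerics, and access_url is treated as a string, as in the fragment above. The type pattern is anchored so that it matches the scheme name exactly.

```python
import json
import re


def any_of_branches(schemes):
    """Build one anyOf branch per scheme, pairing type with a URL prefix.

    Hypothetical helper; assumes each scheme is plain alphanumeric text,
    so it can be placed in a regular expression without escaping.
    """
    branches = []
    for scheme in schemes:
        branches.append({
            "properties": {
                # Anchored so that "s3" does not also match "s3x".
                "type": {"type": "string", "pattern": f"^{scheme}$"},
                "access_url": {"type": "string", "pattern": f"^{scheme}://.*$"},
            }
        })
    return {"anyOf": branches}


if __name__ == "__main__":
    # Print the generated fragment for two example schemes.
    print(json.dumps(any_of_branches(["s3", "gs"]), indent=2))
```

The patterns can be sanity-checked with re.search before being pasted into a schema, for instance confirming that the s3 branch accepts s3://bucket/key and rejects file:///tmp/key.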
Hope this helps.
thanks, @susheel. My question: is there a list of AccessMethod and/or URI schemes that must or should be supported for full compatibility? Is there any official guidance on this?
(I'm happy to make this a new issue if you prefer.)
@ctb Good point, and I agree that having a minimum compliance list of AccessMethods would make sense.
That is one for the maintainers of the standard, I'm afraid; I'm just one of the original contributors. I would suggest splitting this out into a separate issue, as the minimum supported AccessMethods could be provided via the /service-info endpoint. The main question for the maintainers and community would be what the minimum set looks like, possibly determined via a survey.
@susheel I don't think we necessarily need to express the constraint in the schema, but the reference documentation should be updated, if people agree that this is a desirable constraint to add, that is. I know of one prominent server implementation that currently returns https URLs for the gs access method, and it really makes my client implementation hacky.
Happy to see this conversation taking place. I agree there should be alignment between the type and the access_url's scheme. If these two attributes must always match, however, it may indicate that type is redundant and could be removed.
We can look at this issue at a future Cloud work stream call if there's a PR. Submitting a PR will trigger a docs build with the proposed changes, making it easier for us to review.