data-repository-service-schemas

Matching access method and scheme of returned access URLs

Open hannes-ucsc opened this issue 3 years ago • 6 comments

I was surprised not to find any mention in the DRS specification of constraints on the scheme of the access URLs returned by a DRS server.

Should there be constraints on the contents of the access_url property in each AccessMethod item of a GET /objects/{object_id} response? For example, should the access_url in an AccessMethod with "type": "https" be required to start with https://? Similarly, should the access_url in an AccessMethod with "type": "gs" be required to start with gs://?

Likewise, should there be similar constraints on the url property of an AccessUrl response to GET /objects/{object_id}/access/{access_id}? For example, if a particular access_id was taken from an AccessMethod with "type": "gs", should the url property of the resulting AccessUrl be required to start with gs://?

As currently written, the specification imposes no such constraints. A server could therefore return a file: URL for the s3 access method. This significantly complicates client implementations, which, I assume, are written with the specific goal of obtaining the bytes using a particular protocol. No client I can think of would look for an S3 access method and then dynamically switch to using the local file system to access the bytes.
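To make the proposed constraint concrete, here is a minimal client-side sketch in Python that flags access methods whose URL scheme differs from their declared type. The function name and the sample object are hypothetical; the sketch assumes access_url is either an object with a url key or, as a simplification, a plain string.

```python
from urllib.parse import urlsplit


def scheme_mismatches(drs_object):
    """Return (type, url) pairs whose URL scheme differs from the access method type."""
    mismatches = []
    for method in drs_object.get("access_methods", []):
        access_url = method.get("access_url")
        if access_url is None:
            # Only an access_id was given; the URL has to be fetched separately.
            continue
        # Assumption: access_url is an object with a "url" key, or a bare string.
        url = access_url["url"] if isinstance(access_url, dict) else access_url
        if urlsplit(url).scheme.lower() != method["type"].lower():
            mismatches.append((method["type"], url))
    return mismatches


# Hypothetical response body, for illustration only.
example_object = {
    "access_methods": [
        {"type": "gs", "access_url": {"url": "gs://bucket/object.bam"}},
        {"type": "s3", "access_url": {"url": "file:///data/object.bam"}},
        {"type": "https", "access_id": "abc"},
    ]
}

print(scheme_mismatches(example_object))
```

A client could run a check like this before choosing a download path, rather than discovering the mismatch when the transfer fails.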

hannes-ucsc, Jul 20 '21 17:07

hello, side note, we have a similar question in re access_url - at least one client system requires that the access_url be an HTTP URL.

ctb, Oct 22 '21 14:10

@hannes-ucsc @ctb I think this is because the OpenAPIv3 spec doesn't provide an easy way to apply data model constraints.

OpenAPIv3 only extends the JSON-Schema DRAFT-5 spec, so some of the useful DRAFT-7 keywords (if-then-else) are unavailable in OpenAPIv3, which makes constraints like this difficult to model and apply.

With the JSON-Schema DRAFT-07 spec you could apply this contract/constraint using:

"if": {
  "properties": { "type": { "const": "s3" } }
},
"then": {
  "properties": { "access_url": { "pattern": "^s3:\/\/.*$" } }
},

I admit this will get very convoluted with loads of nested if-then-else statements, but it allows you to have schema independence if required. In the AccessMethod and AccessURL case, I feel this independence is not really required, as they are both very much dependent on each other.

So working within the DRAFT-05 and OpenAPIv3 spec a clean way to achieve this would be with the following:

...
 "anyOf": [
    { "properties": { "type": { "type": "string", "pattern": "^s3$" }, "access_url": { "pattern": "^s3:\/\/.*$" }, ...} },
    { "properties": { "type": { "type": "string", "pattern": "^gs$" }, "access_url": { "pattern": "^gs:\/\/.*$" }, ... } },
   ...
  ]
...

The const keyword was only introduced in JSON-Schema DRAFT-06 and is not part of the OpenAPIv3 spec, hence the use of a pattern for AccessMethod.type.
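As a rough illustration of how the anyOf branches behave, the following Python sketch emulates them with the standard re module. It is a simplification, not a JSON-Schema validator: it treats access_url as a plain string (as the schema fragment above does) and requires both properties to be present, which is stricter than the properties keyword. JSON-Schema patterns are not implicitly anchored, so the type patterns here are anchored to avoid matching values that merely contain "s3" or "gs". The names BRANCHES and matches_any_branch are hypothetical.

```python
import re

# One entry per anyOf branch: property name -> regular expression.
BRANCHES = [
    {"type": r"^s3$", "access_url": r"^s3://.*$"},
    {"type": r"^gs$", "access_url": r"^gs://.*$"},
]


def matches_any_branch(access_method):
    """Return True if the access method satisfies at least one branch."""
    for branch in BRANCHES:
        if all(
            isinstance(access_method.get(key), str)
            # re.search mirrors the unanchored semantics of "pattern".
            and re.search(pattern, access_method[key]) is not None
            for key, pattern in branch.items()
        ):
            return True
    return False


print(matches_any_branch({"type": "s3", "access_url": "s3://bucket/key"}))
print(matches_any_branch({"type": "gs", "access_url": "https://example.org/key"}))
```

The second call pairs a gs type with an https URL, which is the mismatch the schema is meant to reject.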

For this to work, both AccessMethod and AccessURL need to be part of the same object.

Hope this helps.

susheel, Oct 22 '21 16:10

thanks, @susheel. My question: is there a list of AccessMethod and/or URI schemes that must or should be supported for full compatibility? Is there any official guidance on this?

(I'm happy to make this a new issue if you prefer.)

ctb, Oct 22 '21 16:10

@ctb Good point, and I agree that having a minimum compliance list of AccessMethods would make sense.

That's one for the maintainers of the standard, I'm afraid; I'm just one of the original contributors. I would suggest splitting this out into a separate issue, as the minimum supported AccessMethods could be provided via the /service-info endpoint. The main question for the maintainers and community would be what the minimum set looks like, possibly determined via a survey.

susheel, Oct 23 '21 10:10

@susheel I don't think we need to necessarily express the constraint in the schema, but the reference documentation should be updated. If people agree that this is a desirable constraint to add, that is. I know one prominent server implementation that currently returns https URLs for the gs access method and it really makes my client implementation hacky.
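The kind of workaround this forces on clients might look like the sketch below: instead of trusting the declared type, the client picks its download handler from the scheme of the URL it actually received. This is an assumed illustration, not code from any real client; pick_handler and the handler labels are hypothetical, and access_url is assumed to be an object with a url key.

```python
from urllib.parse import urlsplit


def pick_handler(method, handlers):
    """Choose a handler by the URL's actual scheme, not the declared type.

    Returns None when no handler is registered for the actual scheme.
    """
    actual_scheme = urlsplit(method["access_url"]["url"]).scheme.lower()
    return handlers.get(actual_scheme)


# Hypothetical handler registry; real clients would map to download functions.
handlers = {"gs": "gs-downloader", "https": "https-downloader"}

# A gs access method that carries an https URL, as described above.
method = {
    "type": "gs",
    "access_url": {"url": "https://storage.example.org/bucket/object.bam"},
}

print(pick_handler(method, handlers))
```

If the specification required the scheme to match the type, the client could dispatch on type alone and drop this extra parsing step.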

hannes-ucsc, Oct 26 '21 20:10

Happy to see this conversation taking place. I agree there should be alignment between the type and the access_url's scheme. If these two attributes must always match, however, it may indicate that type is redundant and could be removed.

We can look at this issue at a future Cloud work stream call if there's a PR. Submitting a PR will trigger a docs build with the proposed changes, making it easier for us to review.

jb-adams, Oct 27 '21 12:10