View template should support type-specific templates

Open pbnjay opened this issue 4 years ago • 5 comments

The current specification requires the view template to only have an {{id}} placeholder. However, for services that support multiple types, the view template could be substantially different for each type. Ideally, the "view" section of the service manifest can include a dictionary mapping supported types to view templates.

Example use case - I want to reconcile bioinformatics identifiers from NCBI:

  • Gene ID 4336 (MOBP) would have a view URL of https://ncbi.nlm.nih.gov/gene/4336
  • Taxonomy ID 9606 (Homo sapiens) would have a view URL of https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606

A possible solution for this (which is also backwards-compatible):

  "view": {
     "url": "https://ncbi.nlm.nih.gov/search/all/?term={{id}}%5Buid%5D",
     "url_by_type": {
          "/gene": "https://ncbi.nlm.nih.gov/gene/{{id}}",
          "/tax": "https://ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={{id}}"
     }
  }

It would be great if I could support these (and 20+ other resources from NCBI) in the same reconciliation service to aid with disambiguation.
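
A client consuming such a manifest might resolve the view URL roughly as follows. This is only a sketch of the proposal: `url_by_type` is the key suggested in this issue and is not part of the current specification, while `resolve_view_url` and the "first matching type wins" rule are assumptions made for illustration.

```python
from typing import Mapping, Sequence

# Hypothetical manifest fragment mirroring the proposal;
# "url_by_type" is not part of the current specification.
VIEW = {
    "url": "https://ncbi.nlm.nih.gov/search/all/?term={{id}}%5Buid%5D",
    "url_by_type": {
        "/gene": "https://ncbi.nlm.nih.gov/gene/{{id}}",
        "/tax": "https://ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={{id}}",
    },
}


def resolve_view_url(view: Mapping, entity_id: str, types: Sequence[str] = ()) -> str:
    """Use the first type-specific template that matches, else the generic "url"."""
    by_type = view.get("url_by_type", {})
    template = next((by_type[t] for t in types if t in by_type), view["url"])
    return template.replace("{{id}}", entity_id)


print(resolve_view_url(VIEW, "4336", ["/gene"]))
print(resolve_view_url(VIEW, "9606", ["/tax"]))
print(resolve_view_url(VIEW, "4336"))  # no type given: falls back to "url"
```

Because a manifest without `url_by_type` simply takes the fallback branch, existing services would keep working unchanged under this reading.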

pbnjay, Apr 09 '20 15:04

This would make sense! But I can see a few issues with that:

  • Entities can have multiple types, or no type: what logic would you then use to build the view URL for them?
  • At various other places in the API, we expect that entities are uniquely determined by their id (for instance the preview service, or the data extension service). Would these also be areas that need adapting for your use case? As in, is it possible that a given string can be both a valid Gene ID and a valid Taxonomy ID, therefore referring to two different things?

One way to fit NCBI in the existing reconciliation API would be to prefix your ids with their type: use gene/4336 and taxonomy/9606 as ids. Your reconciliation service could expose an endpoint which would translate these compound ids to the correct ncbi.nlm.nih.gov URL patterns with HTTP redirects. In this way, you are ensuring that the ids returned by your service are unambiguous on their own (and you don't have to control ncbi.nlm.nih.gov).
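
The translation step of such a redirect endpoint could look roughly like this. It is a minimal sketch: the `URL_PATTERNS` table, the `redirect_target` name, and the assumption that raw ids are purely numeric are all illustrative and not taken from any existing service.

```python
# Hypothetical mapping from id prefix to the NCBI view URL pattern.
URL_PATTERNS = {
    "gene": "https://ncbi.nlm.nih.gov/gene/{raw_id}",
    "taxonomy": "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={raw_id}",
}


def redirect_target(compound_id: str) -> str:
    """Translate a compound id such as "gene/4336" into the URL to redirect to."""
    prefix, sep, raw_id = compound_id.partition("/")
    # Assumption: raw ids are digits only; anything else is rejected.
    if not sep or prefix not in URL_PATTERNS or not raw_id.isdigit():
        raise ValueError(f"unrecognised compound id: {compound_id!r}")
    return URL_PATTERNS[prefix].format(raw_id=raw_id)


print(redirect_target("gene/4336"))
print(redirect_target("taxonomy/9606"))
```

The service's view template would then point at its own endpoint, and the handler behind it would answer with an HTTP redirect whose `Location` header is the value returned by `redirect_target`.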

Another way is to use full URLs as ids (especially if you are already using these as URIs in RDF for instance). You can then use "{{id}}" as view template, although this would rely on the fact that identifiers are not URL-escaped before being inserted in the view template (which is something the specs should definitely settle).
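
To illustrate why the escaping question matters for this approach, here is a small sketch comparing the two behaviours a client could have. Which of them clients actually implement is exactly the open point; the substitution logic below is an assumption, not something the specs prescribe.

```python
from urllib.parse import quote

template = "{{id}}"
entity_id = "https://ncbi.nlm.nih.gov/gene/4336"

# Inserted verbatim, the id is still a usable link.
unescaped = template.replace("{{id}}", entity_id)

# If the client percent-encodes the id first, the result is no longer a URL.
escaped = template.replace("{{id}}", quote(entity_id, safe=""))

print(unescaped)  # https://ncbi.nlm.nih.gov/gene/4336
print(escaped)    # https%3A%2F%2Fncbi.nlm.nih.gov%2Fgene%2F4336
```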

wetneb, Apr 09 '20 16:04

> Entities can have multiple types, or no type: what logic would you then use to build the view URL for them?

Yes, one alternative I had thought about (but forgot to include) was to add the view template URL to the type definition instead of the view, which I believe would enable this situation. TBH this is how I'm storing it internally anyway.

> At various other places in the API, we expect that entities are uniquely determined by their id (for instance the preview service, or the data extension service). Would these also be areas that need adapting for your use case? As in, is it possible that a given string can be both a valid Gene ID and a valid Taxonomy ID, therefore referring to two different things?

I have no plans to implement the preview service for my application, but I imagine you could make a similar modification by adding it to the type definition. For the extension service, adding an optional type parameter to the request could filter the matched ids without breaking current usage. But yes, for example taxonomy id 10116 is very common (Rat) and gene id 10116 is a human gene.
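
As a rough sketch of that idea, a lookup keyed by (type, id) lets the same string resolve differently per type, with the type filter staying optional. The `type_filter` parameter, the `lookup` function, and the toy records are hypothetical; the current data extension request has no such parameter.

```python
from typing import Dict, Optional

# Toy records keyed by (type, id); labels are illustrative only.
RECORDS = {
    ("/tax", "10116"): "Rattus norvegicus",
    ("/gene", "10116"): "a human gene",
}


def lookup(entity_id: str, type_filter: Optional[str] = None) -> Dict[str, str]:
    """Return every record with this id, optionally restricted to one type."""
    return {
        rec_type: label
        for (rec_type, rec_id), label in RECORDS.items()
        if rec_id == entity_id and type_filter in (None, rec_type)
    }


print(lookup("10116"))          # ambiguous: both records match
print(lookup("10116", "/tax"))  # disambiguated by the optional type
```

Omitting `type_filter` reproduces today's behaviour, which is why the change would not break current usage, but it also shows that an untyped request for `10116` cannot be answered unambiguously.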

> One way to fit NCBI in the existing reconciliation API would be to prefix your ids with their type: use gene/4336 and taxonomy/9606 as ids. Your reconciliation service could expose an endpoint which would translate these compound ids to the correct ncbi.nlm.nih.gov URL patterns with HTTP redirects. In this way, you are ensuring that the ids returned by your service are unambiguous on their own (and you don't have to control ncbi.nlm.nih.gov).

The challenge with this option is that the vast majority of biomedical experimental data sets will refer to these identifiers without the prefixes, making the search much more cumbersome and expensive. In addition, since publishers and analysis tools would not expect the prefixes they would need stripped again before distribution anyway.

pbnjay, Apr 09 '20 18:04

> The challenge with this option is that the vast majority of biomedical experimental data sets will refer to these identifiers without the prefixes, making the search much more cumbersome and expensive. In addition, since publishers and analysis tools would not expect the prefixes they would need stripped again before distribution anyway.

As far as OpenRefine is concerned, users don't actually need to manipulate entity ids: they only see entity labels in the UI. If they want to retrieve entity ids from a reconciled column, they can of course do so with the cell.recon.match.id formula, which will return the raw ID from the reconciliation service. For the Wikidata service they can also use "Add column from reconciled values" (the data extension service) to fetch any property, including the Wikidata identifier itself (even if it does not actually make sense to go hit the API for that - it's just much easier to discover).

So in your case, you could let users retrieve the raw Gene ID or raw Taxonomy ID via the data extension service: they would not even realize that the raw id used in the reconciliation API is prefixed.

wetneb, Apr 09 '20 18:04

Interesting, ok. Part of my struggle is that there aren't really any reconciliation providers for bioinformatics resources, so I haven't been able to fully use those aspects of OpenRefine yet. Most bench scientists I work with would probably prefer not typing in formulas of course.

I have most of the API implemented in Go, so I'll test out the prefixes with a local link-redirect, see how it goes and report back.

pbnjay, Apr 09 '20 19:04

Hi @pbnjay, I thought I would just check back - how is the project going?

wetneb, Aug 12 '20 14:08