
Refget metadata: 'trunc512' vs 'TRUNC512'

Open jb-adams opened this issue 4 years ago • 2 comments

The refget spec currently indicates that in the metadata response, the key for the trunc512 checksum is TRUNC512 (i.e. in all caps). However, current implementations appear to be using trunc512.

examples:

  1. CRAM Reference Registry

     Request: https://www.ebi.ac.uk/ena/cram/sequence/3050107579885e1608e6fe50fae3f8d0/metadata

     Response:

         {
           "metadata": {
             "id": "3050107579885e1608e6fe50fae3f8d0",
             "md5": "3050107579885e1608e6fe50fae3f8d0",
             "trunc512": null,
             "length": 7156,
             "aliases": []
           }
         }

  2. AWS INSDC (which uses the ena-refget-processor) for producing JSON metadata blobs

     Request: https://refget-insdc.jeremy-codes.com/sequence/3332ed720ac7eaa9b3655c06f6b9e196/metadata

     Response:

         {
           "metadata": {
             "length": 5386,
             "aliases": [],
             "trunc512": "2085c82d80500a91dd0b8aa9237b0e43f1c07809bd6e6785",
             "id": "2085c82d80500a91dd0b8aa9237b0e43f1c07809bd6e6785",
             "md5": "3332ed720ac7eaa9b3655c06f6b9e196"
           }
         }
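For anyone wanting to check another server the same way, here is a minimal Python sketch that reports which spelling of the key a metadata payload actually uses. It works on the response text from example 2 pasted in as a string, so no request is made; the helper name `trunc512_key_spellings` is made up for illustration and is not part of any refget tooling.

```python
import json

# Metadata payload copied from example 2 above (no network call is made).
RESPONSE_TEXT = """
{ "metadata": { "length": 5386, "aliases": [],
  "trunc512": "2085c82d80500a91dd0b8aa9237b0e43f1c07809bd6e6785",
  "id": "2085c82d80500a91dd0b8aa9237b0e43f1c07809bd6e6785",
  "md5": "3332ed720ac7eaa9b3655c06f6b9e196" } }
"""


def trunc512_key_spellings(payload):
    """Return the metadata keys equal to 'trunc512' when case is ignored.

    Hypothetical helper: it only inspects key names, not checksum values.
    """
    metadata = payload.get("metadata", {})
    return sorted(key for key in metadata if key.lower() == "trunc512")


# Lists the spelling(s) found in the pasted response.
print(trunc512_key_spellings(json.loads(RESPONSE_TEXT)))
```

Pasting in the response from a different implementation would show whether it follows the lowercase spelling or the all-caps one from the spec's description.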

Is this a matter of revising the spec, or implementations and compliance testing?

@andrewyatz @jmarshall @andersleung

jb-adams, Jan 12 '21 16:01

I guess md5 should be MD5 too ... but that seems overboard. I'd suggest letting implementations lower-case this as it's more consistent for their own personal payloads ...

andrewyatz, Jan 12 '21 16:01

I would call this an error in the specification. While the prose name of the algorithm is TRUNC512 (in capitals), every other instance of it in the spec as a JSON literal is written in lowercase:

  • The spec's example metadata response has it in lowercase:

     "trunc512": "D761E7B0EE99B4005DBEB0758F71C258FCDD08F9A665DB79"
    
  • Both the description and example of the service-info response have it in lowercase:

    "algorithms": ["md5", "trunc512"]
    

Thus in particular the description and the example of the metadata response are inconsistent. The service-info enum value is in a different part of the JSON so is not intrinsically related, but it would certainly be appropriate and less confusing for both it and the metadata key value to be spelt and capitalised exactly the same way.
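To illustrate the consistency point, a rough Python sketch of a check that the names advertised in service-info appear as metadata keys with exactly the same capitalisation. The function name `capitalisation_mismatches` is hypothetical, and the all-caps payload is constructed here only to show what the description table as currently written would imply; it is not taken from any real server.

```python
def capitalisation_mismatches(algorithms, metadata):
    """Return (advertised_name, metadata_key) pairs that differ only in case.

    Hypothetical helper, not part of the refget spec or compliance suite.
    """
    mismatches = []
    for name in algorithms:
        if name in metadata:
            continue  # spelt and capitalised identically, nothing to report
        for key in metadata:
            if key.lower() == name.lower():
                mismatches.append((name, key))
                break
    return mismatches


# Enum values as written in the spec's service-info example.
ALGORITHMS = ["md5", "trunc512"]

# Metadata object as returned in the CRAM Reference Registry example.
LOWERCASE_METADATA = {
    "id": "3050107579885e1608e6fe50fae3f8d0",
    "md5": "3050107579885e1608e6fe50fae3f8d0",
    "trunc512": None,
    "length": 7156,
    "aliases": [],
}

# Assumed variant: the same object with the key in capitals, as the
# description table would have it.
UPPERCASE_METADATA = {
    ("TRUNC512" if key == "trunc512" else key): value
    for key, value in LOWERCASE_METADATA.items()
}

print(capitalisation_mismatches(ALGORITHMS, LOWERCASE_METADATA))
print(capitalisation_mismatches(ALGORITHMS, UPPERCASE_METADATA))
```

With the lowercase payload nothing is reported; with the all-caps variant the `trunc512` enum value and the `TRUNC512` key are flagged as a pair.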

If all known implementations spell it as "trunc512", consistently with the spec's metadata response example, I think we should just fix the description table to agree with them — thus:

@@ -242,7 +242,7 @@ string
 md5 checksum.
 </td></tr>
 <tr markdown="block"><td>
-<code>TRUNC512</code><br/>
+<code>trunc512</code><br/>
 string
 </td><td>
   TRUNC512 checksum, if the server does not support TRUNC512 the value will be <code>null</code>.

jmarshall, Jan 12 '21 16:01