
Metacat doesn't seem to support all of Unicode for PIDs

Open amoeba opened this issue 7 years ago • 0 comments

While trying to be snarky on Slack today I ended up finding some funny behavior. I was trying to use an emoji as a PID and ended up getting an XML validation error from Metacat.

library(dataone)
library(datapack)

mn <- MNode("https://dev.nceas.ucsb.edu/knb/d1/mn/v2")

obj_path <- tempfile()
writeLines(LETTERS, obj_path)

sm <- new("SystemMetadata",
          identifier = "🦋")

createObject(mn, 
             pid = "🦋",
             file = obj_path,
             sysmeta = sm)

R spits out:

xmlParseCharRef: invalid xmlChar value 55358
xmlParseCharRef: invalid xmlChar value 56715
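
Those two numbers look like the UTF-16 surrogate pair for the butterfly emoji (U+1F98B). That would suggest something in the chain is writing the emoji as two numeric character references, which XML does not allow, though where that happens is still a guess. A minimal sketch of the arithmetic in base R (the variable names are just for illustration):

```r
# Code point of the butterfly emoji, U+1F98B
cp <- 0x1F98B

# Standard UTF-16 surrogate pair computation for code points above U+FFFF
offset <- cp - 0x10000
high <- 0xD800 + offset %/% 0x400  # high (lead) surrogate
low  <- 0xDC00 + offset %% 0x400   # low (trail) surrogate

print(c(high, low))
```

The two values come out as 55358 and 56715, matching the xmlParseCharRef messages above.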

Metacat spits out:

metacat 20180606-16:45:52: [ERROR]: D1ResourceHandler: Serializing exception with code 400: The supplied system metadata is invalid. The identifier ð<9f>¦<8b> does not match identifierin the system metadata identified by 🦋. [edu.ucsb.nceas.metacat.restservice.D1ResourceHandler:serializeException:536]
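
The garbled identifier in that log line is consistent with the emoji's UTF-8 bytes being read one byte at a time as Latin-1, so an encoding mismatch when the PID is decoded is a plausible suspect. This is only a sketch of that reading in base R, not a confirmed diagnosis, and the variable names are made up for the example:

```r
# UTF-8 bytes of the butterfly emoji, U+1F98B
bytes <- charToRaw(intToUtf8(0x1F98B))
print(bytes)

# Treat each byte as its own code point, which is what decoding
# the bytes as Latin-1 (ISO-8859-1) amounts to
misread <- intToUtf8(as.integer(bytes))
print(utf8ToInt(misread))
```

The bytes are f0 9f a6 8b. Read as Latin-1, f0 is ð and a6 is ¦, while 9f and 8b are control characters, which may be why the log renders them as <9f> and <8b>.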

If I check the System Metadata that R generates, the emoji appears to serialize fine, so I don't suspect the problem is on the R side.

cat(serializeSystemMetadata(sm, "v2"))
<?xml version="1.0"?>
<d1_v2.0:systemMetadata xmlns:d1_v2.0="http://ns.dataone.org/service/types/v2.0" xmlns:d1="http://ns.dataone.org/service/types/v1">
 <serialVersion>1</serialVersion>
 <identifier>🦋</identifier>
...

The next step would be to debug this with curl on the command line to take R out of the equation.

amoeba, Jun 07 '18 05:06