Metacat doesn't seem to support all of Unicode for PIDs
While trying to be snarky on Slack today, I found some funny behavior: using an emoji as a PID gets me an XML validation error from Metacat.
library(dataone)
library(datapack)

mn <- MNode("https://dev.nceas.ucsb.edu/knb/d1/mn/v2")

obj_path <- tempfile()
writeLines(LETTERS, obj_path)

sm <- new("SystemMetadata",
          identifier = "🦋")

createObject(mn,
             pid = "🦋",
             file = obj_path,
             sysmeta = sm)
R spits out:
xmlParseCharRef: invalid xmlChar value 55358
xmlParseCharRef: invalid xmlChar value 56715
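The two values in that error look like a UTF-16 surrogate pair rather than a single code point. A minimal sketch in base R checks that they combine to the butterfly's code point. The variable names are mine, and the arithmetic is the standard UTF-16 decoding formula, not anything taken from Metacat or the dataone package:

```r
# The two values from the error message, treated as UTF-16 code units.
high <- 55358L  # 0xD83E, inside the high-surrogate range 0xD800-0xDBFF
low  <- 56715L  # 0xDD8B, inside the low-surrogate range 0xDC00-0xDFFF

# Standard UTF-16 decoding: combine the surrogate pair into one code point.
# 65536 = 0x10000, 55296 = 0xD800, 56320 = 0xDC00, 1024 = 0x400.
code_point <- 65536L + (high - 55296L) * 1024L + (low - 56320L)

# Code point of the emoji used as the PID.
butterfly <- utf8ToInt("\U0001F98B")

sprintf("U+%X", code_point)  # "U+1F98B"
code_point == butterfly      # TRUE
```

If that reading is right, something may be writing the emoji as two numeric character references, one per UTF-16 code unit. XML 1.0 doesn't allow surrogate values in character references, which would explain the parser complaint. Where in the pipeline that happens is still a guess.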
Metacat spits out:
metacat 20180606-16:45:52: [ERROR]: D1ResourceHandler: Serializing exception with code 400: The supplied system metadata is invalid. The identifier ð<9f>¦<8b> does not match identifierin the system metadata identified by 🦋. [edu.ucsb.nceas.metacat.restservice.D1ResourceHandler:serializeException:536]
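The garbled identifier in that log line looks like the emoji's UTF-8 bytes read back as Latin-1. A quick base R sketch of that reading follows. Using `intToUtf8()` on the byte values is just my way of mimicking a Latin-1 decode, since Latin-1 maps each byte to the code point with the same number:

```r
pid <- "\U0001F98B"

# UTF-8 bytes of the PID.
bytes <- charToRaw(enc2utf8(pid))
as.character(bytes)  # "f0" "9f" "a6" "8b"

# Mimic a Latin-1 decode: treat each byte value as its own code point.
misread <- intToUtf8(as.integer(bytes))

# 0xF0 is "ð" and 0xA6 is "¦" in Latin-1; 0x9F and 0x8B are non-printing
# control characters, which would account for the <9f> and <8b> in the log.
utf8ToInt(misread)  # 240 159 166 139
```

That matches the `ð<9f>¦<8b>` in the log byte for byte, so my guess is that one side of the comparison is being decoded as Latin-1 (or similar) instead of UTF-8. I haven't confirmed which component does that.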
If I check the System Metadata R generates, the emoji appears to serialize fine, so I don't suspect it's a problem with R.
cat(serializeSystemMetadata(sm, "v2"))
<?xml version="1.0"?>
<d1_v2.0:systemMetadata xmlns:d1_v2.0="http://ns.dataone.org/service/types/v2.0" xmlns:d1="http://ns.dataone.org/service/types/v1">
<serialVersion>1</serialVersion>
<identifier>🦋</identifier>
...
The next step would be to debug this with curl on the command line to take R out of the equation.