metacatui
Invalid Characters Allowed in Metadata Saved by Metacat UI Editor Cause Catastrophic Dataset Error
Description
The Metacat UI Editor allowed invalid characters to be saved in metadata. When the Metacat indexer tried to process the metadata file, it logged the following error:
metacat-index 20240630-23:50:14: [ERROR]: SolrIndex.update - could not update the solr index for the object ess-dive-3619bd077a60b7c-20240624T120319367 since Invalid byte 2 of 4-byte UTF-8 sequence. [edu.ucsb.nceas.metacat.index.SolrIndex:update:656]
org.apache.solr.client.solrj.SolrServerException: Invalid byte 2 of 4-byte UTF-8 sequence.
at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:237) ~[classes/:?]
at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:396) ~[classes/:?]
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:697) ~[classes/:?]
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:620) [classes/:?]
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener$1.run(SystemMetadataEventListener.java:187) [classes/:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_402]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_402]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_402]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_402]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_402].
As a result, the dataset metadata was not indexed in Solr. Because the resource map was still created successfully, the dataset was left in an uneditable state. The Solr record for the metadata object looked as follows:
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"id:ess-dive-3619bd077a60b7c-20240624T120319367",
"wt":"javabin",
"version":"2"}},
"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
{
"read_count_i":44,
"id":"ess-dive-3619bd077a60b7c-20240624T120319367",
"identifier":"ess-dive-3619bd077a60b7c-20240624T120319367",
"sku":"ess-dive-3619bd077a60b7c-20240624T120319367",
"_version_":1803150904808964096,
"serviceCoupling":"false",
"isService":false,
"isDocumentedBy":["ess-dive-3619bd077a60b7c-20240624T120319367"],
"documents":["ess-dive-9725a595229ffc6-20240520T181650760",
"ess-dive-a947e57390f1fad-20240613T203820095",
"ess-dive-babae844b274bf2-20240613T212651812",
"ess-dive-f718cd02247b6b7-20240520T181650806",
"ess-dive-03a811f10de6c4a-20240613T204125926",
"ess-dive-3619bd077a60b7c-20240624T120319367",
"ess-dive-6c73eb2d4ac33cb-20240624T115801116",
"ess-dive-e87e6b2bb4d0b0d-20240624T115801104",
"ess-dive-8775aeed8499ba7-20240613T203820082",
"ess-dive-a2b05a328913511-20240613T203820068",
"ess-dive-645a4c9d54aacec-20240624T115754244",
"ess-dive-8641172de4e1937-20240613T210540301",
"ess-dive-3ac7448d1be1e0f-20240613T210540311",
"ess-dive-a29fa7c825dea22-20240613T203820108",
"ess-dive-aac74b2ca73dbee-20240613T203820102",
"ess-dive-323f59eaa468ca0-20240520T181650795",
"ess-dive-047dc22f57f82d8-20240624T115801110",
"ess-dive-f9fd47d9e4c8c34-20240613T203820077",
"ess-dive-cf5ba5193c8d2ef-20240621T121606390",
"ess-dive-8742ead85f7c535-20240613T203820088",
"ess-dive-0d69c0b5a6f7e45-20240613T203820055",
"ess-dive-35eccae477fcaaa-20240613T203820115",
"ess-dive-d3ccee76444e6d9-20240624T115801123"],
"resourceMap":["ess-dive-2c4cdf7a877c0f4-20240624T120319346"],
"language":""}]
}
}
Steps to Reproduce
- Use Metacat UI Editor to save metadata with invalid characters.
- Attempt to index the metadata with Metacat indexer.
- Observe the error in the logs as shown above.
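To confirm that a stored metadata file is what triggers the indexer error, it can help to check the raw bytes for invalid UTF-8 outside of Metacat. The following is a minimal sketch, not part of Metacat or Metacat UI; the function name `find_invalid_utf8` is hypothetical, and it assumes the metadata document has already been downloaded as bytes.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, str]]:
    """Return (byte_offset, reason) for the first invalid UTF-8 sequence.

    Returns None when the whole input decodes cleanly as UTF-8.
    Hypothetical helper for illustration only.
    """
    try:
        # Strict decoding raises on the first malformed sequence.
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


if __name__ == "__main__":
    # A 4-byte lead byte (0xF0) followed by a byte that is not a valid
    # continuation byte, similar in kind to the error reported above.
    sample = b"<title>abc\xf0\x28\x8c\x28</title>"
    print(find_invalid_utf8(sample))
```

A non-None result points at the byte offset to inspect in the saved file.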
Expected behavior
The metadata should be properly encoded as UTF-8 before being saved, ensuring that it can be indexed without errors.
Additional context
We recovered from this by using the API directly to upload a new metadata file that the Metacat indexer can parse, and then manually creating the resource map. This resolved the issue well enough to allow the dataset to be edited and published. However, the previous version is in a state where it will never be properly indexed. The Metacat UI metadata editor should ensure that the metadata is properly encoded as UTF-8.
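One way a parseable replacement file could be prepared before re-uploading is sketched below. This is an assumption about a possible cleanup step, not the procedure we actually followed; the function name `repair_utf8` is hypothetical. It replaces each malformed byte sequence with U+FFFD so the affected text can be reviewed by hand.

```python
from typing import Tuple

REPLACEMENT = "\ufffd"


def repair_utf8(data: bytes) -> Tuple[str, int]:
    """Decode bytes as UTF-8, substituting U+FFFD for malformed sequences.

    Returns the decoded text and the number of substitutions made.
    Hypothetical helper for illustration only.
    """
    text = data.decode("utf-8", errors="replace")
    # Subtract replacement characters that were already validly encoded
    # in the input, so only newly inserted ones are counted.
    already_present = data.count(REPLACEMENT.encode("utf-8"))
    inserted = text.count(REPLACEMENT) - already_present
    return text, inserted


if __name__ == "__main__":
    broken = b"<title>abc\xf0\x28\x8c\x28</title>"
    fixed, n = repair_utf8(broken)
    # The result re-encodes as valid UTF-8 and can be checked manually.
    print(n, fixed.encode("utf-8"))
```

The substituted characters mark where content was lost, so the cleaned file should still be reviewed against the original submission before it is uploaded.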