Metacat-index does not handle <references>

Open mbjones opened this issue 7 years ago • 7 comments


Author Name: ben leinfelder (ben leinfelder) Original Redmine Issue: 6040, https://projects.ecoinformatics.org/ecoinfo/issues/6040 Original Date: 2013-07-25 Original Assignee: Jing Tao


I indexed a document from EVOS that uses a reference for a creator rather than the details of the person:

<creator><references>1359152217358</references></creator>

But in the index it shows up as "||" instead of following the reference back to the id where it was declared:

<associatedParty id="1359152217358">...

http://evos.nceas.ucsb.edu/evos/metacat/df35c.9.14/default

mbjones avatar Mar 05 '18 16:03 mbjones


Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-07-26T00:12:44Z


Here is a bit of the bean definition used by indexing to pick out the content from EML:

<bean id="eml.origin" class="org.dataone.cn.indexer.parser.CommonRootSolrField"
	p:multivalue="true"
	p:root-ref="originRoot">
	<constructor-arg name="name" value="origin" />
</bean>

<bean id="originRoot" class="org.dataone.cn.indexer.parser.utility.RootElement"
	p:name="origin"
	p:xPath="//dataset/creator"
	p:template="[individualName]||[organizationName]">
	<property name="leafs"><list><ref bean="organizationNameLeaf"/></list></property>
	<property name="subRoots"><list><ref bean="individualNameRoot" /></list></property>
</bean>


mbjones avatar Mar 05 '18 16:03 mbjones


Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-10-03T18:07:41Z


Apparently this is fixed in cn-index-processor v1.2.0 -- so we will need to pull in this newer dependency in metacat-index and adjust the code accordingly.

mbjones avatar Mar 05 '18 16:03 mbjones


Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-10-03T19:01:18Z


This is included in the 1.2.0 d1 index release. It will not include "||" but will use blanks instead. Not a great "solution", but better.

mbjones avatar Mar 05 '18 16:03 mbjones


Original Redmine Comment Author Name: Matt Jones (Matt Jones) Original Date: 2013-10-03T19:17:13Z


Spaces aren't really sufficient as a solution, and there are a lot of references fields in EML. We probably need to contribute a fix for this if Skye is not going to fix it for DataONE.

mbjones avatar Mar 05 '18 16:03 mbjones


Original Redmine Comment Author Name: Jing Tao (Jing Tao) Original Date: 2013-10-10T16:36:15Z


Skye said that a SAX parser is used to parse that information. This change may require switching to a DOM parser, which would be a big change.

mbjones avatar Mar 05 '18 16:03 mbjones


Original Redmine Comment Author Name: ben leinfelder (ben leinfelder) Original Date: 2013-10-10T16:39:54Z


Even with a SAX parser, the implementation could keep track of all elements with "id" attributes and anytime a "references" element is encountered, substitute with that node. The tricky part would be when we encounter a references element before the actual element that declares the id -- would have to track the references that are unfulfilled and fill them in when we actually get to the id elements.
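The bookkeeping described above could look roughly like the following. This is only a minimal sketch in Python's `xml.sax` for brevity, not the indexer's actual Java code; the names `Node`, `ReferenceResolvingHandler`, `parse_with_references`, and `first_text` are hypothetical. It assumes a `references` element simply stands in for the children of the element that declares the matching `id`.

```python
import xml.sax


class Node:
    """Minimal element record built from SAX events (hypothetical helper)."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs.items())
        self.children = []
        self.text = ""


class ReferenceResolvingHandler(xml.sax.ContentHandler):
    """Tracks elements with an "id" and fills in <references> as it goes."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []
        self._by_id = {}    # id -> Node that declared it
        self._pending = {}  # id -> Nodes still waiting for that id

    def startElement(self, name, attrs):
        node = Node(name, attrs)
        if self._stack:
            self._stack[-1].children.append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        if self._stack:
            self._stack[-1].text += content

    def endElement(self, name):
        node = self._stack.pop()
        if name == "references" and self._stack:
            parent = self._stack[-1]
            ref_id = node.text.strip()
            parent.children.remove(node)
            target = self._by_id.get(ref_id)
            if target is not None:
                # Backward reference: the id was already declared.
                # Children are shared, not copied, to keep the sketch short.
                parent.children.extend(target.children)
            else:
                # Forward reference: remember it until the id shows up.
                self._pending.setdefault(ref_id, []).append(parent)
            return
        node_id = node.attrs.get("id")
        if node_id is not None:
            self._by_id[node_id] = node
            for waiting in self._pending.pop(node_id, []):
                waiting.children.extend(node.children)

    def unresolved_ids(self):
        """Ids that were referenced but never declared."""
        return sorted(self._pending)


def parse_with_references(xml_text):
    handler = ReferenceResolvingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root, handler.unresolved_ids()


def first_text(node, name):
    """Text of the first descendant element called `name`, or None."""
    for child in node.children:
        if child.name == name:
            return child.text.strip()
        found = first_text(child, name)
        if found is not None:
            return found
    return None
```

With a document where `creator` references an `associatedParty` declared later, `first_text(creator, "surName")` would return the name from the declaring element once parsing finishes, and `unresolved_ids()` would list any ids that never appeared. Holding the declared elements in memory is the cost of this approach, but it avoids switching to a full DOM parse.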

mbjones avatar Mar 05 '18 16:03 mbjones

I put a twin issue for this over on https://github.com/DataONEorg/d1_cn_index_processor/issues/14 too to track changes across the repos.

amoeba avatar Mar 05 '21 02:03 amoeba

Create a new ticket in the Dataone-indexer repository. Close this one.

taojing2002 avatar Jun 25 '25 22:06 taojing2002