NoSuchElementException when parsing xoai

Open cmacdonald opened this issue 8 years ago • 8 comments

Stacktrace as follows:

Exception in thread "main" java.util.NoSuchElementException
	at org.codehaus.stax2.ri.Stax2EventReaderImpl.throwEndOfInput(Stax2EventReaderImpl.java:453)
	at org.codehaus.stax2.ri.Stax2EventReaderImpl.nextEvent(Stax2EventReaderImpl.java:242)
	at com.lyncode.xml.XmlReader.next(XmlReader.java:134)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:43)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parse(MetadataParser.java:34)
	at org.dspace.xoai.serviceprovider.parsers.RecordParser.parse(RecordParser.java:56)
	at org.dspace.xoai.serviceprovider.parsers.ListRecordsParser.next(ListRecordsParser.java:60)
	at org.dspace.xoai.serviceprovider.handler.ListRecordHandler.nextIteration(ListRecordHandler.java:71)
	at org.dspace.xoai.serviceprovider.lazy.ItemIterator.hasNext(ItemIterator.java:32)
	at org.dspace.xoai.serviceprovider.lazy.ItemIterator.<init>(ItemIterator.java:22)
	at org.dspace.xoai.serviceprovider.ServiceProvider.listRecords(ServiceProvider.java:57)

Minimal reproducible example:

OAIClient oaiClient = new HttpOAIClient("http://repository.abertay.ac.uk/oai/request");
Context context = new Context();
context.withOAIClient(oaiClient);
ServiceProvider ssoarOaiPmhEndpoint = new ServiceProvider(context);
ListRecordsParameters parameters = new ListRecordsParameters();
parameters.withMetadataPrefix("xoai");
ssoarOaiPmhEndpoint.listRecords(parameters);

Example record at: view-source:http://repository.abertay.ac.uk/oai/request?verb=GetRecord&metadataPrefix=xoai&identifier=oai:repository.abertay.ac.uk:10373/1861

parseElement appears to fail while parsing the license element. An example:

<element name="license"><field name="bin">Tk9URTogVGhpcyBpcyB0aGUgZGVmYXVsdCBsaWNlbmNlIHRoYXQgdGhlIFVuaXZlcnNpdHkgb2YgQWJlcnRheSAKRHVuZGVlIHJlcXVpcmVzIGFsbCBzdWJtaXR0ZXJzIHRvIGdyYW50LgoKTk9OLUVYQ0xVU0lWRSBESVNUUklCVVRJT04gTElDRU5DRQoKQnkgYWdyZWVpbmcgYW5kIHN1Ym1pdHRpbmcgdGhpcyBsaWNlbmNlLCB5b3UgKHRoZSBhdXRob3IocyksIApjb3B5cmlnaHQgb3duZXIgb3Igbm9taW5hdGVkIGFnZW50KSBncmFudHMgdG8gVW5pdmVyc2l0eSBvZiBBYmVydGF5IApEdW5kZWUgKFVBRCkgdGhlIG5vbi1leGNsdXNpdmUgcmlnaHQgdG8gcmVwcm9kdWNlLCB0cmFuc2xhdGUgCihhcyBkZWZpbmVkIGJlbG93KSwgYW5kL29yIGRpc3RyaWJ1dGUgeW91ciBzdWJtaXNzaW9uIChpbmNsdWRpbmcgdGhlIAphYnN0cmFjdCkgd29ybGR3aWRlIGluIHByaW50IGFuZCBlbGVjdHJvbmljIGZvcm1hdCBhbmQgaW4gYW55IG1lZGl1bSwgCmluY2x1ZGluZyBidXQgbm90IGxpbWl0ZWQgdG8gYXVkaW8gb3IgdmlkZW8uCgpZb3UgYWdyZWUgdGhhdCBVQUQgbWF5LCB3aXRob3V0IGNoYW5naW5nIHRoZSBjb250ZW50LCB0cmFuc2xhdGUgdGhlCnN1Ym1pc3Npb24gdG8gYW55IG1lZGl1bSBvciBmb3JtYXQgZm9yIHRoZSBwdXJwb3NlIG9mIHByZXNlcnZhdGlvbi4gCllvdSBhbHNvIGFncmVlIHRoYXQgVUFEIG1heSBrZWVwIG1vcmUgdGhhbiBvbmUgY29weSBvZiB0aGlzIApzdWJtaXNzaW9uIGZvciBwdXJwb3NlcyBvZiBzZWN1cml0eSwgYmFjay11cCBhbmQgcHJlc2VydmF0aW9uLgoKWW91IHJlcHJlc2VudCB0aGF0IHRoZSBzdWJtaXNzaW9uIGlzIG9yaWdpbmFsIHdvcmssIGFuZCB0aGF0IHlvdQpoYXZlIHRoZSByaWdodCB0byBncmFudCB0aGUgcmlnaHRzIGNvbnRhaW5lZCBpbiB0aGlzIGxpY2VuY2UuIFlvdSAKYWxzbyByZXByZXNlbnQgdGhhdCB5b3VyIHN1Ym1pc3Npb24gZG9lcyBub3QsIHRvIHRoZSBiZXN0IG9mIHlvdXIgCmtub3dsZWRnZSwgaW5mcmluZ2UgdXBvbiBhbnlvbmUncyBjb3B5cmlnaHQuCgpJZiB0aGUgc3VibWlzc2lvbiBjb250YWlucyBtYXRlcmlhbCBmb3Igd2hpY2ggeW91IG9yIHlvdXIgcHVibGlzaGVyCmRvIG5vdCBob2xkIGNvcHlyaWdodCwgeW91IHJlcHJlc2VudCB0aGF0IHlvdSBoYXZlIG9idGFpbmVkIHRoZQp1bnJlc3RyaWN0ZWQgcGVybWlzc2lvbiBvZiB0aGUgY29weXJpZ2h0IG93bmVyIHRvIGdyYW50IFVBRCB0aGUKcmlnaHRzIHJlcXVpcmVkIGJ5IHRoaXMgbGljZW5jZSwgYW5kIHRoYXQgc3VjaCB0aGlyZC1wYXJ0eSBvd25lZAptYXRlcmlhbCBpcyBjbGVhcmx5IGlkZW50aWZpZWQgYW5kIGFja25vd2xlZGdlZCB3aXRoaW4gdGhlIHRleHQgb3IKY29udGVudCBvZiB0aGUgc3VibWlzc2lvbi4KCklGIFRIRSBTVUJNSVNTSU9OIElTIEJBU0VEIFVQT04gV09SSyBUSEFUIEhBUyBCRUVOIFNQT05TT1JFRCBPUiAKU1VQUE9SVEVEIEJZIEFOIE
FHRU5DWSBPUiBPUkdBTklaQVRJT04gT1RIRVIgVEhBTiBVQUQsIFlPVSBSRVBSRVNFTlQgClRIQVQgWU9VIEhBVkUgRlVMRklMTEVEIEFOWSBSSUdIVCBPRiBSRVZJRVcgT1IgT1RIRVIgT0JMSUdBVElPTlMgClJFUVVJUkVEIEJZIFNVQ0ggQ09OVFJBQ1QgT1IgQUdSRUVNRU5ULgoKVUFEIHdpbGwgY2xlYXJseSBpZGVudGlmeSB5b3VyIG5hbWUocykgYXMgdGhlIGF1dGhvcihzKSBvciBvd25lcihzKSAKb2YgdGhlIHN1Ym1pc3Npb24sIGFuZCB3aWxsIG5vdCBtYWtlIGFueSBhbHRlcmF0aW9uLCBvdGhlciB0aGFuIGFzIAphbGxvd2VkIGJ5IHRoaXMgbGljZW5jZSwgdG8geW91ciBzdWJtaXNzaW9uLgo=</field>
</element>

which contains the Base64 (MIME) encoded contents of the license.

This is with v4.2.1-SNAPSHOT, cloned from the git repo today.
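For anyone wanting to check what such a field holds, here is a minimal sketch that decodes it with the JDK's own Base64 support. The class name `LicenseFieldDecoder` is made up for illustration and is not part of xoai; the input is the first 44 characters of the field quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the xoai library.
class LicenseFieldDecoder {
    // Decodes the Base64 text of a <field name="bin"> value into a UTF-8 string.
    // The MIME decoder is used because it ignores line breaks inside the payload.
    static String decode(String base64Text) {
        byte[] raw = Base64.getMimeDecoder().decode(base64Text);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First 44 characters of the license field quoted above.
        String prefix = "Tk9URTogVGhpcyBpcyB0aGUgZGVmYXVsdCBsaWNlbmNl";
        System.out.println(decode(prefix)); // prints "NOTE: This is the default licence"
    }
}
```

The payload is ordinary character data as far as an XML parser is concerned, so on its own it should not upset the reader.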

Any ideas?

cmacdonald avatar Jul 31 '17 19:07 cmacdonald

Removing that particular <element> makes no difference in a test parse. This xoai was generated by a DSpace repository. I'm puzzled.

cmacdonald avatar Jul 31 '17 20:07 cmacdonald

Could you create a minimum test case with the failing example (and the XML not retrieved from the site, so the example is stable)?

Could you also confirm if the XML is OAI valid?

mmalmeida avatar Aug 01 '17 08:08 mmalmeida

Minimal test case, inserted inline as GitHub won't accept an XML attachment.

As to its validity, I can confirm it's created by a DSpace instance. Do you have an XOAI validator?

Will also update issue title.

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
	<responseDate>2017-07-31T20:10:46Z</responseDate>
	<request verb="GetRecord" identifier="oai:repository.somewhere.ac.uk:10373/1861"
		metadataPrefix="xoai">http://repository.somewhere.ac.uk/oai/request</request>
	<GetRecord>
		<record>
			<header>
				<identifier>oai:repository.somewhere.ac.uk:9999/999</identifier>
				<datestamp>2015-02-03T17:41:40Z</datestamp>
				<setSpec>com_10373_3</setSpec>
				<setSpec>col_10373_12</setSpec>
			</header>
			<metadata>
				<metadata xmlns="http://www.lyncode.com/xoai">
					<element name="dc">
						<!--  either this block -->
						<element name="contributor">
							<element name="author">
								<element name="none">
									<field name="value">Author1, First A.</field>
									<field name="value">Author2, Second</field>
									<field name="value">Author3, Third</field>
								</element>
							</element>
						</element>
						<!--  or this following commented block -->
						<!--  
						<element name="relation">
							<element name="ispartof">
								<element name="en">
									<field name="value">Another article 6(4)</field>
								</element>
							</element>
						</element>
						 -->
					</element>
				</metadata>
			</metadata>
		</record>
	</GetRecord>
</OAI-PMH>

cmacdonald avatar Aug 01 '17 12:08 cmacdonald

Asking Google for "OAI validator" turns up quite a few hits. The only one I'm at all familiar with is OVAL: http://oval.base-search.net/

mwoodiupui avatar Aug 01 '17 12:08 mwoodiupui

Thank you for that observation - I should have checked also.

I have now checked the "offending" endpoint with http://oval.base-search.net/ and http://validator.oaipmh.com/. The latter produced no error for ListRecords, and the former produced an error about "No incremental harvesting (day granularity) of ListRecords", which I think is irrelevant here.

Output from a third validator can be found at http://oanet.cms.hu-berlin.de/validator/pages/validation_dini_results.xhtml?vid=ZUZaM2FscFM2NEpUY2lncHdZYno2QT09 - I don't feel qualified to ascertain the relevance of any of these to the Exception at hand.

cmacdonald avatar Aug 01 '17 12:08 cmacdonald

I believe this is caused by more than two levels of nested <element> tags in the DSpace-generated xoai.

cmacdonald avatar Aug 02 '17 21:08 cmacdonald

The problem is related to the underlying XmlReader, which consumes events without checking whether they are what was requested. After some hacking, the simplest fix I could identify was to check in the MetadataParser that the end of the document had not been reached. If someone else agrees, I can add a test case and make a pull request.

MetadataParser.diff.txt
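To illustrate the kind of end-of-document check described above, here is a minimal sketch using only the JDK's standard StAX API (`javax.xml.stream`). It is not the content of the attached diff and does not use xoai's `XmlReader`; the class and method names are hypothetical. The point is that `nextEvent()` is documented to throw `NoSuchElementException` once the input is exhausted, so every read is guarded by `hasNext()`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration, not code from xoai.
class GuardedEventLoop {
    // Collects the local names of all start elements,
    // stopping cleanly when the reader reports no more events.
    static List<String> startElementNames(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        while (reader.hasNext()) {            // guard: never read past the end
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(startElementNames("<a><b/><b><c/></b></a>"));
    }
}
```

A guard like this stops the exception but does not by itself make the parser return the right structure if events have already been consumed out of order.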

cmacdonald avatar Aug 03 '17 16:08 cmacdonald

I had a long plane journey, so I rewrote the traversal code underlying MetadataParser, which has a number of problems when parsing xoai:

  • elements within elements
  • elements within elements that already have fields.

My revised MetadataParser can be found at https://github.com/cmacdonald/xoai/commit/05f67f26bf8eb3c2eff60a076edc3c7189163c57
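For readers who don't want to open the commit, the general shape of such a traversal can be sketched as follows. This is not the code from the commit above; it is an independent, minimal recursive-descent sketch on the JDK's `XMLStreamReader`, and `NestedElementSketch` and `Node` are hypothetical names rather than xoai classes. It assumes only that xoai metadata consists of `<element name="...">` tags that may contain both `<field>` children and further `<element>` children, as in the test case earlier in this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration, not code from xoai.
class NestedElementSketch {
    // Simple in-memory model: an element has a name, field values and child elements.
    static final class Node {
        final String name;
        final List<String> fields = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Parses a document and returns a synthetic root holding the top-level elements.
    static Node parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Node root = new Node("metadata");
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "element".equals(r.getLocalName())) {
                root.children.add(parseElement(r));
            }
        }
        r.close();
        return root;
    }

    // Precondition: reader is on an <element> start tag.
    // Postcondition: reader is on the matching </element> end tag.
    private static Node parseElement(XMLStreamReader r) throws XMLStreamException {
        Node node = new Node(r.getAttributeValue(null, "name"));
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if ("element".equals(r.getLocalName())) {
                    node.children.add(parseElement(r));   // recurse to any depth
                } else if ("field".equals(r.getLocalName())) {
                    node.fields.add(r.getElementText());  // leaves reader on </field>
                } else {
                    skip(r);                               // ignore unknown subtrees
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return node;                               // our own closing tag
            }
        }
        throw new XMLStreamException("Document ended inside element " + node.name);
    }

    // Skips the subtree whose start tag the reader is currently on.
    private static void skip(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0 && r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }
}
```

Because each call returns only on its own closing tag, fields and nested elements can appear in any order at any depth, which covers both cases in the list above.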

I have my own application code, which I have tested with examples of OAI from Pure, DSpace and EPrints. I can write unit tests for xoai-serviceprovider.

cmacdonald avatar Aug 06 '17 00:08 cmacdonald