
Invalid character reference 

Open • imario42 opened this issue 9 years ago • 23 comments

Not exactly an issue with the EWS library, but the XML parser complains that a character reference in one of the mails I had to read is not a valid character reference.

I worked around it with the following patch. I fully understand if you don't want to add this to the library by default, since it will silently parse any invalid XML. But perhaps a way could be added to instrument the stream and/or the XMLEventReader, so that we can set up this behavior from outside the library?

    @@ -99,7 +101,14 @@ public class EwsXmlReader {
         XMLInputFactory inputFactory = XMLInputFactory.newInstance();
         inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
     
    -    return inputFactory.createXMLEventReader(stream);
    +    XMLEventReader reader = inputFactory.createXMLEventReader(stream);
    +    // IM: continue after fatal error to prevent "invalid character reference"
    +    XMLErrorReporter errorReporter =
    +        (XMLErrorReporter) reader.getProperty(Constants.XERCES_PROPERTY_PREFIX + Constants.ERROR_REPORTER_PROPERTY);
    +
    +    errorReporter.setFeature(Constants.XERCES_FEATURE_PREFIX + Constants.CONTINUE_AFTER_FATAL_ERROR_FEATURE, true);
    +    return reader;
       }

imario42 • Jun 16 '15

Not 100% sure if this is the same issue, but I've run into a case where the Exchange server sends an XML 1.0 preamble (the XML declaration) together with character references that are only valid in XML 1.1, specifically Unicode control characters. Fixing the preamble resolves all the parse errors without having to ignore them.
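The XML 1.0 versus 1.1 difference can be made concrete with the character rules from the two specifications. This is a sketch with a hypothetical helper class (XmlCharRules is not part of ews-java-api): a reference such as &#x1; is only allowed in 1.1, while 1.1 in turn forbids the raw characters #x7F to #x9F (except #x85) that 1.0 accepts, which is one possible side effect of relabeling a 1.0 document as 1.1.

```java
// Character rules from the XML 1.0 (Fifth Edition) and XML 1.1 specifications.
// XmlCharRules is a hypothetical helper, not part of ews-java-api.
final class XmlCharRules {

    private XmlCharRules() {}

    /** XML 1.0 "Char" production: the only characters a 1.0 document may contain. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** XML 1.1 "Char" production: everything except NUL, surrogates, FFFE and FFFF. */
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** XML 1.1 "RestrictedChar": may only appear as a character reference. */
    static boolean isXml11Restricted(int cp) {
        return (cp >= 0x1 && cp <= 0x8)
                || (cp >= 0xB && cp <= 0xC)
                || (cp >= 0xE && cp <= 0x1F)
                || (cp >= 0x7F && cp <= 0x84)
                || (cp >= 0x86 && cp <= 0x9F);
    }

    /** May a reference like "&#x1;" to this code point appear in a document of this version? */
    static boolean referenceAllowed(int cp, String version) {
        return "1.1".equals(version) ? isXml11Char(cp) : isXml10Char(cp);
    }

    /** May this code point appear literally (unescaped) in a document of this version? */
    static boolean literalAllowed(int cp, String version) {
        return "1.1".equals(version)
                ? isXml11Char(cp) && !isXml11Restricted(cp)
                : isXml10Char(cp);
    }
}
```

For example, referenceAllowed(0x1, "1.0") is false while referenceAllowed(0x1, "1.1") is true; NUL (0x0) is never allowed, not even as a reference.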

I have a patch for this, but it currently introduces an extra dependency that probably isn't necessary.

easel • Aug 17 '15

Thanks @easel, will you provide a PR for this?

serious6 • Aug 17 '15

@serious6 I've pushed what I've got so far at https://github.com/OfficeDev/ews-java-api/pull/409

I think it should be possible to remove the dependency on streamflyer, and also to write a failing test case, but I haven't managed to do either yet.
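A dependency-free version of the preamble fix might look like the following sketch. It assumes that "fixing the preamble" means rewriting the declaration from version 1.0 to 1.1, and it is not the code from #409; the class name XmlVersionUpgrader is hypothetical. It also assumes the response starts exactly with `<?xml version="1.0"` (double quotes, no byte order mark) in an ASCII-compatible encoding such as UTF-8. Whether the parser then accepts the 1.1-only references depends on the StAX implementation honoring the 1.1 declaration, as it apparently did in easel's setup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: relabels an XML 1.0 declaration as XML 1.1 without streamflyer.
final class XmlVersionUpgrader {

    private static final byte[] OLD_DECL =
            "<?xml version=\"1.0\"".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] NEW_DECL =
            "<?xml version=\"1.1\"".getBytes(StandardCharsets.US_ASCII);

    private XmlVersionUpgrader() {}

    static InputStream upgrade(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, OLD_DECL.length);

        // Read up to OLD_DECL.length bytes; a single read() may return fewer.
        byte[] head = new byte[OLD_DECL.length];
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream is shorter than the declaration prefix
            }
            n += r;
        }

        if (n == head.length && Arrays.equals(head, OLD_DECL)) {
            // Emit the 1.1 prefix, then the rest of the original stream unchanged.
            return new SequenceInputStream(new ByteArrayInputStream(NEW_DECL), pushback);
        }

        // Not a declaration we recognize: push the bytes back and pass everything through.
        pushback.unread(head, 0, n);
        return pushback;
    }
}
```

In EwsXmlReader, the wrapper would presumably be applied to the response stream before it is handed to inputFactory.createXMLEventReader(stream).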

Also, do you think this functionality should be switchable? I've been running in production with it on for over a year with no issues, but it's hard to say that there will be no side effects elsewhere.

easel • Aug 17 '15

I am also facing the same issue when calling item.load():

    Exception in thread "main" microsoft.exchange.webservices.data.core.exception.service.remote.ServiceRequestException: The request failed. ParseError at [row,col]:[9,2543]
    Message: Character reference "&#
        at microsoft.exchange.webservices.data.core.request.SimpleServiceRequestBase.internalExecute(SimpleServiceRequestBase.java:74)
        at microsoft.exchange.webservices.data.core.request.MultiResponseServiceRequest.execute(MultiResponseServiceRequest.java:158)
        at microsoft.exchange.webservices.data.core.ExchangeService.internalLoadPropertiesForItems(ExchangeService.java:1324)
        at microsoft.exchange.webservices.data.core.service.item.Item.internalLoad(Item.java:193)
        at microsoft.exchange.webservices.data.core.service.ServiceObject.load(ServiceObject.java:384)
        at cots.sg.test.EWSBBEmailServices.readAndParseEmails(EWSBBEmailServices.java:163)
        at cots.sg.test.EWSBBEmailServices.main(EWSBBEmailServices.java:219)
    Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[9,2543]
    Message: Character reference "&#
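The same ParseError can likely be reproduced without an Exchange server. This is a minimal sketch of such a failing test case, assuming XMLInputFactory.newInstance() resolves to the JDK's built-in StAX parser (as it does when no other implementation is on the classpath); CharRefRepro is a hypothetical name. It feeds an XML 1.0 document containing &#x1; to the parser and reports whether parsing fails.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical reproduction of the 'Character reference "&#...' parse error.
final class CharRefRepro {

    private CharRefRepro() {}

    /** Returns true if the default StAX parser rejects the given document. */
    static boolean rejects(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // same setting as EwsXmlReader
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next(); // the bad reference is reported while scanning element content
            }
            reader.close();
            return false;
        } catch (XMLStreamException | RuntimeException e) {
            // Usually an XMLStreamException; some code paths surface an unchecked XNIException
            return true;
        }
    }
}
```

For example, rejects("<?xml version=\"1.0\"?><a>&#x1;Recall</a>") is expected to return true, and the same document without the reference should parse cleanly.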

sg-garg • Jun 24 '16

I have the same problem when calling item.load():

    microsoft.exchange.webservices.data.core.exception.service.remote.ServiceRequestException: The request failed. ParseError at [row,col]:[9,6]
    Message: Character reference "&#
        at microsoft.exchange.webservices.data.core.request.SimpleServiceRequestBase.internalExecute(SimpleServiceRequestBase.java:74)
        at microsoft.exchange.webservices.data.core.request.MultiResponseServiceRequest.execute(MultiResponseServiceRequest.java:158)
        at microsoft.exchange.webservices.data.core.ExchangeService.internalLoadPropertiesForItems(ExchangeService.java:1324)
        at microsoft.exchange.webservices.data.core.service.item.Item.internalLoad(Item.java:193)
        at microsoft.exchange.webservices.data.core.service.ServiceObject.load(ServiceObject.java:384)
        at microsoft.exchange.webservices.data.core.service.ServiceObject$load.call(Unknown Source)

costerutilo • Sep 01 '16

I don't understand how the EWS API is usable at all with this bug still open.

blackfrancis • Apr 13 '17

+1 for getting the pull request into the next release. I'm getting this one:

    com.sun.org.apache.xerces.internal.xni.XNIException: Character reference "&#

While debugging, it looks like the entity being parsed is &#x1;, which comes from this fragment:

<t:InternetMessageHeader HeaderName="Thread-Topic">&#x1;	Recall: Theorem / Comprehend Training - Session 2 - Tuesday, January 21, 2014</t:InternetMessageHeader>

The subject of this same message is this:

<t:Subject>Recall: Theorem / Comprehend Training - Session 2 - Tuesday, January 21, 2014</t:Subject>

How did a &#x1; make it into this?
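If changing the declared XML version is not an option, another approach is to drop or replace the numeric character references that XML 1.0 forbids, such as the &#x1; in the Thread-Topic header above, before the text reaches the parser. This is a minimal sketch with a hypothetical class name (CharRefScrubber); it works on a String rather than a stream, and it would also rewrite matching text inside CDATA sections or comments, where &#x1; is literal text rather than a reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces character references that XML 1.0 does not allow.
final class CharRefScrubber {

    // Decimal (&#1;) and hexadecimal (&#x1;) references; XML requires a lowercase 'x'.
    private static final Pattern CHAR_REF = Pattern.compile("&#(x[0-9a-fA-F]+|[0-9]+);");

    private CharRefScrubber() {}

    /** XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every reference to a non-XML-1.0 character with the given text. */
    static String scrub(String xml, String replacement) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            int cp;
            try {
                cp = body.charAt(0) == 'x'
                        ? Integer.parseInt(body.substring(1), 16)
                        : Integer.parseInt(body);
            } catch (NumberFormatException e) {
                cp = -1; // too large for an int, so certainly not a valid character
            }
            String kept = isXml10Char(cp) ? m.group() : replacement;
            m.appendReplacement(out, Matcher.quoteReplacement(kept));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Passing an empty replacement simply drops the reference; something like "\uFFFD" would leave a visible marker in the subject or header instead.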

beders • Apr 26 '17

@beders the solution from @ben-thompson-ravn worked perfectly for me. How this critical fix hasn't made it into a release despite being available for 6+ months is beyond me.

blackfrancis • May 09 '17

Has any fix been provided for this invalid XML character issue?

hariregula • Mar 13 '18

Any update here?

celloni • Jun 12 '18

Hi @celloni, have you already tried this: https://github.com/OfficeDev/ews-java-api/pull/409

Jan

OS-JaR • Jun 12 '18

Hey @OS-JaR, thanks for your answer. After a few tests, the fix from #409 seems to work.

celloni • Jun 12 '18

Hi @celloni, if this fix doesn't work, you can try streamflyer's InvalidXmlCharacterModifier, or write a custom Modifier (for example, public class ExtendedInvalidXmlCharacterModifier implements Modifier) that replaces invalid characters, or even acts as a bad-word filter. This is pseudo code; I don't know if it works like a charm:

    // Fields: the pattern and the replacement text are placeholders
    private final Matcher matcherInvalidChar = Pattern.compile("really bad word").matcher("");
    private final String replacement = "HERE IS A REPLACED BAD WORD";
    // plus a streamflyer ModificationFactory field named "factory" (initialization not shown)

    @Override
    public AfterModification modify(StringBuilder characterBuffer, int firstModifiableCharacterInBuffer, boolean endOfStreamHit) {
        matcherInvalidChar.reset(characterBuffer);
        matcherInvalidChar.region(firstModifiableCharacterInBuffer, characterBuffer.length());

        // Start at the first character this modifier is allowed to change.
        // find(int) resets the matcher, so it sees the buffer's new length after each replacement.
        int start = firstModifiableCharacterInBuffer;
        while (matcherInvalidChar.find(start)) {
            start = onMatch(characterBuffer);
        }

        return factory.skipEntireBuffer(characterBuffer, firstModifiableCharacterInBuffer, endOfStreamHit);
    }

    protected int onMatch(StringBuilder characterBuffer) {
        // Replace the match and continue searching right after the inserted text
        characterBuffer.replace(matcherInvalidChar.start(), matcherInvalidChar.end(), replacement);
        return matcherInvalidChar.start() + replacement.length();
    }

Jan

OS-JaR • Jun 12 '18

Thanks for your help @OS-JaR ! 👍

celloni • Jun 14 '18

I can successfully parse mail with the characters from the first screenshot in the subject and body, but parsing fails with the characters from the second screenshot in the subject. Any suggestions or help? @OS-JaR @serious6 @easel

[Screenshot 1: Screenshot from 2020-05-13 15-55-32] [Screenshot 2: Screenshot from 2020-05-13 15-55-45]

girishghoda57 • May 13 '20

Yeah, don't use this library. I think MSFT has a new library available for the MS Graph API.

beders • May 13 '20

@beders MS Graph API for Java is available here: https://github.com/microsoftgraph/msgraph-sdk-java That would be ok for those who are a) starting a new app and b) using Office 365. It will not work with older versions of Exchange Server.

avromf • May 13 '20

Graph API doesn't support Exchange on-premises, only Office 365 and Hybrid (Exchange Server 2016).

pkropachev • May 13 '20

Yup, MSFT wants you to move to the cloud ASAP. Looks like this very irritating bug still hasn't been fixed. (I had to use my own fork to make it work, and I still do.)

beders • May 13 '20

What is the solution for users who are still working with ews-java-api?

girishghoda57 • May 21 '20

Keep in mind that Exchange Online (Office 365) will stop supporting Basic Authentication for EWS after October 13th, 2020. It will still work for on-premises installations, though.

More details: https://techcommunity.microsoft.com/t5/exchange-team-blog/upcoming-changes-to-exchange-web-services-ews-api-for-office-365/ba-p/608055

celloni • May 23 '20

In fact, it's just a question of authentication. You can add OAuth 2.0 support to the EWS Java API and continue using EWS.

pkropachev • May 25 '20

@celloni This has now been deferred to the second half of 2021.

https://developer.microsoft.com/en-us/office/blogs/deferred-end-of-support-date-for-basic-authentication-in-exchange-online/

avromf • May 25 '20