ews-java-api

Invalid XML Characters are not removed?

Open codecrafting-io opened this issue 9 years ago • 11 comments

Guys, I noticed that invalid XML characters are not removed, which causes problems when you try to read emails containing special characters that are not allowed in XML. Exchange does not strip them, so the client must do it.

When I try to read some emails I get the following problem:

The request failed. ParseError at [row,col]:[1,1475] Message: Character reference "&# at microsoft.exchange.webservices.data.SimpleServiceRequestBase.internalExecute(SimpleServiceRequestBase.java:71)

I enabled tracing and saw the XML response that causes the problem. An excerpt is below:

<t:Subject>Pendencia(s) COVSB600 - PV 1582 - CONVENIO 975300 - CASEBR&#x1A;S CAIXA ASSIS</t:Subject>

The subject contains the character reference &#x1A;, which is invalid in XML 1.0. It seems EwsXmlReader needs some modifications to deal with this problem.
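For anyone who wants to reproduce the parse failure without an Exchange server, here is a minimal sketch using the JDK's StAX parser. The class name CharRefRepro and the method tryParse are made up for illustration, and the element is written without the t: prefix so that no namespace declaration is needed. The exact error text depends on the parser implementation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical repro helper, not part of ews-java-api.
class CharRefRepro {

    // Parses the whole document; returns null on success,
    // or the parser's error message if parsing fails.
    static String tryParse(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return null;
        } catch (XMLStreamException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The character reference &#x1A; is rejected by an XML 1.0 parser.
        System.out.println(tryParse("<Subject>CASEBR&#x1A;S CAIXA ASSIS</Subject>"));
        // The same subject without the reference parses fine (prints null).
        System.out.println(tryParse("<Subject>CASEBRS CAIXA ASSIS</Subject>"));
    }
}
```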

Regards, Lucas

codecrafting-io, Mar 02 '15 19:03

The issue is really on the Exchange server side: if your mail contains invalid XML characters, EWS just sends them back as-is. The exception is raised from the JDK's XMLStreamReaderImpl.

wilsonw, Mar 10 '15 10:03

I know it is an issue with Exchange, but I don't think the Microsoft Exchange team will change this. To me, the API needs to handle it. I sort of solved the problem (not in the way I would like): before the XML reader is created (EwsXmlReader:initializeXmlReader), I remove these characters from the stream and create a new, cleaned stream.
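A rough sketch of what such a cleaning step could look like is below. It is not the code used above, which was never posted. The class XmlCharRefCleaner and its methods are hypothetical names, the response is assumed to be UTF-8, and only numeric character references (such as &#x1A;) are handled, not raw control bytes. The whole response is buffered in memory, so this is the simple variant rather than the efficient one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ews-java-api.
class XmlCharRefCleaner {

    // Matches decimal (&#26;) and hexadecimal (&#x1A;) character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    // The Char production of XML 1.0.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Removes references to characters that XML 1.0 does not allow;
    // every other reference is left untouched.
    static String stripInvalidCharRefs(String xml) {
        Matcher m = NUMERIC_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            int cp;
            try {
                cp = body.charAt(0) == 'x'
                        ? Integer.parseInt(body.substring(1), 16)
                        : Integer.parseInt(body, 10);
            } catch (NumberFormatException e) {
                cp = -1; // value too large for an int, so not a valid character
            }
            String replacement = isValidXml10(cp) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Reads the raw response fully and returns a cleaned copy as a new stream.
    // Assumption: the response is encoded as UTF-8.
    static InputStream clean(InputStream raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = raw.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        String xml = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        return new ByteArrayInputStream(
                stripInvalidCharRefs(xml).getBytes(StandardCharsets.UTF_8));
    }
}
```

The stream returned by clean(...) would then be handed to the XML reader in place of the original response stream.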

codecrafting-io, Mar 16 '15 22:03

Yeah, if MS is not able to fix it, then we need a clean InputStream.

That's actually handled in their .NET managed API. Then again, EWS does not accept requests that contain invalid XML characters.

wilsonw, Mar 17 '15 13:03

Maybe one of you guys can provide a PR?

serious6, Mar 18 '15 19:03

Intercepting the InputStream is one way, but filtering out things like &#x1A; could be quite expensive when dealing with a large XML response.
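One way to keep the cost down would be to filter while streaming instead of buffering the whole response. The sketch below is only an illustration of that idea: InvalidCharRefFilterReader is a hypothetical class, not something in ews-java-api or the JDK. It looks ahead at most 12 characters after each '&' and drops numeric character references that XML 1.0 does not allow.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical streaming filter, not part of ews-java-api.
class InvalidCharRefFilterReader extends Reader {
    private static final int MAX_REF_LENGTH = 12; // fits "&#x0010FFFF;"
    private final PushbackReader source;
    private final StringBuilder pending = new StringBuilder();
    private int pendingPos = 0;

    InvalidCharRefFilterReader(Reader source) {
        this.source = new PushbackReader(source, 1);
    }

    // The Char production of XML 1.0.
    static boolean isValidXml10(long cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the referenced code point, or -1 if ref is not a numeric reference.
    private static long parseNumericRef(CharSequence ref) {
        int len = ref.length();
        if (len < 4 || ref.charAt(1) != '#' || ref.charAt(len - 1) != ';') {
            return -1;
        }
        boolean hex = ref.charAt(2) == 'x';
        String digits = ref.subSequence(hex ? 3 : 2, len - 1).toString();
        try {
            return Long.parseLong(digits, hex ? 16 : 10);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    private static boolean isRefChar(int c) {
        return c == '#' || (c >= '0' && c <= '9')
                || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private int nextChar() throws IOException {
        while (true) {
            if (pendingPos < pending.length()) {
                return pending.charAt(pendingPos++); // replay buffered text
            }
            int c = source.read();
            if (c != '&') {
                return c; // ordinary character or end of stream (-1)
            }
            pending.setLength(0);
            pendingPos = 0;
            pending.append('&');
            while (pending.length() < MAX_REF_LENGTH) {
                int next = source.read();
                if (next == -1) break;
                if (next == ';') { pending.append(';'); break; }
                if (!isRefChar(next)) { source.unread(next); break; }
                pending.append((char) next);
            }
            long cp = parseNumericRef(pending);
            if (cp >= 0 && !isValidXml10(cp)) {
                pending.setLength(0); // drop the invalid reference entirely
            }
        }
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        while (n < len) {
            int c = nextChar();
            if (c == -1) break;
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    // Convenience wrapper for trying the filter on a small string.
    static String filter(String xml) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new InvalidCharRefFilterReader(new StringReader(xml))) {
            int c;
            while ((c = r.read()) != -1) out.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

In a real client, the wrapped reader would presumably be a BufferedReader around an InputStreamReader over the response, with the charset taken from the response headers. Since the filter pulls one character at a time, buffering underneath matters for throughput.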

Interestingly, I noticed most of these characters are actually valid in XML 1.1. It's just that the response EWS returns is not parsed with the XML11 scanner.

I'm exploring whether we can find a way to force EwsXmlReader to use XML11 instead.

wilsonw, Mar 19 '15 01:03

I tried changing the XML response to version 1.1 by naively replacing the 1.0 part with 1.1, but it did not work (of course, what was I thinking?). Yeah, I also think large XML files could take a performance hit from "opening the stream", removing those characters, and creating a cleaned one. Why does this seem to be no problem with the managed API? Is it because of the native XML reader, or do they do some sort of cleaning?

codecrafting-io, Mar 20 '15 04:03

So, I got this resolved in a pretty hacky way. I'm not so happy about it, as you basically need to copy quite a bit of OpenJDK JAXP code, so I didn't create a patch.

First, create a new scanner by extending XMLNSDocumentScannerImpl and overriding isInvalid(...). I use XML11 validation here.

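import com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl;
import com.sun.org.apache.xerces.internal.util.XML11Char;

// Relies on JDK-internal Xerces classes, which may not be accessible on newer JDKs.
public class XMLNSDocumentScannerXML11CharValidateImpl extends XMLNSDocumentScannerImpl {
    // Validate characters against the XML 1.1 rules instead of XML 1.0.
    @Override
    protected boolean isInvalid(int value) {
        return XML11Char.isXML11Invalid(value);
    }
}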

In order to use this new class, you basically need to create a new XMLStreamReaderImpl that uses the new scanner, and a new XMLInputFactoryImpl that creates the StreamReader instance.

wilsonw, Mar 23 '15 17:03

Probably covered by #353, but here is another stack trace:

microsoft.exchange.webservices.data.core.exception.service.remote.ServiceRequestException: The request failed. ParseError at [row,col]:[4,760]
Message: Character reference "&#
    at microsoft.exchange.webservices.data.core.request.SimpleServiceRequestBase.internalExecute(SimpleServiceRequestBase.java:74)
    at microsoft.exchange.webservices.data.core.request.MultiResponseServiceRequest.execute(MultiResponseServiceRequest.java:158)
    at microsoft.exchange.webservices.data.core.ExchangeService.internalBindToItems(ExchangeService.java:1343)
    at microsoft.exchange.webservices.data.core.ExchangeService.bindToItems(ExchangeService.java:1360)

pathikrit, Jan 16 '16 00:01

I came across this bug the other day, and my solution was to lower the number of emails I was processing each time. I was loading 50 per run, and this seemed to be more of a leftover-data-in-buffers problem than an error in the XML, so I started doing 10 emails per run, repeated 5 times, and the error stopped.

udobranco, Jun 19 '17 12:06

@codecrafting-io @serious6 Does anyone know how to generate this type of email? I mean one with special characters.

girishghoda57, May 11 '20 11:05

I managed to create a Postman request which creates an email with the character in the subject. If someone stumbles on this: the URL and authentication have to be set manually. NTLM authentication with correct credentials has to be used. Others may work, but I didn't try them.
EWS-Share.postman_collection.txt You may have to change the file extension to .json to import it into Postman.

friedran-levi, Aug 12 '21 14:08