ews-java-api
Invalid XML Characters are not removed?
Guys, I noticed that invalid XML characters are not removed, and this causes problems when you try to read emails containing special characters that cannot appear in XML. Exchange does not strip them, so the client must do it.
When I try to read some emails, I get the following error:
The request failed. ParseError at [row, col]: [1.1475]
Message: Character Referece "&#
at microsoft.exchange.webservices.data.SimpleServiceRequestBase.internalExecute(SimpleServiceRequestBase.java:71)
I enabled tracing and saw the XML response that causes the problem; here is an excerpt:
<t:Subject>Pendencia(s) COVSB600 - PV 1582 - CONVENIO 975300 - CASEBRS CAIXA ASSIS</t:Subject>
The subject contains a special character that is invalid in XML. It seems that EwsXmlReader needs some modifications to deal with this problem.
Regards, Lucas
The issue is really on the Exchange server side: if your mail contains invalid XML characters, EWS just sends them back. The exception is raised from the JDK's XMLStreamReaderImpl.
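To see that the failure comes from the JDK's StAX parser rather than from ews-java-api itself, a minimal sketch can feed a character reference to an XML 1.0-forbidden character straight into `XMLInputFactory`. The class name `InvalidCharRefDemo` is hypothetical, and `&#x1;` is only a stand-in: the actual character in the subject above is not visible in the trace excerpt. The exact error text depends on which StAX implementation is on the classpath.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical reproduction, not part of ews-java-api.
public class InvalidCharRefDemo {
    // Parses the whole document with StAX; returns null on success or the parser's error message.
    public static String parse(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                r.next();
            }
            r.close();
            return null;
        } catch (XMLStreamException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // &#x1; (U+0001) is not a legal XML 1.0 character, even as a reference,
        // so this prints a ParseError message
        System.out.println(parse("<Subject>CAIXA&#x1;ASSIS</Subject>"));
        // Without the reference the document parses and this prints "null"
        System.out.println(parse("<Subject>CAIXA ASSIS</Subject>"));
    }
}
```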
I know that this is an issue with Exchange, but I don't think the Microsoft Exchange team will change it. To me, the API needs to be enhanced to handle it. I sort of solved the problem (not in the way I would like): before creating the factory implementation (EwsXmlReader:initializeXmlReader), I removed these characters from the stream and created a new, cleaned stream.
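The stream-cleaning approach described above could look roughly like this minimal sketch. `XmlResponseSanitizer`, `sanitize` and `cleanStream` are hypothetical names, and the response is assumed to be UTF-8. Because the parse error is about a character *reference* (`&#...;`), the sketch removes references that point at characters outside the XML 1.0 `Char` production, and then any raw forbidden characters; escaped text such as `&amp;#x1;` is left alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ews-java-api.
public class XmlResponseSanitizer {
    // Decimal (&#26;) or hexadecimal (&#x1A;) character references; the digit
    // limits keep Integer.parseInt from overflowing.
    private static final Pattern CHAR_REF = Pattern.compile("&#(x[0-9a-fA-F]{1,6}|[0-9]{1,7});");

    // XML 1.0 Char production
    public static boolean isXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    // Drops references to characters XML 1.0 forbids, then any raw forbidden code points.
    public static String sanitize(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String digits = m.group(1);
            int cp = digits.charAt(0) == 'x'
                    ? Integer.parseInt(digits.substring(1), 16)
                    : Integer.parseInt(digits);
            // Keep valid references verbatim, replace invalid ones with nothing
            m.appendReplacement(sb, isXml10Char(cp) ? Matcher.quoteReplacement(m.group()) : "");
        }
        m.appendTail(sb);
        StringBuilder out = new StringBuilder(sb.length());
        sb.codePoints().filter(XmlResponseSanitizer::isXml10Char).forEach(out::appendCodePoint);
        return out.toString();
    }

    // Buffers the whole response (assumed UTF-8) in memory and returns a cleaned copy.
    public static InputStream cleanStream(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
        return new ByteArrayInputStream(sanitize(xml).getBytes(StandardCharsets.UTF_8));
    }
}
```

The cleaned stream would then be handed to the XML reader in place of the original one. Note that `cleanStream` holds the entire response in memory.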
Yeah, if MS is not able to fix it, then we need a clean InputStream.
That's actually handled in their .NET managed API. And again, EWS does not accept requests that contain invalid XML characters.
Maybe one of you can provide a PR?
Intercepting the InputStream is one option, but it could be quite expensive when dealing with large XML responses, just to filter out things like A;
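One way to at least avoid buffering the whole response is to filter while streaming. The sketch below is a hypothetical `FilterReader` subclass (`InvalidXmlCharFilterReader` is not part of ews-java-api) that drops character references to XML 1.0-invalid characters and raw C0 control characters as it reads. It still touches every character, so the CPU cost remains; only the memory cost goes away. Raw characters are checked one UTF-16 unit at a time, so surrogate pairs pass through unchanged.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Hypothetical streaming filter, not part of ews-java-api.
public class InvalidXmlCharFilterReader extends FilterReader {
    private final StringBuilder pending = new StringBuilder(); // filtered chars not yet returned
    private int pos = 0;

    public InvalidXmlCharFilterReader(Reader source) {
        super(new PushbackReader(source, 1)); // one char of lookahead is enough
    }

    private static boolean isXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0x10FFFF);
    }

    private static boolean isRefDigit(int c, boolean hex) {
        return (c >= '0' && c <= '9') || (hex && ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')));
    }

    // Moves the next raw char (or a whole &#...; reference) into 'pending'; false at end of input.
    private boolean fill() throws IOException {
        PushbackReader src = (PushbackReader) in;
        int c = src.read();
        if (c == -1) return false;
        if (c != '&') {
            // Drop raw C0 controls other than tab, LF and CR
            if (c >= 0x20 || c == 0x9 || c == 0xA || c == 0xD) pending.append((char) c);
            return true;
        }
        StringBuilder ref = new StringBuilder("&");
        int next = src.read();
        if (next == '#') {
            ref.append('#');
            next = src.read();
            boolean hex = next == 'x';
            if (hex) {
                ref.append('x');
                next = src.read();
            }
            int digits = 0;
            while (digits < 7 && isRefDigit(next, hex)) { // 7 digits cannot overflow an int
                ref.append((char) next);
                digits++;
                next = src.read();
            }
            if (digits > 0 && next == ';') {
                int cp = Integer.parseInt(ref.substring(hex ? 3 : 2), hex ? 16 : 10);
                if (isXml10Char(cp)) pending.append(ref).append(';'); // invalid refs are dropped
                return true;
            }
        }
        if (next != -1) src.unread(next); // give back the char that ended the candidate
        pending.append(ref);              // not a complete reference: emit it unchanged
        return true;
    }

    @Override
    public int read() throws IOException {
        while (pos == pending.length()) {
            pending.setLength(0);
            pos = 0;
            if (!fill()) return -1;
        }
        return pending.charAt(pos++);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        int c;
        while (n < len && (c = read()) != -1) {
            cbuf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) skipped++;
        return skipped;
    }
}
```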
Interestingly, I noticed that most of these characters are actually valid in XML 1.1. It's just that the parser reading the EWS response is not using the XML 1.1 scanner.
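The difference can be made concrete with the character productions from the two specifications. In XML 1.0, `Char` excludes all C0 controls except tab, LF and CR. In XML 1.1, `Char` starts at `#x1`, but the `RestrictedChar` subset (most C0 and some C1 controls) may only appear as character references, which is the form in which EWS sends them. The class name `XmlCharClass` is hypothetical.

```java
// Hypothetical helper comparing the XML 1.0 and XML 1.1 character productions.
public class XmlCharClass {
    // XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    // XML 1.1: Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isXml11Char(int c) {
        return (c >= 0x1 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    // XML 1.1 RestrictedChar: legal only when written as a character reference
    public static boolean isXml11Restricted(int c) {
        return (c >= 0x1 && c <= 0x8) || (c >= 0xB && c <= 0xC) || (c >= 0xE && c <= 0x1F)
                || (c >= 0x7F && c <= 0x84) || (c >= 0x86 && c <= 0x9F);
    }

    public static void main(String[] args) {
        for (int c : new int[] {0x1, 0x1A, 0x9, 0x41, 0xFFFE}) {
            System.out.printf("U+%04X  xml1.0=%b  xml1.1=%b  restricted=%b%n",
                    c, isXml10Char(c), isXml11Char(c), isXml11Restricted(c));
        }
    }
}
```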
I'm exploring whether we can force EwsXmlReader to use the XML 1.1 scanner instead.
I tried changing the XML declaration in the response to version 1.1 (naively replacing the 1.0 with 1.1), but it did not work (of course, what was I thinking?). I also think that, for large XML responses, opening the stream, removing those characters and creating a cleaned one could hurt performance. Why does this seem to be no problem with the managed API? Is it because of the native XML reader, or do they do some sort of cleaning?
So, I got this resolved in a pretty hacky way. I'm not very happy with it, because you basically need to copy quite a lot of OpenJDK JAXP code into your project, so I didn't create a patch here.
First, create a new scanner by extending XMLNSDocumentScannerImpl and overriding isInvalid(...); I use XML 1.1 validation here:
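// JDK-internal (com.sun...) Xerces classes, not a public API
import com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl;
import com.sun.org.apache.xerces.internal.util.XML11Char;

// Namespace-aware document scanner that checks characters against XML 1.1 rules
public class XMLNSDocumentScannerXML11CharValidateImpl extends XMLNSDocumentScannerImpl {
    // Report a character as invalid only when XML 1.1 forbids it
    @Override
    protected boolean isInvalid(int value) {
        return XML11Char.isXML11Invalid(value);
    }
}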
Then, to use this new class, you basically need to create a new XMLStreamReaderImpl that uses the new scanner, and a new XMLInputFactoryImpl that creates instances of that stream reader.
Probably covered by #353, but here is another stack trace:
microsoft.exchange.webservices.data.core.exception.service.remote.ServiceRequestException: The request failed. ParseError at [row,col]:[4,760]
Message: Character reference "&#
at microsoft.exchange.webservices.data.core.request.SimpleServiceRequestBase.internalExecute(SimpleServiceRequestBase.java:74)
at microsoft.exchange.webservices.data.core.request.MultiResponseServiceRequest.execute(MultiResponseServiceRequest.java:158)
at microsoft.exchange.webservices.data.core.ExchangeService.internalBindToItems(ExchangeService.java:1343)
at microsoft.exchange.webservices.data.core.ExchangeService.bindToItems(ExchangeService.java:1360)
I came across this bug the other day, and my solution was to reduce the number of emails I processed at a time. I was loading 50 per run, and this seemed more like a problem with leftover data in buffers than an error in the XML, so I started processing 10 emails per run, repeated 5 times, and the error stopped.
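The batching workaround described above amounts to splitting the list of item IDs into fixed-size chunks and processing each chunk separately. A minimal, generic sketch follows; `Batches` and `partition` are hypothetical names. In practice each batch would presumably be passed to the `ExchangeService.bindToItems` call seen in the stack trace above, ideally inside its own try/catch so that one failing batch does not stop the others.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ews-java-api.
public class Batches {
    // Splits 'items' into consecutive sublists of at most 'size' elements.
    public static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            int to = Math.min(from + size, items.size());
            // Copy so each batch is independent of the source list
            result.add(new ArrayList<>(items.subList(from, to)));
        }
        return result;
    }
}
```

With 50 IDs and a size of 10, this yields the 5 runs of 10 described above.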
@codecrafting-io @serious6 Does anyone know how to generate this type of email? I mean one with special characters.
I managed to create a Postman request that creates an email with the character in the subject.
If someone stumbles on this: the URL and authentication have to be set manually. NTLM authentication with valid credentials has to be used. Other methods may work, but I didn't try them, so I don't know.
EWS-Share.postman_collection.txt
You may have to change the file extension to .json to import it into Postman.