python-zeep
zeep crashes on invalid xmlChar
Hi, I'm using the most recent version of zeep.
I'm using this wsdl: wsdl = 'https://www.marktstammdatenregister.de/MaStRAPI/wsdl/mastr.wsdl'
My query needs an API key, so I have not included it here; I sent it to you by email instead.
Background: the platform I'm querying with zeep is poorly programmed. It uses a lot of wide characters and other unspecified special characters, and it does not stop people from entering them, so I get a TransportError.
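One user on this platform entered a name that is transported in the XML as "TRIWO Aachen, Geb\xc3\xa4ude PY". Apparently it contains an invalid character, which I can only show here with spaces: & # x 1 B ; (the character reference for ESC, code point 27, matching "invalid xmlChar value 27" in the error below). XML 1.0 does not allow this character, so lxml, the XML parser zeep uses, rejects the whole response.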
Since zeep already fails at this point, I don't know how to delete or replace the invalid characters. I don't mind if they are simply deleted; I don't need an exact representation of the name.
Do you have any idea how to work around or fix this "bug"?
Thank you very much in advance,
Peter
Minimal code to reproduce:
from zeep import Client, Settings
import datetime

apiKey = 'GOTOMarktstammdatenregister.de,registerAccountas"Api"'
myMastrNr = 'useyourAccountNumber'

wsdl = 'https://www.marktstammdatenregister.de/MaStRAPI/wsdl/mastr.wsdl'
# Pass the settings to the client; a bare Settings(...) call has no effect
client = Client(wsdl=wsdl, settings=Settings(strict=False))

# 6.7 Akteur
Anlage = client.bind('Marktstammdatenregister', 'Anlage')

startwert = 18537  # start with 1 to get the
datumAb = datetime.datetime(2019, 1, 31)
limit = 1  ##### EDIT HERE TO DOWNLOAD MORE DATA to validate. MAX 2000!

for zeile in range(startwert, 5000000, limit):
    Daten = Anlage.GetListeAlleEinheiten(apiKey=apiKey, marktakteurMastrNummer=myMastrNr, startAb=zeile, datumAb=datumAb, limit=limit)
Error Message
` File "D:/priva/Documents/MaStR-master/untitled3.py", line 43, in <module>
  File "C:\Users\priva\Anaconda3\lib\site-packages\zeep\proxy.py", line 42, in __call__
    self._op_name, args, kwargs)
  File "C:\Users\priva\Anaconda3\lib\site-packages\zeep\wsdl\bindings\soap.py", line 132, in send
    return self.process_reply(client, operation_obj, response)
  File "C:\Users\priva\Anaconda3\lib\site-packages\zeep\wsdl\bindings\soap.py", line 176, in process_reply
    content=response.content)
TransportError: Server returned response (200) with invalid XML: Invalid XML content received (xmlParseCharRef: invalid xmlChar value 27, line 1, column 947). Content: b'<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><GetListeAlleEinheitenResponse xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle/Anlage"><Ergebniscode xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle">OkWeitereDatenVorhanden</Ergebniscode><AufrufVeraltet xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle">false</AufrufVeraltet><AufrufLebenszeitEnde xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle">9999-12-31T23:59:59.9999999</AufrufLebenszeitEnde><AufrufVersion xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle">1</AufrufVersion><Einheiten xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle"><EinheitMastrNummer>SEE907251125684</EinheitMastrNummer><Name>TRIWO Aachen, Geb\xc3\xa4ude PY</Name><Einheitart>Stromerzeugungseinheit</Einheitart><Einheittyp>Solareinheit</Einheittyp><Standort>Philipsstra\xc3\x9fe 8 Geb\xc3\xa4ude PY 52068 Aachen</Standort><Bruttoleistung>159.63</Bruttoleistung><EinheitBetriebsstatus>InBetrieb</EinheitBetriebsstatus><Anlagenbetreiber>ABR902118072378</Anlagenbetreiber><EegMastrNummer>EEG905940913796</EegMastrNummer></Einheiten></GetListeAlleEinheitenResponse></s:Body></s:Envelope>''`
I ran into this problem too.
I modified the function parse_xml in the file <python install>\Python37\Lib\site-packages\zeep\loader.py
to include a simple:
content = content.replace(b'&#x1B;', b'')
This replaces the invalid character reference with an empty string.
This will presumably break again when you update zeep, and there might be a better place to do this.
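To see why deleting the reference is enough, here is a small stand-alone sketch. It uses the standard library's xml.etree.ElementTree instead of the lxml parser zeep relies on (a swap made only so the example needs no extra packages; both parsers reject this reference). The sample bytes are hypothetical: the exact position of &#x1B; inside the real name is not known from the response above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the <Name> element in the error message.
# Where &#x1B; actually sat in the real name is unknown; this position is assumed.
RAW = b"<Name>TRIWO Aachen,&#x1B; Geb\xc3\xa4ude PY</Name>"


def parse_or_none(content: bytes):
    """Parse content with ElementTree; return None if it is not well-formed XML."""
    try:
        return ET.fromstring(content)
    except ET.ParseError:
        return None


# &#x1B; refers to ESC (code point 27), which XML 1.0 forbids,
# so the parser rejects the document.
before = parse_or_none(RAW)  # None

# The workaround from this comment: delete the reference before parsing.
cleaned = RAW.replace(b"&#x1B;", b"")
after = parse_or_none(cleaned)  # after.text == "TRIWO Aachen, Gebäude PY"
```

Note that the pattern must be the full reference; replacing b'' with b'' leaves the bytes unchanged.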
Thank you very much. Your explanation and workaround helped me a lot. :-)
Hi, same problem and same workaround here! Thanks!
This is a common problem with APIs that don't validate data. There should be a better way to "correct" the XML before it reaches the parser. Is there some way of doing this in zeep without hacking core files?
One way to do this is to override the transport methods that actually fetch the data, before it is parsed, e.g.:
import logging

from zeep import Client
from zeep.transports import Transport
from zeep.utils import get_media_type

class patchTransport(Transport):
    def _load_remote_data(self, url):
        self.logger.debug("Loading remote data from: %s", url)
        response = self.session.get(url, timeout=self.load_timeout)
        response.raise_for_status()
        # Fix invalid XML characters before zeep parses the WSDL/XSD
        response._content = response.content.replace(b"&#x1B;", b"")
        return response.content

    def post(self, address, message, headers):
        if self.logger.isEnabledFor(logging.DEBUG):
            log_message = message
            if isinstance(log_message, bytes):
                log_message = log_message.decode("utf-8")
            self.logger.debug("HTTP Post to %s:\n%s", address, log_message)

        response = self.session.post(
            address, data=message, headers=headers, timeout=self.operation_timeout
        )
        # Fix invalid XML characters before zeep parses the SOAP response
        response._content = response.content.replace(b"&#x1B;", b"")

        if self.logger.isEnabledFor(logging.DEBUG):
            media_type = get_media_type(
                response.headers.get("Content-Type", "text/xml")
            )
            if media_type == "multipart/related":
                log_message = response.content
            else:
                log_message = response.content
                if isinstance(log_message, bytes):
                    log_message = log_message.decode(response.encoding or "utf-8")
            self.logger.debug(
                "HTTP Response from %s (status: %d):\n%s",
                address,
                response.status_code,
                log_message,
            )
        return response

transport = patchTransport()
client = Client(wsdl, transport=transport)  # wsdl: the WSDL URL, as in the question
Not very pretty, but it at least doesn't involve hacking the core zeep code. Ideally this would be a separate hook in plugins.