python-zeep

zeep crashes on invalid xmlChar

Open Beder3004 opened this issue 6 years ago • 5 comments

Hi, I'm using the most recent version of zeep.

I'm using this WSDL: wsdl = 'https://www.marktstammdatenregister.de/MaStRAPI/wsdl/mastr.wsdl'

My query uses an API key, so I have not posted it here; I sent it to you by email instead.

Background: the platform I'm querying with zeep is poorly programmed. It does not stop users from entering wide characters and other special characters, and when such characters end up in a response, I get a TransportError.

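One user on this platform entered a name that is transported in the XML as "TRIWO Aachen, Geb\xc3\xa4ude PY", but with an embedded character reference &#x1B; (the ESC control character, value 27 in the error message below) that GitHub hides here; it only shows up when written with spaces: & # x 1 B ;. Apparently this character is not allowed in XML, so parsing fails.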
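The failure is not specific to zeep: the XML 1.0 specification only allows tab, line feed, carriage return and most characters from U+0020 upward (its Char production), and a character reference such as &#x1B; must point to an allowed character. A quick check with Python's standard-library parser (expat, not the lxml/libxml2 parser zeep uses) shows the same rejection. The function name `parses` and the sample documents are just for illustration:

```python
import xml.etree.ElementTree as ET


def parses(doc: bytes) -> bool:
    """Return True if Python's stdlib parser (expat) accepts the document."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# Hypothetical minimal documents, not taken from the real response.
valid = b"<Name>Geb&#xE4;ude</Name>"    # U+00E4 (a-umlaut) is a legal XML character
invalid = b"<Name>Geb&#x1B;ude</Name>"  # U+001B (ESC, decimal 27) is not

print(parses(valid))                            # True
print(parses(invalid))                          # False
print(parses(invalid.replace(b"&#x1B;", b"")))  # True once the reference is removed
```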

Since zeep already fails at this point, I don't know how to delete or replace the invalid characters. I don't mind if they are simply deleted; I don't need an exact representation of the name.

Do you have any idea how to work around or fix this "bug"?

Thank you very much in advance,
Peter

Minimal reproducible example:

import datetime

from zeep import Client, Settings

apiKey = 'GOTOMarktstammdatenregister.de,registerAccountas"Api"'
myMastrNr = 'useyourAccountNumber'

wsdl = 'https://www.marktstammdatenregister.de/MaStRAPI/wsdl/mastr.wsdl'
settings = Settings(strict=False)
client = Client(wsdl=wsdl, settings=settings)  # Settings only take effect when passed to Client

# 6.7 Akteur
Anlage = client.bind('Marktstammdatenregister', 'Anlage')

startwert = 18537  # start with 1 to get the
datumAb = datetime.datetime(2019, 1, 31)
limit = 1  ##### EDIT HERE TO DOWNLOAD MORE DATA to validate. MAX 2000!

for zeile in range(startwert, 5000000, limit):
    Daten = Anlage.GetListeAlleEinheiten(apiKey=apiKey, marktakteurMastrNummer=myMastrNr,
                                         startAb=zeile, datumAb=datumAb, limit=limit)

Error Message

  File "D:/priva/Documents/MaStR-master/untitled3.py", line 43, in <module>
    Daten=Anlage.GetListeAlleEinheiten(apiKey=apiKey, marktakteurMastrNummer=myMastrNr, startAb=zeile, datumAb=datumAb, limit=limit)
  File "C:\Users\priva\Anaconda3\lib\site-packages\zeep\proxy.py", line 42, in __call__
    self._op_name, args, kwargs)
  File "C:\Users\priva\Anaconda3\lib\site-packages\zeep\wsdl\bindings\soap.py", line 132, in send
    return self.process_reply(client, operation_obj, response)
  File "C:\Users\priva\Anaconda3\lib\site-packages\zeep\wsdl\bindings\soap.py", line 176, in process_reply
    content=response.content)
TransportError: Server returned response (200) with invalid XML: Invalid XML content received (xmlParseCharRef: invalid xmlChar value 27, line 1, column 947). Content: b'<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><GetListeAlleEinheitenResponse xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle/Anlage"><Ergebniscode xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle">OkWeitereDatenVorhanden</Ergebniscode><AufrufVeraltet xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle">false</AufrufVeraltet><AufrufLebenszeitEnde xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle">9999-12-31T23:59:59.9999999</AufrufLebenszeitEnde><AufrufVersion xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle">1</AufrufVersion><Einheiten xmlns="https://www.marktstammdatenregister.de/Services/Public/1_2/Modelle"><EinheitMastrNummer>SEE907251125684</EinheitMastrNummer><Name>TRIWO Aachen, Geb\xc3\xa4ude PY</Name><Einheitart>Stromerzeugungseinheit</Einheitart><Einheittyp>Solareinheit</Einheittyp><Standort>Philipsstra\xc3\x9fe 8 Geb\xc3\xa4ude PY 52068 Aachen</Standort><Bruttoleistung>159.63</Bruttoleistung><EinheitBetriebsstatus>InBetrieb</EinheitBetriebsstatus><Anlagenbetreiber>ABR902118072378</Anlagenbetreiber><EegMastrNummer>EEG905940913796</EegMastrNummer></Einheiten></GetListeAlleEinheitenResponse></s:Body></s:Envelope>'

Beder3004 • Apr 17 '19

I ran into this problem too. I modified the function parse_xml in the file <python install>\Python37\Lib\site-packages\zeep\loader.py to include a simple

content = content.replace(b'&#x1B;', b'')

which replaces the invalid character reference with an empty string.

This change will presumably be lost when you update zeep, and there might be a better place to do this.

ethansutcliffe • May 13 '19
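The one-line replace in parse_xml only removes &#x1B;, while the transport-based workaround later in this thread strips &#x1A; instead. A slightly more general sketch removes every numeric character reference that XML 1.0 disallows and leaves valid ones such as &#xE4; untouched. The helper name `strip_invalid_char_refs` is hypothetical and not part of zeep:

```python
import re

# Matches numeric character references: decimal (&#27;) or hex (&#x1B;).
_CHAR_REF = re.compile(rb"&#(x[0-9A-Fa-f]+|[0-9]+);")


def _is_xml_char(cp: int) -> bool:
    """True if cp is allowed by the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_char_refs(content: bytes) -> bytes:
    """Remove character references that XML 1.0 forbids; keep all others."""

    def _replace(match):
        ref = match.group(1)
        # Hex references start with "x"; everything else is decimal.
        cp = int(ref[1:], 16) if ref.startswith(b"x") else int(ref)
        return match.group(0) if _is_xml_char(cp) else b""

    return _CHAR_REF.sub(_replace, content)
```

Used in place of the one-line fix, the call would read content = strip_invalid_char_refs(content). It only handles character references, not literal control bytes, and it would also alter such references if they appeared inside a CDATA section.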

Thank you very much. Your explanation and workaround helped me a lot. :-)

Beder3004 • May 16 '19

Hi, same problem and same workaround here! Thanks!

tiromance • Nov 19 '19

This is a common problem with APIs that don't validate their data. There should be a better way to 'correct' the XML before it reaches the parser. Is there some way of doing this in zeep without hacking its core files?

siliconalchemy • Nov 20 '22

One way to do this is to override the transport methods that fetch the data, so the content can be cleaned before it is parsed, e.g.:

import logging

from zeep import Client
from zeep.transports import Transport
from zeep.utils import get_media_type


class PatchTransport(Transport):
    def _load_remote_data(self, url):
        self.logger.debug("Loading remote data from: %s", url)
        response = self.session.get(url, timeout=self.load_timeout)
        response.raise_for_status()

        # Fix invalid XML characters (this strips &#x1A;; the issue above has &#x1B;)
        response._content = response.content.replace(b"&#x1A;", b"")
        return response.content

    def post(self, address, message, headers):
        if self.logger.isEnabledFor(logging.DEBUG):
            log_message = message
            if isinstance(log_message, bytes):
                log_message = log_message.decode("utf-8")
            self.logger.debug("HTTP Post to %s:\n%s", address, log_message)

        response = self.session.post(
            address, data=message, headers=headers, timeout=self.operation_timeout
        )

        # Fix invalid XML characters before zeep parses the reply
        response._content = response.content.replace(b"&#x1A;", b"")

        if self.logger.isEnabledFor(logging.DEBUG):
            media_type = get_media_type(
                response.headers.get("Content-Type", "text/xml")
            )

            if media_type == "multipart/related":
                log_message = response.content
            else:
                log_message = response.content
                if isinstance(log_message, bytes):
                    log_message = log_message.decode(response.encoding or "utf-8")

            self.logger.debug(
                "HTTP Response from %s (status: %d):\n%s",
                address,
                response.status_code,
                log_message,
            )

        return response


wsdl = "https://www.marktstammdatenregister.de/MaStRAPI/wsdl/mastr.wsdl"
transport = PatchTransport()
client = Client(wsdl, transport=transport)

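Not very pretty, but at least it doesn't involve hacking zeep's core code. Ideally, this would be a separate hook in the plugin system (a plugin's ingress() only receives the envelope after it has been parsed, so it cannot help here).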

siliconalchemy • Nov 20 '22