Does not support large SOAP responses
Bug Report
| Q | A |
|---|---|
| BC Break | no |
| Version | 4.5.0 |
Summary
Cannot consume large SOAP responses.
Current behavior
When consuming a large SOAP response, I get this error:
[FATAL] : Resource limit exceeded: Text node too long, try XML_PARSE_HUGE
How to reproduce
I've generated a SOAP client (using the wizard) against a webservice that generates large responses, mainly due to Base64-encoded images within it. We were able to consume these in an older version of PHPro Soap-Client (1.4.1), but it no longer works in 4.5. (I cannot share the webservice with you.)
Expected behavior
PHPro Soap-Client should be able to consume the XML without error. I was able to trace the error down through php-soap, down through veewee-xml, to this code in VeeWee\Xml\Dom\Document:
public static function fromXmlString(string $xml, callable ...$configurators): self
{
    return self::configure(
        loader(xml_string_loader($xml, LIBXML_PARSEHUGE)),
        ...$configurators
    );
}
With the LIBXML_PARSEHUGE flag passed in as shown, the error no longer occurs. Is there any way to pass this down as a config option?
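For anyone who wants to reproduce this without the webservice, here is a minimal sketch using plain ext-dom rather than veewee/xml. The payload and the 11 MB size are assumptions chosen only to exceed libxml's default text node limit; the exact threshold and error text may differ between libxml2 versions.

```php
<?php
declare(strict_types=1);

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical payload: a single text node of about 11 MB,
// standing in for a Base64-encoded image in a SOAP response.
$payloadSize = 11 * 1024 * 1024;
$xml = '<response><image>' . str_repeat('A', $payloadSize) . '</image></response>';

// Default options: libxml's built-in limits apply, so this load is
// expected to fail on typical libxml2 builds.
$default = new DOMDocument();
$loadedWithDefaults = $default->loadXML($xml);
$defaultErrors = array_map(
    static fn (LibXMLError $error): string => trim($error->message),
    libxml_get_errors()
);
libxml_clear_errors();

// LIBXML_PARSEHUGE relaxes the hardcoded parser limits for this parse only.
$huge = new DOMDocument();
$loadedWithParseHuge = $huge->loadXML($xml, LIBXML_PARSEHUGE);

var_dump($loadedWithDefaults, $defaultErrors, $loadedWithParseHuge);
```

On my understanding, the first load should report the same "too long" class of error as above, and the second should succeed.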
Hello,
Thanks for reporting.
Passing it down as an option isn't really possible at the moment and would require quite some rework: the decoder reads the XML document, but it is also possible, for example, that an HTTP middleware parses (part of) the XML along the way.
One thing I could do is to always enable LIBXML_PARSEHUGE within the SoapEnvelopeReader.
I'm just not sure that this is a good idea, given that the libxml limitations are there for a reason:
Security considerations
- Enables XXE Vulnerabilities: The most critical impact is the potential to facilitate XML External Entity (XXE) vulnerabilities. When combined with other options like LIBXML_NOENT (which performs entity substitution) or LIBXML_DTDLOAD (which loads DTDs), LIBXML_PARSEHUGE can allow attackers to include external resources.
- Arbitrary File Read and RCE Potential: Attackers can craft malicious XML payloads to read arbitrary files from the server's local file system (e.g., sensitive configuration files like /etc/passwd) or internal network. In some scenarios, this can be chained with other vulnerabilities (like PHP filter chains) to achieve full Remote Code Execution (RCE).
- DoS Attacks: While the flag's purpose is to prevent errors with legitimate large files, by allowing huge input, it inherently makes the system more susceptible to DoS attacks if untrusted data is processed without additional security measures.
Current XML parsing Limitations
- Maximum Text Node Size: The maximum size for a single text node is limited to 10MB (10,000,000 bytes). Documents containing a single text node larger than this limit will trigger a parse error.
- Maximum Element Nesting Depth: The default maximum depth for element nesting is 256 levels.
- Maximum Name Length: The maximum size allowed for an element or attribute name (markup identifier) is 50,000 characters.
- Maximum Dictionary Size: The parser has a default safety boundary of 100MB for its internal dictionary.
- Maximum Lookups: The maximum amount of lookahead the parser performs is limited to 10MB.
Thanks for sharing your thoughts. What would you suggest we do?
- We could fork veewee/xml in order to do this override for our purposes, and then pull our fork into PHPro Soap-Client.
- This "Http Middleware" you speak of; is there an example of writing a custom one? Could you provide some guidance on how to do that? (And by doing that, we would completely bypass veewee/xml?)
Thank you.
- No.
- Examples can be found here: https://github.com/php-soap/psr18-transport?tab=readme-ov-file#middleware. It is not a means to bypass anything, but rather a way to manipulate requests and responses.
I think the easiest way forward is to make the libxml decoder options configurable on the encoder registry: https://github.com/php-soap/encoding/blob/main/src/EncoderRegistry.php
That way, we can make them configurable from this package's code generation configuration as well. The encoder registry (through the encoder metadata) can be passed to the envelope reader: https://github.com/php-soap/encoding/blob/main/src/Xml/Reader/SoapEnvelopeReader.php
This lets us pass the options through to the encoding component.
If we want to fix it in the HTTP middlewares as well, we can make the loader directly configurable through parameters in e.g. https://github.com/php-soap/psr18-transport/blob/main/src/Xml/XmlMessageManipulator.php
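To illustrate the idea only: the sketch below shows libxml options travelling from a configuration object into a reader. Every class name here (DecoderOptions, EnvelopeReader) is hypothetical and made up for this example; it is not the php-soap/encoding API, and the real change may look quite different.

```php
<?php
declare(strict_types=1);

// Hypothetical names throughout: this is NOT the php-soap/encoding API.
final class DecoderOptions
{
    public function __construct(public readonly int $libxmlOptions = 0)
    {
    }
}

final class EnvelopeReader
{
    public function __construct(private readonly DecoderOptions $options)
    {
    }

    public function read(string $xml): DOMDocument
    {
        // Capture libxml errors locally and restore the previous setting afterwards.
        $previous = libxml_use_internal_errors(true);
        try {
            $document = new DOMDocument();
            // The configured options (e.g. LIBXML_PARSEHUGE) reach the parser here.
            if (!$document->loadXML($xml, $this->options->libxmlOptions)) {
                $error = libxml_get_last_error();
                throw new RuntimeException($error ? trim($error->message) : 'Could not parse XML.');
            }

            return $document;
        } finally {
            libxml_clear_errors();
            libxml_use_internal_errors($previous);
        }
    }
}

// Opting in is an explicit decision by the caller, not a global default.
$reader = new EnvelopeReader(new DecoderOptions(LIBXML_PARSEHUGE));
$envelope = $reader->read('<Envelope><Body>ok</Body></Envelope>');
```

Keeping the default at 0 means the libxml limits stay in place unless a user deliberately relaxes them for a trusted service.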
@rajivraman-MRM Can you verify if this would work for you?
https://github.com/php-soap/encoding/pull/43
Thank you, that works! Will this be merged down to main soon so we can use it?
It is merged ;)