soap-client

Does not support large SOAP responses

Open rajivraman-MRM opened this issue 1 month ago • 1 comments

Bug Report

Q          A
BC Break   no
Version    4.5.0

Summary

Cannot consume large SOAP responses.

Current behavior

When consuming a large SOAP response, I get this error:

[FATAL] : Resource limit exceeded: Text node too long, try XML_PARSE_HUGE

How to reproduce

I've generated a SOAP client (using the wizard) against a webservice that returns large responses, mainly because of the Base64-encoded images embedded in them. We were able to consume these with an older version of PHPro Soap-Client (1.4.1), but it no longer works in 4.5. (I cannot share the webservice with you.)

Expected behavior

PHPro Soap-Client should be able to consume the XML without error. I was able to trace the error down through php-soap, down through veewee-xml, to this code in VeeWee\Xml\Dom\Document:

    public static function fromXmlString(string $xml, callable ...$configurators): self
    {
        return self::configure(
            loader(xml_string_loader($xml, LIBXML_PARSEHUGE)),
            ...$configurators
        );
    }

By passing in the LIBXML_PARSEHUGE flag, as shown above, the error no longer occurs. Is there any way to pass this down as a config option?
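For anyone who wants to see the effect of the flag in isolation, here is a minimal sketch using plain ext-dom rather than veewee/xml. The `parseXml` helper and the synthetic payload are made up for illustration; only `DOMDocument::loadXML` and the `LIBXML_*` constants come from PHP itself.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of any library): parse an XML string with
// ext-dom, optionally lifting libxml's hardcoded parser limits.
function parseXml(string $xml, bool $allowHuge): ?DOMDocument
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $options = LIBXML_NONET | ($allowHuge ? LIBXML_PARSEHUGE : 0);
    $loaded = $document->loadXML($xml, $options);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $loaded ? $document : null;
}

// A single text node just over 10,000,000 bytes, standing in for a
// Base64-encoded image inside a SOAP response.
$payload = str_repeat('A', 10_000_001);
$xml = '<response><image>' . $payload . '</image></response>';

$document = parseXml($xml, true);
echo $document === null ? "parse failed\n" : "parsed with LIBXML_PARSEHUGE\n";
```

Calling `parseXml($xml, false)` on the same payload is the case that should hit the text node limit described in this issue, although the exact behavior depends on the libxml2 version PHP is linked against.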

rajivraman-MRM avatar Dec 12 '25 18:12 rajivraman-MRM

Hello,

Thanks for reporting.

Passing it down as an option isn't really possible at the moment and would require quite some rework: the decoder reads the XML document, but an HTTP middleware may also parse (part of) the XML along the way.

One thing I could do is always enable LIBXML_PARSEHUGE within the SoapEnvelopeReader. I'm just not sure that this is a good idea, given that the libxml limitations are there for a reason:

Security considerations

  • Enables XXE Vulnerabilities: The most critical impact is the potential to facilitate XML External Entity (XXE) vulnerabilities. When combined with other options like LIBXML_NOENT (which performs entity substitution) or LIBXML_DTDLOAD (which loads DTDs), LIBXML_PARSEHUGE can allow attackers to include external resources.
  • Arbitrary File Read and RCE Potential: Attackers can craft malicious XML payloads to read arbitrary files from the server's local file system (e.g., sensitive files like /etc/passwd) or from the internal network. In some scenarios, this can be chained with other vulnerabilities (like PHP filter chains) to achieve full Remote Code Execution (RCE).
  • DoS Attacks: While the flag's purpose is to prevent errors with legitimate large files, allowing huge input inherently makes the system more susceptible to DoS attacks if untrusted data is processed without additional security measures.

Current XML parsing Limitations

  • Maximum Text Node Size: The maximum size for a single text node is limited to 10MB (10,000,000 bytes). Documents containing a single text node larger than this limit will trigger a parse error.
  • Maximum Element Nesting Depth: The default maximum depth for element nesting is 256 levels.
  • Maximum Name Length: The maximum size allowed for an element or attribute name (markup identifier) is 50,000 characters.
  • Maximum Dictionary Size: The parser has a default safety boundary of 100MB for its internal dictionary.
  • Maximum Lookups: The maximum amount of lookahead the parser performs is limited to 10MB.
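One possible way to soften the DoS concern when lifting these limits is to enforce an application-level size cap before parsing, and to keep entity substitution and DTD loading off. The sketch below is only an illustration: `libxmlOptionsFor` and `MAX_SOAP_RESPONSE_BYTES` are hypothetical names, and the 64 MiB cap is an arbitrary assumption, not a recommendation from this package.

```php
<?php
declare(strict_types=1);

// Hypothetical application-level cap on the raw response size (assumption).
const MAX_SOAP_RESPONSE_BYTES = 64 * 1024 * 1024;

// Hypothetical helper: returns the libxml options to parse $xml with,
// or throws when the payload exceeds the application's own limit.
function libxmlOptionsFor(string $xml, int $maxBytes = MAX_SOAP_RESPONSE_BYTES): int
{
    if (strlen($xml) > $maxBytes) {
        throw new LengthException(
            sprintf('XML payload of %d bytes exceeds the limit of %d bytes.', strlen($xml), $maxBytes)
        );
    }

    // LIBXML_NOENT and LIBXML_DTDLOAD are deliberately left out, so entity
    // substitution and DTD loading stay disabled. LIBXML_NONET blocks
    // network access during parsing.
    return LIBXML_PARSEHUGE | LIBXML_NONET;
}

$xml = '<response><image>QUJD</image></response>';
$document = new DOMDocument();
$document->loadXML($xml, libxmlOptionsFor($xml));
echo $document->documentElement->nodeName, "\n";
```

The size check replaces libxml's built-in text node limit with a limit the application chooses itself, so large but legitimate responses still parse while unbounded input is rejected early.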

veewee avatar Dec 15 '25 08:12 veewee

Thanks for sharing your thoughts. What would you suggest we do?

  • We could fork veewee/xml in order to do this override for our purposes, and then pull our fork into PHPro Soap-Client.
  • This "Http Middleware" you speak of; is there an example of writing a custom one? Could you provide some guidance on how to do that? (And by doing that, we would completely bypass veewee/xml?)

Thank you.

rajivraman-MRM avatar Dec 15 '25 14:12 rajivraman-MRM

  • No.
  • Examples can be found here: https://github.com/php-soap/psr18-transport?tab=readme-ov-file#middleware It is not a means of bypassing anything, but rather a way to manipulate requests and responses.

veewee avatar Dec 15 '25 16:12 veewee

I think the easiest way forward is to make the libxml decoder options configurable on the encoder registry: https://github.com/php-soap/encoding/blob/main/src/EncoderRegistry.php

That way, we can make them configurable from this package's code generation configuration as well. The encoder registry (through the encoder metadata) can be passed to the SoapEnvelopeReader: https://github.com/php-soap/encoding/blob/main/src/Xml/Reader/SoapEnvelopeReader.php

That way, we can pass the options to the encoding component.
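To illustrate the general idea of carrying libxml options on a settings object and handing them to the reader, here is a rough sketch. It is not the php-soap/encoding API: `DecoderSettings` and `EnvelopeReader` are hypothetical stand-ins built on plain ext-dom, and the code assumes PHP 8.1 or newer for readonly properties.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a registry that carries decoder settings.
final class DecoderSettings
{
    private function __construct(public readonly int $libxmlOptions)
    {
    }

    // No extra libxml options by default, so the parser limits stay in place.
    public static function defaults(): self
    {
        return new self(0);
    }

    // Immutable "wither": returns a copy with the given libxml options.
    public function withLibxmlOptions(int $options): self
    {
        return new self($options);
    }
}

// Hypothetical reader that receives the settings and applies them on load.
final class EnvelopeReader
{
    public function __construct(private readonly DecoderSettings $settings)
    {
    }

    public function read(string $xml): DOMDocument
    {
        $previous = libxml_use_internal_errors(true);
        try {
            $document = new DOMDocument();
            if (!$document->loadXML($xml, $this->settings->libxmlOptions)) {
                $error = libxml_get_last_error();
                throw new RuntimeException($error ? trim($error->message) : 'Unable to parse XML.');
            }

            return $document;
        } finally {
            // Restore the previous libxml error handling state.
            libxml_clear_errors();
            libxml_use_internal_errors($previous);
        }
    }
}

// Opt in to LIBXML_PARSEHUGE only where large responses are expected.
$settings = DecoderSettings::defaults()->withLibxmlOptions(LIBXML_PARSEHUGE);
$reader = new EnvelopeReader($settings);
$document = $reader->read('<Envelope><Body>ok</Body></Envelope>');
echo $document->documentElement->nodeName, "\n";
```

Keeping the default at `0` means the flag stays opt-in, which matches the concern above about not enabling LIBXML_PARSEHUGE for everyone.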

If we want to fix it in the HTTP middleware as well, we can make the loader directly configurable through parameters in, e.g., https://github.com/php-soap/psr18-transport/blob/main/src/Xml/XmlMessageManipulator.php

veewee avatar Dec 16 '25 06:12 veewee

@rajivraman-MRM Can you verify if this would work for you?

https://github.com/php-soap/encoding/pull/43

veewee avatar Dec 16 '25 13:12 veewee

Thank you, that works! Will this be merged down to main soon so we can use it?

rajivraman-MRM avatar Dec 16 '25 15:12 rajivraman-MRM

It is merged ;)

veewee avatar Dec 17 '25 06:12 veewee