xmltwig
SegFault (stack overflow) deallocating XML element with very large number of children
XML::Twig segfaults with a stack overflow when it deallocates an XML tree that includes an element with a very large number of children (well over 10,000). Here's a script that demonstrates it:
#! /usr/bin/perl
use strict;
use warnings;
use XML::Twig;

sub build_destroy_twig {
    my $child_count = shift;
    my $twig = XML::Twig->new();
    $twig->parse('<root/>');
    my $root = $twig->root();
    print "Creating twig with $child_count children\n";
    while( $child_count-- > 0 ) {
        my $element = XML::Twig::Elt->new('child');
        $element->paste('last_child', $root);
    }
}

my $size = 1000;
while( 1 ) {
    build_destroy_twig($size);
    $size *= 10;
}
On my OS X machine, it crashes at the 100,000-child mark, with a stack trace about 130,000 calls deep.
I previously tracked the cause down to a recursive destroy algorithm used by Twig, but I can't point to the exact culprit in Twig 3.42 right now. It had to do with recursively walking the tree (via children and next_sibling) until it found the last sibling of each child element, destroying it, and then working backwards.
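To illustrate the failure mode described above, here is a minimal sketch in plain Perl. It is not XML::Twig's code: `build_chain` and `destroy_chain_iteratively` are hypothetical helpers, and the hash-based nodes only stand in for sibling elements that hold strong references to each other. The point is that unlinking nodes one at a time means no node is freed while it still references the rest of the chain, so the free never has to recurse.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a long run of sibling elements: each node keeps
# a strong reference to the next one, like the next_sibling link above.
sub build_chain {
    my ($count) = @_;
    my $head;
    for my $i (reverse 1 .. $count) {
        $head = { id => $i, next_sibling => $head };
    }
    return $head;
}

# Tear the chain down in a loop. Each node has its next_sibling link removed
# before our reference to it is dropped, so freeing a node never cascades
# into freeing its successor.
sub destroy_chain_iteratively {
    my ($head) = @_;
    my $freed = 0;
    while ($head) {
        my $next = delete $head->{next_sibling};
        $head = $next;    # drops our reference to the node just unlinked
        $freed++;
    }
    return $freed;
}

my $chain = build_chain(200_000);
my $count = destroy_chain_iteratively($chain);
print "Unlinked $count nodes without deep recursion\n";
```

Simply letting `$chain` go out of scope instead would leave Perl to free the head, then the node it references, and so on down the chain, which is the pattern that can exhaust the C stack on affected Perl versions.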
To work around this issue, I added the following method to my code, and I call it whenever I am dealing with potentially large XML trees:
sub XML::Twig::Elt::pruneChildren {
    my $self = shift;
    my $maxChildren = shift;
    # If no value was passed in, default to 500 children
    unless( defined($maxChildren) ) {
        $maxChildren = 500;
    }
    my @children = $self->children();
    # Prune the subtrees first so that deeper levels are trimmed too
    foreach my $child ( @children ) {
        $child->pruneChildren( $maxChildren );
    }
    # Cut the excess children one at a time, starting from the last one
    while( @children > $maxChildren ) {
        my $childToDestroy = pop @children;
        $childToDestroy->cut();
    }
}
This is a Perl bug; there is not much I can do about it. It's fixed in Perl 5.16 and above.