mojo icon indicating copy to clipboard operation
mojo copied to clipboard

Endless loop in Mojo::DOM caused by root going out of scope

Open jjatria opened this issue 3 years ago • 6 comments

  • Mojolicious version: 9.22
  • Perl version: 5.34.0
  • Operating system: Ubuntu 18.04.6 LTS

Steps to reproduce the behavior

The following works as expected:

use Mojo::DOM;
my $dom = Mojo::DOM->new('<div><a></a><b></b></div>')->at('div')
    ->children('a')->map( sub { say $_->tag } );
# OUTPUT:
# a

However, when storing the DOM in an intermediate variable, the code instead goes into an endless loop when doing a string comparison against an undefined value:

use Mojo::DOM;
my $dom = Mojo::DOM->new('<div><a></a><b></b></div>')->at('div');
$dom->children('a')->map( sub { say $_->tag } );
# OUTPUT:
# Use of uninitialized value in string ne at .../Mojo/DOM58/CSS.pm line 259.
# ... endless loop ...

Note that I have only seen this affect calls to ->children with a selector. If no selector is given, then this error is not triggered:

use Mojo::DOM;
my $dom = Mojo::DOM->new('<div><a></a><b></b></div>')->at('div');
$dom->children->map( sub { say $_->tag } );
# OUTPUT:
# a
# b

Possible diagnosis

The warning points to the _root sub in Mojo::DOM::CSS

sub _root {
  my $tree = shift;
  $tree = $tree->[3] while $tree->[0] ne 'root'; # <-- BOOM
  return $tree;
}

The ne the warning complains about is the one trying to find the root of the tree, which has gone out of scope. This suggested the following workaround, which does allow the code to behave as expected:

use Mojo::DOM;
my $dom = Mojo::DOM->new('<div><a></a><b></b></div>');
my $root = $dom->root; # Keep it in scope, must be above next line
$dom = $dom->at('div');
$dom->children('a')->map( sub { say $_->tag } );
# OUTPUT:
# a

Expected behavior

I expected that storing a reference to the DOM after calling ->at would give me an instance that I could use in the same way as the full DOM, except at a different point in the tree.

Actual behavior

Some calls on the intermediate object triggered an endless loop.

jjatria avatar Mar 11 '22 09:03 jjatria

Came here to +1 this. I factored out "get an element I'm interested into" into a separate function, which returned a subset of the DOM, and calling ->following on the resulting object has either unexpectedly produced nothing or triggered an endless loop spewing out zillions of warning about undefined variables.

skington avatar Aug 15 '24 19:08 skington

Bitten by the same bug today. Mojolicious 9.39 on Perl 5.40.0, Linux 5.12.15.

mbethke avatar Dec 17 '24 11:12 mbethke

Not really fixable without creating memory leaks.

kraih avatar Dec 17 '24 12:12 kraih

I don't understand the problem well enough to suggest where best to document it but I think there should be a hint somewhere that you need to hold on to a reference to the root of what was originally parsed or weird things will happen. It's probably not rare that you have a large-ish XML document but you're only interested in a snippet that you can locate with at()—what would be the best way to safely clone that subtree and free the original object? Stringify and re-parse?

mbethke avatar Dec 18 '24 08:12 mbethke

Even if implementing my expected behaviour is undesirable (because of the memory leak issue), I think the code could react better than going into an endless loop. Even something like

sub _root {
  my $tree = shift;
  while ( $tree->[0] ne 'root' ) {
    $tree = $tree->[3];
    croak 'You are attempting to do something we cannot support' unless defined $tree->[0];
  }
  return $tree;
}

would, I think, be preferable (perhaps with a better error message).

jjatria avatar Dec 18 '24 10:12 jjatria

Raising an exception with a good message would probably be reasonable. The problem is that the link between children and parents is a weak ref, otherwise we would have a circular ref, because parents also hold refs to children.

Edit: I wonder if we don't do it already because it would require code changes in too many places and affect performance negatively. 🤔

kraih avatar Dec 18 '24 10:12 kraih