
Zero length content "No Content in ..."

Open machawk1 opened this issue 8 years ago • 0 comments

From https://code.google.com/archive/p/warrick/issues/29

What steps will reproduce the problem?

1. ./warrick.pl -dr 2013-08-05 -d -a ia -D ../ftp/ http://www.atlantischild.hu/

What is the expected output? What do you see instead?

http://wayback.archive.org/web/20111031230326/http://www.atlantischild.hu/index.php?option=com_content&task=view&id=21&Itemid=9 has non-zero length, but I get a zero-length file: "index.php?option=com_content&task=view&id=21&Itemid=9"

What version of the product are you using? On what operating system?

warrickv2-2-5

Please provide any additional information below.

I have a non-zero-length file that has GET parameters in its name, but all files containing & (ampersand) in their names are empty.

The log is below. As you can see, there is nothing after "?" in the "To stats ... Location:" line.

    At Frontier location 79 of 769
    My frontier at 79: http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28
    My memento to get: |http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28|
    targetpath: index.php
    appending query string option=com_content&task=blogcategory&id=21&Itemid=28
    mcurling: /home/davidprog/dev/design-check/atlantis/warrick//mcurl.pl -D "/home/davidprog/dev/design-check/atlantis/warrick/../ftp//logfile.o" -dt "Sun, 04 Aug 2013 22:00:00 GMT" -tg "http://web.archive.org/web" -L -o "/home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28" "http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28"
    Reading logfile: /home/davidprog/dev/design-check/atlantis/warrick/../ftp//logfile.o
    To stats http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28 => Location: http://web.archive.org/web/20120903050228/http://www.atlantischild.hu/index.php? => /home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28 --> stat IA
    returning /home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28
    Search HTML resource /home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28 for links to other missing resources...
    No Content in /home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28!!
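Empty files for every name containing `&` are consistent with an unquoted `&` reaching /bin/sh, which treats it as "run the preceding command in the background" and starts a new command after it. The Perl sketch below shows only that shell behaviour; it does not call Warrick or curl, and it assumes a POSIX `sh` and `echo`. It passes the URL from the log through the shell once unquoted (as the pre-patch command strings do) and once inside double quotes (as the patched strings do). The unquoted `?` is also a glob character, but here it matches no file and is left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# URL taken from the log above; it contains '?' and several '&'.
my $url = 'http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28';

# Unquoted: sh splits the string at each '&'. echo runs in the background and
# receives only the text up to the first '&'; the remaining pieces
# (task=..., id=..., Itemid=...) become separate variable assignments.
# 'wait' lets the background echo finish before the shell exits.
my $unquoted = qx{echo $url; wait};

# Double-quoted: the shell passes the whole URL to echo as one argument.
my $quoted = qx{echo "$url"};

chomp($unquoted, $quoted);
print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```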

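This is caused by a simple escaping bug in mcurl.pl and MementoThread.pm: URLs and output paths containing ? and & are handed to the shell without quotes. It can be fixed with the patch below. Note that each diff was generated with the patched file as the first argument (the 2014 timestamps), so the - lines are the fixed code and the + lines are the original code.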

    ~/t2/warrick2$ diff -u ../../warrick2/mcurl.pl mcurl.pl
    --- ../../warrick2/mcurl.pl 2014-02-05 16:35:37.362518862 -0800
    +++ mcurl.pl 2012-03-27 13:02:41.000000000 -0700
    @@ -95,10 +95,7 @@
     for (my $i = 0; $i <= $#ARGV; ++$i) #
     {
    -    if ( ( index($ARGV[$i] , ' ') > -1 )
    -      or ( index($ARGV[$i] , '?') > -1 )
    -      or ( index($ARGV[$i] , '*') > -1 )
    -    ) {
    +    if ( index($ARGV[$i] , ' ') > -1 ){
             $ARGV[$i] = '"' .$ARGV[$i] . '"';
         }
     }
    ~/t2/warrick2$ diff -u ../../warrick2/MementoThread.pm MementoThread.pm
    --- ../../warrick2/MementoThread.pm 2014-02-05 16:38:19.914518843 -0800
    +++ MementoThread.pm 2012-03-27 13:02:42.000000000 -0700
    @@ -97,7 +97,7 @@
             $acceptDateTimeHeader = " -H \"Accept-Datetime: ".$self->{DateTime}."\" ";
         }
     
    -    my $command = "curl -I $acceptDateTimeHeader \"$self->{URI}\" ";
    +    my $command = "curl -I $acceptDateTimeHeader $self->{URI} ";
         if($self->{Debug} == 1){
             print "DEBUG: " .$command ."\n";
         }
    @@ -351,7 +351,7 @@
         } else {
    -        $command = "curl @params $acceptDateTimeHeader \"". $self->{TimeGate} ."/" . $self->{URI} . "\"";
    +        $command = "curl @params $acceptDateTimeHeader ". $self->{TimeGate} ."/" . $self->{URI};
         }
    @@ -390,7 +390,7 @@
             $command = "curl -I -L $acceptDateTimeHeader ". $self->{Info}->{TimeGate} ;
         } else {
    -        $command = "curl -I -L $acceptDateTimeHeader \"". $self->{TimeGate} ."/" . $self->{URI} . "\"";
    +        $command = "curl -I -L $acceptDateTimeHeader ". $self->{TimeGate} ."/" . $self->{URI};
         }
    @@ -667,4 +667,4 @@
         return $result;
     }
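The mcurl.pl half of the patch can be illustrated with the Perl sketch below, which lifts the quoting loop from the diff into two stand-alone functions. The names `requote_old` and `requote_patched` are hypothetical, not Warrick's. The sketch assumes two things the diff does not show: that mcurl.pl later joins `@ARGV` with spaces into a single string that is run through the shell, and that the double quotes visible in the "mcurling:" log line are removed by the shell that starts mcurl.pl, so `@ARGV` holds the raw strings. The arguments are shortened from the log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: mcurl.pl joins @ARGV with spaces into one shell command string.

# Pre-patch loop: only arguments containing a space are re-quoted.
sub requote_old {
    my @args = @_;    # work on a copy so the caller's list is untouched
    for my $arg (@args) {
        $arg = '"' . $arg . '"' if index($arg, ' ') > -1;
    }
    return join ' ', @args;
}

# Patched loop: arguments containing '?' or '*' are re-quoted as well.
sub requote_patched {
    my @args = @_;
    for my $arg (@args) {
        if (   index($arg, ' ') > -1
            or index($arg, '?') > -1
            or index($arg, '*') > -1 )
        {
            $arg = '"' . $arg . '"';
        }
    }
    return join ' ', @args;
}

# Raw arguments as mcurl.pl would see them (shortened from the log).
my @argv = (
    '-L', '-o', 'index.php?option=com_content&id=21',
    'http://atlantischild.hu:80/index.php?option=com_content&id=21',
);

print requote_old(@argv), "\n";
print requote_patched(@argv), "\n";
```

With the pre-patch loop, the `-o` filename and the URL reach the shell bare, so the first `&` in each ends the command early and backgrounds it. The patch keys on `?` rather than `&`, which covers query strings but would miss an `&` in a URL path, and double quotes still leave `$`, backquotes and `\` active. Perl's multi-argument `system LIST` form, which starts the program directly without a shell, sidesteps the quoting problem altogether.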

machawk1, Jun 19 '17 15:06