Zero-length content: "No Content in ..."
From https://code.google.com/archive/p/warrick/issues/29
What steps will reproduce the problem?
1. ./warrick.pl -dr 2013-08-05 -d -a ia -D ../ftp/ http://www.atlantischild.hu/
What is the expected output? What do you see instead?
http://wayback.archive.org/web/20111031230326/http://www.atlantischild.hu/index.php?option=com_content&task=view&id=21&Itemid=9 has non-zero length, but I get a zero-length file: "index.php?option=com_content&task=view&id=21&Itemid=9"
What version of the product are you using? On what operating system?
warrickv2-2-5
Please provide any additional information below.
I do get a non-zero-length file that has GET parameters in its name, but every file with an & (ampersand) in its name is empty.
The log is below. As you can see, there is nothing after the "?" in the "To stats ... Location:" line.
At Frontier location 79 of 769
My frontier at 79: http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28
My memento to get: |http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28|
targetpath: index.php
appending query string option=com_content&task=blogcategory&id=21&Itemid=28
mcurling: /home/davidprog/dev/design-check/atlantis/warrick//mcurl.pl -D "/home/davidprog/dev/design-check/atlantis/warrick/../ftp//logfile.o" -dt "Sun, 04 Aug 2013 22:00:00 GMT" -tg "http://web.archive.org/web" -L -o "/home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28" "http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28"
Reading logfile: /home/davidprog/dev/design-check/atlantis/warrick/../ftp//logfile.o
To stats http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28 => Location: http://web.archive.org/web/20120903050228/http://www.atlantischild.hu/index.php? => /home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28 --> stat IA
returning /home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28
Search HTML resource /home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28 for links to other missing resources...
No Content in /home/davidprog/dev/design-check/atlantis/warrick/../ftp//index.php?option=com_content&task=blogcategory&id=21&Itemid=28!!
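One way a URL like this gets cut short is by being pasted unquoted into a shell command line, where each & ends a command. The POSIX sh sketch below shows the effect without any network access: printf stands in for curl, and sh -c stands in for however the command string is handed to the shell (an assumption, not Warrick's actual code).

```shell
#!/bin/sh
# URL from the log above; the single quotes keep & and ? literal here.
url='http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28'

# Unquoted inside the command string: the second shell splits the line at
# each &, runs "printf ... index.php?option=com_content" in the background,
# and treats task=..., id=... and Itemid=... as variable assignments.
unquoted=$(sh -c "printf '%s\n' $url")

# Double-quoted inside the command string: the whole URL reaches printf
# as a single argument, & and ? included.
quoted=$(sh -c "printf '%s\n' \"$url\"")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The unquoted case prints the URL cut off after option=com_content; the quoted case prints it in full.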
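This is caused by a simple escaping bug in mcurl.pl and MementoThread.pm: the URL ends up unquoted in a shell command line, so the shell treats each & in the query string as a command separator and curl never sees the full URL. It can be fixed by quoting those arguments. Note that the diff was generated with the fixed files (dated 2014-02-05) as the first argument and the original files (dated 2012-03-27) as the second, so the lines marked "-" are the fix and the lines marked "+" are the old code; the patch has to be applied in reverse (for example with patch -R). The patch: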
~/t2/warrick2$ diff -u ../../warrick2/mcurl.pl mcurl.pl
--- ../../warrick2/mcurl.pl 2014-02-05 16:35:37.362518862 -0800
+++ mcurl.pl 2012-03-27 13:02:41.000000000 -0700
@@ -95,10 +95,7 @@
 for (my $i = 0; $i <= $#ARGV; ++$i) #
 {
-    if ( ( index($ARGV[$i] , ' ') > -1 )
-      or ( index($ARGV[$i] , '?') > -1 )
-      or ( index($ARGV[$i] , '*') > -1 )
-    ) {
+    if ( index($ARGV[$i] , ' ') > -1 ){
         $ARGV[$i] = '"' .$ARGV[$i] . '"';
     }
 }
~/t2/warrick2$ diff -u ../../warrick2/MementoThread.pm MementoThread.pm
--- ../../warrick2/MementoThread.pm 2014-02-05 16:38:19.914518843 -0800
+++ MementoThread.pm 2012-03-27 13:02:42.000000000 -0700
@@ -97,7 +97,7 @@
     $acceptDateTimeHeader = " -H \"Accept-Datetime: ".$self->{DateTime}."\" ";
 }
-my $command = "curl -I $acceptDateTimeHeader \"$self->{URI}\" ";
+my $command = "curl -I $acceptDateTimeHeader $self->{URI} ";
 if($self->{Debug} == 1){
     print "DEBUG: " .$command ."\n";
 }
@@ -351,7 +351,7 @@
 } else {
-    $command = "curl @params $acceptDateTimeHeader \"". $self->{TimeGate} ."/" . $self->{URI} . "\"";
+    $command = "curl @params $acceptDateTimeHeader ". $self->{TimeGate} ."/" . $self->{URI};
 }
@@ -390,7 +390,7 @@
     $command = "curl -I -L $acceptDateTimeHeader ". $self->{Info}->{TimeGate} ;
 } else {
-    $command = "curl -I -L $acceptDateTimeHeader \"". $self->{TimeGate} ."/" . $self->{URI} . "\"";
+    $command = "curl -I -L $acceptDateTimeHeader ". $self->{TimeGate} ."/" . $self->{URI};
 }
@@ -667,4 +667,4 @@
 return $result;
 }
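The patch wraps arguments in double quotes, and in mcurl.pl only when they contain a space, ? or *. Inside double quotes the shell still expands $ and backticks, and an embedded double quote ends the quoting early. A more defensive alternative (a different technique from the one in the patch, sketched here in POSIX sh with a hypothetical shell_quote helper) is to single-quote every argument and write any embedded single quote as '\''.

```shell
#!/bin/sh
# Hypothetical helper, not part of Warrick: wrap one argument in single
# quotes so the shell takes every character literally. An embedded single
# quote is rewritten as '\'' (close quote, escaped quote, reopen quote).
# Note: the $(...) substitution drops trailing newlines from the argument.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

url='http://atlantischild.hu:80/index.php?option=com_content&task=blogcategory&id=21&Itemid=28'
tricky="it's \$HOME and \"quotes\""

# Build a command string and hand it to a second shell to parse.
url_out=$(sh -c "printf '%s\n' $(shell_quote "$url")")
tricky_out=$(sh -c "printf '%s\n' $(shell_quote "$tricky")")

echo "$url_out"
echo "$tricky_out"
```

Both values come back unchanged after the second shell parses them, including the &, ?, $ and quote characters.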