
AnchorCheck plugin seems not to be working…well...

[Open] jmbeuken opened this issue 11 years ago • 5 comments

Hi,

Config: LinkChecker 9.3 / Python 2.7.8 / CentOS 6.5

$ cat test.html
<html><head></head>
<body>
<hr>
<a href="#broken">broken link</a> <br>
<a href="#working">working link</a> <br>
<H2 id="working">working…</H2>
</body></html>

I run linkchecker with the AnchorCheck plugin enabled:

$ linkchecker  -t 1 -v test.html
…
LinkChecker 9.3              Copyright (C) 2000-2014 Bastian Kleineidam
...

Start checking at 2014-10-21 09:44:25-004

URL        `file:///home/buildbot/WorkSpace/test.html'
Name       `test.html'
Real URL   file:///home/buildbot/WorkSpace/test.html
Result     Valid

URL        `#working'
Name       `working link'
Parent URL file:///home/buildbot/WorkSpace/test.html, line 5, col 1
Real URL   file:///home/buildbot/WorkSpace/test.html
Result     Valid

URL        `#broken'
Name       `broken link'
Parent URL file:///home/buildbot/WorkSpace/test.html, line 4, col 1
Real URL   file:///home/buildbot/WorkSpace/test.html
Result     Valid

That's it. 3 links in 1 URL checked. 0 warnings found. 0 errors found.
Stopped checking at 2014-10-21 09:44:25-004 (0.02 seconds)

With version 8.1 (-a, i.e. --anchors), the broken anchor is reported:

$ linkchecker -a test.html
…
LinkChecker 8.1              Copyright (C) 2000-2012 Bastian Kleineidam
...
Start checking at 2014-10-21 15:43:44+002

URL        `#broken'
Name       `broken link'
Parent URL file:///home/buildbot/WorkSpace/test.html, line 4, col 1
Real URL   file:///home/buildbot/WorkSpace/test.html
D/L time   0.000 seconds
Size       184B
Info       2 URLs parsed.
Warning    [url-anchor-not-found] Anchor `broken' not found.
           Available anchors: `working'.
Result     Valid

Statistics:
Robots.txt cache: 0 hits, 0 misses
Content types: 0 image, 3 text, 0 video, 0 audio, 0 application, 0 mail and 0 other.
URL lengths: min=41, max=41, avg=41.

That's it. 3 links checked. 1 warning found. 0 errors found.
Stopped checking at 2014-10-21 15:43:44+002 (0.02 seconds)

Am I doing something wrong?

Regards,

jmb

jmbeuken, Oct 21 '14
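For comparison, the check that AnchorCheck is expected to perform on test.html can be reproduced outside LinkChecker. This is a minimal sketch in Python 3 using only the standard library's html.parser; AnchorCollector and find_missing_anchors are hypothetical names, and it only examines same-page href="#..." links, treating any id attribute or a legacy <a name="..."> as a valid target (an assumption, not necessarily LinkChecker's exact rules).

```python
from html.parser import HTMLParser

# Contents of test.html from the report above.
TEST_HTML = """<html><head></head>
<body>
<hr>
<a href="#broken">broken link</a> <br>
<a href="#working">working link</a> <br>
<H2 id="working">working…</H2>
</body></html>
"""


class AnchorCollector(HTMLParser):
    """Collects anchor targets and same-page #fragment references."""

    def __init__(self):
        super().__init__()
        self.targets = set()  # fragment names defined in the page
        self.refs = []        # (fragment, line) for each <a href="#...">

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names (<H2> arrives as "h2").
        attrs = dict(attrs)
        # Assumption: any id, plus the legacy <a name="...">, is a valid target.
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            line, _offset = self.getpos()  # 1-based line where this tag starts
            self.refs.append((href[1:], line))


def find_missing_anchors(html):
    """Return ([(fragment, line), ...] for dangling links, sorted defined anchors)."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    missing = [(frag, line) for frag, line in parser.refs
               if frag not in parser.targets]
    return missing, sorted(parser.targets)


if __name__ == "__main__":
    missing, available = find_missing_anchors(TEST_HTML)
    for frag, line in missing:
        print(f"line {line}: anchor `{frag}' not found; "
              f"available: {', '.join(available)}")
```

On this file it flags the #broken link on line 4 and lists working as the only defined anchor, which matches the LinkChecker 8.1 warning, so test.html is a valid reproducer for the 9.3 behavior.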

... I get the same. Incorrect anchors are always marked as 'Valid'.

lemzwerg, Mar 17 '15

I noticed that --anchors was deprecated in favor of plugins. However, even when using a plugin, anchors aren't checked: none of the URLs with anchors ever reach the plugin, so the problem seems to be in the core.

remko, Apr 03 '16

Any chance to see this issue fixed anytime soon? Thanks!

RainerKlute, Jul 17 '17

Issue #513 might provide some important insight into the problem.

RainerKlute, Jul 17 '17

Oh, that is a nice one... Here is an example of the oddity: the initial run finds the error, but in the next loop (over 2 files) neither run reports it, not even for the same 01-introduction.html. I kept poking around: even with -t -1 (no threading?), the order of the logged debug output varies from run to run. Some dict/set/whatever seems to yield things in random order, and I guess some decisions are made based on previously visited URLs, so in some cases some anchored URLs never reach the check (my wild guess).

First run finds it, the others do not
(git)hopa:~/proj/bids/bids-specification[bf-links]git
$> for f in /home/yoh/proj/bids/bids-specification/site/01*html; do echo $f; linkchecker $f; done 
/home/yoh/proj/bids/bids-specification/site/01-introduction.html
INFO linkcheck.cmdline 2018-10-30 23:31:49,021 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.4.0              Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html

Start checking at 2018-10-30 23:31:49-004

URL        `03-modality-agnostic-files.html#YYY'
Name       `\n      Modality agnostic files\n    '
Parent URL file:///home/yoh/proj/bids/bids-specification/site/01-introduction.html, line 268, col 5
Real URL   file:///home/yoh/proj/bids/bids-specification/site/03-modality-agnostic-files.html
Check time 0.449 seconds
D/L time   0.000 seconds
Size       24.20KB
Modified   2018-10-31 02:46:44.554920Z
Warning    [None] Anchor `YYY' not found. Available anchors:
           `__drawer', `__search', `__toc', `changes', `code',
           `dataset-description', `dataset_descriptionjson',
           `modality-agnostic-files', `nav-1', `nav-1-4',
           `participants-file', `readme', `scans-file'.
Result     Valid
 3 threads active,     0 links queued,  159 links in 162 URLs checked, runtime 1 seconds

Statistics:
Downloaded: 582.65KB.
Content types: 5 image, 23 text, 0 video, 0 audio, 43 application, 0 mail and 115 other.
URL lengths: min=8, max=130, avg=65.

That's it. 186 links in 186 URLs checked. 1 warning found. 0 errors found.
Stopped checking at 2018-10-30 23:31:50-004 (1 seconds)
1 15528 ->1.....................................:Tue 30 Oct 2018 11:31:51 PM EDT:.
(git)hopa:~/proj/bids/bids-specification[bf-links]git
$> for f in /home/yoh/proj/bids/bids-specification/site/0[12]*html; do echo $f; linkchecker $f; done 
/home/yoh/proj/bids/bids-specification/site/01-introduction.html
INFO linkcheck.cmdline 2018-10-30 23:32:00,725 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.4.0              Copyright (C) 2000-2014 Bastian Kleineidam

Start checking at 2018-10-30 23:32:00-004

Statistics:
Downloaded: 582.65KB.
Content types: 5 image, 23 text, 0 video, 0 audio, 43 application, 0 mail and 115 other.
URL lengths: min=8, max=130, avg=65.

That's it. 186 links in 186 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2018-10-30 23:32:01-004 (0.97 seconds)
/home/yoh/proj/bids/bids-specification/site/02-common-principles.html
INFO linkcheck.cmdline 2018-10-30 23:32:02,878 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.4.0              Copyright (C) 2000-2014 Bastian Kleineidam

Start checking at 2018-10-30 23:32:02-004

Statistics:
Downloaded: 582.65KB.
Content types: 5 image, 23 text, 0 video, 0 audio, 43 application, 0 mail and 115 other.
URL lengths: min=8, max=130, avg=65.

That's it. 186 links in 186 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2018-10-30 23:32:03-004 (0.97 seconds)

yarikoptic, Oct 31 '18
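The guess that decisions depend on previously visited URLs, together with the "3 links in 1 URL checked" line in the 9.3 output, would be consistent with a URL cache keyed on the address without its #fragment: once a page is cached, later links to the same page with a different anchor are skipped before any anchor check runs. The toy model below (Python 3) only illustrates that hypothesis; check and PAGES are made-up names, and this is not LinkChecker's code.

```python
from urllib.parse import urldefrag

# Hypothetical: anchors defined by each page (test.html from the first report).
PAGES = {"test.html": {"working"}}


def check(links, pages):
    """Toy link checker whose URL cache ignores the #fragment (hypothetical model).

    Returns (number of distinct URLs checked, list of anchor warnings).
    """
    seen = set()  # cache of already-checked URLs, keyed WITHOUT the fragment
    warnings = []
    for link in links:
        url, fragment = urldefrag(link)
        if url in seen:
            # Cache hit: the page counts as checked already, so this
            # link's anchor is never examined.
            continue
        seen.add(url)
        if fragment and fragment not in pages[url]:
            warnings.append(f"Anchor `{fragment}' not found in {url}")
    return len(seen), warnings


if __name__ == "__main__":
    # Page first (as when the file is the start URL): anchored links are cache hits.
    print(check(["test.html", "test.html#working", "test.html#broken"], PAGES))
    # prints (1, [])
    # Broken anchor first: the same three links now produce a warning.
    print(check(["test.html#broken", "test.html", "test.html#working"], PAGES))
    # prints (1, ["Anchor `broken' not found in test.html"])
```

The first call mirrors the 9.3 result (three links, one URL, no warnings); the second shows how the same links can produce a warning purely because of visit order, which would fit the run-to-run variation in the transcripts if the queue order is not deterministic.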