AnchorCheck plugin does not seem to work
Hi,
Config: LinkChecker 9.3 / Python 2.7.8 / CentOS 6.5
$ cat test.html
<html><head></head>
<body>
<hr>
<a href="#broken">broken link</a> <br>
<a href="#working">working link</a> <br>
<H2 id="working">working…</H2>
</body></html>
I start linkchecker with the AnchorCheck plugin enabled:
$ linkchecker -t 1 -v test.html
…
LinkChecker 9.3 Copyright (C) 2000-2014 Bastian Kleineidam
...
Start checking at 2014-10-21 09:44:25-004
URL `file:///home/buildbot/WorkSpace/test.html'
Name `test.html'
Real URL file:///home/buildbot/WorkSpace/test.html
Result Valid
URL `#working'
Name `working link'
Parent URL file:///home/buildbot/WorkSpace/test.html, line 5, col 1
Real URL file:///home/buildbot/WorkSpace/test.html
Result Valid
URL `#broken'
Name `broken link'
Parent URL file:///home/buildbot/WorkSpace/test.html, line 4, col 1
Real URL file:///home/buildbot/WorkSpace/test.html
Result Valid
That's it. 3 links in 1 URL checked. 0 warnings found. 0 errors found.
Stopped checking at 2014-10-21 09:44:25-004 (0.02 seconds)
With version 8.1, the broken anchor is reported:
$ linkchecker -a test.html
…
LinkChecker 8.1 Copyright (C) 2000-2012 Bastian Kleineidam
...
Start checking at 2014-10-21 15:43:44+002
URL `#broken'
Name `broken link'
Parent URL file:///home/buildbot/WorkSpace/test.html, line 4, col 1
Real URL file:///home/buildbot/WorkSpace/test.html
D/L time 0.000 seconds
Size 184B
Info 2 URLs parsed.
Warning [url-anchor-not-found] Anchor `broken' not found.
Available anchors: `working'.
Result Valid
Statistics:
Robots.txt cache: 0 hits, 0 misses
Content types: 0 image, 3 text, 0 video, 0 audio, 0 application, 0 mail and 0 other.
URL lengths: min=41, max=41, avg=41.
That's it. 3 links checked. 1 warning found. 0 errors found.
Stopped checking at 2014-10-21 15:43:44+002 (0.02 seconds)
Am I doing something wrong?
Regards,
jmb
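For reference, here is a minimal sketch of the check I would expect for test.html. It is not LinkChecker's implementation, just an illustration written with the Python 3 standard library; the names AnchorCollector and missing_anchors are made up for this example.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect anchor targets (id= / <a name=>) and same-page fragment links."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # anchors defined in the page
        self.fragments = []    # fragments referenced by href="#..."

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def missing_anchors(html_text):
    """Return fragments that are linked to but not defined in html_text."""
    collector = AnchorCollector()
    collector.feed(html_text)
    return [f for f in collector.fragments if f not in collector.targets]


# Same structure as test.html above.
TEST_HTML = """<html><head></head>
<body>
<hr>
<a href="#broken">broken link</a> <br>
<a href="#working">working link</a> <br>
<H2 id="working">working...</H2>
</body></html>"""

if __name__ == "__main__":
    print(missing_anchors(TEST_HTML))  # prints ['broken']
```

So `#broken` should produce a warning and `#working` should not, which is what version 8.1 reports.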
I get the same result: incorrect anchors are always marked as 'Valid'.
I noticed that --anchors was deprecated in favor of plugins. However, even with the plugin enabled, the anchors aren't checked. None of the URLs with anchors seem to come through to the plugin, so the problem appears to be in the core.
Is there any chance this issue will be fixed soon? Thanks!
Issue #513 might provide some important insight into the problem.
Oh, that is a nice one. Here is one example of the oddity: an initial run finds the error, but later runs (another loop, with two files to go through) do not. I kept poking around: even with -t -1 (no threading?) the order of the logged debug output varies. Some dict/set/whatever seems to provide things in random order, and I guess some decision making is based on the previously visited URLs, so in some cases anchored URLs never reach the check (my wild guess).
The first run finds the error, the second does not:
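To make that guess concrete, here is a toy sketch of the failure mode I have in mind. It is purely hypothetical and not taken from LinkChecker's code: check_links and its "seen" cache are invented for illustration, and the assumption is that the cache is keyed on the URL with the fragment stripped.

```python
from urllib.parse import urldefrag


def check_links(links, available_anchors):
    """Toy checker whose cache ignores fragments (an assumption, not LinkChecker's code)."""
    seen = set()
    warnings = []
    for link in links:
        url, fragment = urldefrag(link)
        if url in seen:
            # Cache hit: the link counts as already checked,
            # so its fragment is never examined.
            continue
        seen.add(url)
        if fragment and fragment not in available_anchors.get(url, set()):
            warnings.append("Anchor `%s' not found in %s" % (fragment, url))
    return warnings


# Hypothetical anchor table for one page.
ANCHORS = {"03-modality-agnostic-files.html": {"readme", "scans-file"}}

# The same two links, visited in a different order.
anchored_first = ["03-modality-agnostic-files.html#YYY",
                  "03-modality-agnostic-files.html"]
plain_first = ["03-modality-agnostic-files.html",
               "03-modality-agnostic-files.html#YYY"]

if __name__ == "__main__":
    print(len(check_links(anchored_first, ANCHORS)))  # prints 1
    print(len(check_links(plain_first, ANCHORS)))     # prints 0
```

If the visiting order comes from an unordered container, whether the broken anchor is reported would then depend on which of the two links happens to be processed first.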
(git)hopa:~/proj/bids/bids-specification[bf-links]git
$> for f in /home/yoh/proj/bids/bids-specification/site/01*html; do echo $f; linkchecker $f; done
/home/yoh/proj/bids/bids-specification/site/01-introduction.html
INFO linkcheck.cmdline 2018-10-30 23:31:49,021 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.4.0 Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html
Start checking at 2018-10-30 23:31:49-004
URL `03-modality-agnostic-files.html#YYY'
Name `\n Modality agnostic files\n '
Parent URL file:///home/yoh/proj/bids/bids-specification/site/01-introduction.html, line 268, col 5
Real URL file:///home/yoh/proj/bids/bids-specification/site/03-modality-agnostic-files.html
Check time 0.449 seconds
D/L time 0.000 seconds
Size 24.20KB
Modified 2018-10-31 02:46:44.554920Z
Warning [None] Anchor `YYY' not found. Available anchors:
`__drawer', `__search', `__toc', `changes', `code',
`dataset-description', `dataset_descriptionjson',
`modality-agnostic-files', `nav-1', `nav-1-4',
`participants-file', `readme', `scans-file'.
Result Valid
3 threads active, 0 links queued, 159 links in 162 URLs checked, runtime 1 seconds
Statistics:
Downloaded: 582.65KB.
Content types: 5 image, 23 text, 0 video, 0 audio, 43 application, 0 mail and 115 other.
URL lengths: min=8, max=130, avg=65.
That's it. 186 links in 186 URLs checked. 1 warning found. 0 errors found.
Stopped checking at 2018-10-30 23:31:50-004 (1 seconds)
1 15528 ->1.....................................:Tue 30 Oct 2018 11:31:51 PM EDT:.
(git)hopa:~/proj/bids/bids-specification[bf-links]git
$> for f in /home/yoh/proj/bids/bids-specification/site/0[12]*html; do echo $f; linkchecker $f; done
/home/yoh/proj/bids/bids-specification/site/01-introduction.html
INFO linkcheck.cmdline 2018-10-30 23:32:00,725 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.4.0 Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html
Start checking at 2018-10-30 23:32:00-004
Statistics:
Downloaded: 582.65KB.
Content types: 5 image, 23 text, 0 video, 0 audio, 43 application, 0 mail and 115 other.
URL lengths: min=8, max=130, avg=65.
That's it. 186 links in 186 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2018-10-30 23:32:01-004 (0.97 seconds)
/home/yoh/proj/bids/bids-specification/site/02-common-principles.html
INFO linkcheck.cmdline 2018-10-30 23:32:02,878 MainThread Checking intern URLs only; use --check-extern to check extern URLs.
LinkChecker 9.4.0 Copyright (C) 2000-2014 Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' within this
distribution.
Get the newest version at http://wummel.github.io/linkchecker/
Write comments and bugs to https://github.com/wummel/linkchecker/issues
Support this project at http://wummel.github.io/linkchecker/donations.html
Start checking at 2018-10-30 23:32:02-004
Statistics:
Downloaded: 582.65KB.
Content types: 5 image, 23 text, 0 video, 0 audio, 43 application, 0 mail and 115 other.
URL lengths: min=8, max=130, avg=65.
That's it. 186 links in 186 URLs checked. 0 warnings found. 0 errors found.
Stopped checking at 2018-10-30 23:32:03-004 (0.97 seconds)