linkchecker icon indicating copy to clipboard operation
linkchecker copied to clipboard

Skips links marked as known broken

Open nickpiggott opened this issue 6 years ago • 7 comments

Skips anchor links that have a class value of "broken_link".

Should add this as a config option to allow any class name to be excluded from checking.

nickpiggott avatar Apr 21 '20 12:04 nickpiggott

Hi

I'm afraid I'm definitely not a Python pro, so thanks for the guidance.

  • Yes, it should be configurable on the command line / config file. I'll add an option
  • Yes, it should split(). That was a crude hammer to see if it was working as expected and broadly the right approach

I'll make some changes, add the config option, and document. I'll need to get some help on designing the unit test.

nickpiggott avatar Apr 21 '20 20:04 nickpiggott

I've made some changes, including adding switches / config options for a list of classes to ignore.

I'm not sure the best way to pull that definition of class names into linkparse.py, so that's still statically coded as a variable definition in linkparse.py. Suggestions welcome on how best to pull config["ignoreclass"] in to linkparse.py.

nickpiggott avatar Apr 22 '20 14:04 nickpiggott

I think this latest set of changes fixes up everything. the --ignore-classes switch takes a comma separated list of classes, which are excluded if any of them are found on the anchor link.

I'll work out how to create and run some tests. I'm using this on a production box each night, so can keep an eye on it too.

nickpiggott avatar Apr 24 '20 14:04 nickpiggott

I'll work out how to create and run some tests. I'm using this on a production box each night, so can keep an eye on it too.

It would be nice to have one integration test. AFAIU these are mostly pairs of files in tests/checker/data/: an HTML file and a .result file. I would suggest copying an existing test and modifying the HTML to add a class attribute and modifying the .results file to exclude the link that should be omitted. Next, the test itself that uses the files -- check out tests/checker/test_http.py. There's a test class, and a test method, and it calls self.file_test() and passes the filename. So if you e.g. copy http.html to http_ignoreclass.html, you can then add an invocation to self.file_test, and then I think you can pass the ignoreclasses=... config option in the confargs.

I'm not sure what stage of config parsing this exists in, and whether you should store a list or a comma-separated string in confargs. I would guess a list.

Running the tests should be simple: tox (if you've got it installed) should take care of installing all the dependencies in its local virtualenvs etc. For rapidly iterating on one particular test I like tox -e py37 -- tests/checker/test_http.py::TestClass::test_name.

(I hope I'm not misleading you with thes suggestions: I'm not one of the original developers of linkchecker and my familiarity with its internal workings and the test suite extends only as much as I've needed to figure out a couple of bug fixes in the past. I'm only doing code reviews because somebody has to, and the original devs are MIA.)

mgedmin avatar Apr 24 '20 17:04 mgedmin

It would be nice to have one integration test.

I said one, because zero is bad (maybe all units work correctly but aren't hooked up together right), and more than one is also bad (integration tests are slow, and one bug tends to break many integration tests).

I haven't said how many unit tests I want. The right number is somewhere between zero and however much is needed to achieve 100% coverage for all the changed lines of code. (https://pypi.org/project/diff-cover/ is a nice tool for this, we should maybe hook it up.) It depends on how much time you have and how much you enjoy writing unit tests ;)

mgedmin avatar Apr 24 '20 17:04 mgedmin

the CI build fails here now and conflicts need to be resolved, sorry... also pay attention to flake8 warnings in the CI logs: they are not enforced yet, but they are worth fixing.

anarcat avatar May 22 '20 13:05 anarcat

Update: I'm working on some tests now, but is turning out to be more work than I anticipated, and needs more code refactoring.

nickpiggott avatar May 22 '20 13:05 nickpiggott