biopython icon indicating copy to clipboard operation
biopython copied to clipboard

Improve test coverage by moving self-tests

Open peterjc opened this issue 9 years ago • 65 comments

Many of the Python files in Biopython still have self-tests at the end which are executed if the file is run directly via the idiom:

if __name__ == "__main__":
    print("Running a quick self-test")
    ...

This is bad for several reasons - these tests are not being run routinely via our buildbot [Update: We stopped using buildbot, but now have AppVeyor instead] or TravisCI (and therefore if the test breaks, we don't know about it), they are not counted on the coverage report https://codecov.io/github/biopython/biopython/ (and in fact actively reduce the coverage score [Update: We configured codecov to ignore the self tests]), and moreover with Python 3 making subtle changes to imports, some of these self tests have accidentally become Python 2 only.

This is a tracking issue to encourage more commits like these which move these self-tests under Tests/test_XXX.py using the unittest framework instead:

  • eb0c7ab903882587f3ece06e51e15c6e68855cc9
  • da5a414819fdaef1fc9eb3a713fb00f92030889e

Or, in some cases using a doctest instead makes sense:

  • 9242f12a65fb1d6bdc82f8201531662fb63a101b

In those cases, we're left with a stub like this which is useful when working directly on a file's doctests:

if __name__ == "__main__":
    from Bio._utils import run_doctest
    run_doctest(verbose=0)

peterjc avatar May 11 '16 14:05 peterjc

Here's an indication of some obvious targets to tackle next:

$ python count_main.py Bio BioSQL
Walking Bio
26  726 Bio/bgzf.py
48  258 Bio/MaxEntropy.py
32  194 Bio/NaiveBayes.py
44  622 Bio/Align/AlignInfo.py
246 574 Bio/AlignIO/FastaIO.py
65  179 Bio/AlignIO/NexusIO.py
25  535 Bio/Blast/NCBIXML.py
358 1730    Bio/GenBank/Scanner.py
11  180 Bio/Graphics/GenomeDiagram/_Colors.py
16  184 Bio/Graphics/GenomeDiagram/_FeatureSet.py
8   172 Bio/Graphics/GenomeDiagram/_GraphSet.py
26  302 Bio/Graphics/GenomeDiagram/_Track.py
29  217 Bio/KDTree/KDTree.py
29  161 Bio/KEGG/KGML/KGML_parser.py
57  1760    Bio/Nexus/Nexus.py
6   61  Bio/PDB/Dice.py
15  344 Bio/PDB/DSSP.py
10  276 Bio/PDB/FragmentMapper.py
26  301 Bio/PDB/HSExposure.py
25  91  Bio/PDB/MMCIF2Dict.py
13  196 Bio/PDB/MMCIFParser.py
9   163 Bio/PDB/NACCESS.py
12  118 Bio/PDB/NeighborSearch.py
12  245 Bio/PDB/parse_pdb_header.py
16  202 Bio/PDB/PDBIO.py
44  303 Bio/PDB/PDBList.py
19  280 Bio/PDB/PDBParser.py
24  408 Bio/PDB/Polypeptide.py
8   87  Bio/PDB/PSEA.py
9   138 Bio/PDB/ResidueDepth.py
27  117 Bio/PDB/StructureAlignment.py
17  66  Bio/PDB/Superimposer.py
55  319 Bio/PDB/Vector.py
42  160 Bio/PDB/QCPSuperimposer/__init__.py
15  329 Bio/SeqIO/SeqXmlIO.py
186 1366    Bio/SeqIO/SffIO.py
18  146 Bio/SeqIO/SwissIO.py
13  113 Bio/SeqUtils/CheckSum.py
30  136 Bio/SVDSuperimposer/__init__.py
16  536 Bio/SwissProt/__init__.py
11  351 Bio/UniProt/GOA.py
Walking BioSQL
Done

This was a quick script to count the number of (non-blank) lines after the __main__ trick (and the total number of non-blank lines) per file, reporting any over 5:

import sys
import os

def count(filename):
    total = 0
    in_main = False
    main = 0
    with open(filename) as in_handle:
        for line in in_handle:
            if not line.strip():
                continue
            total += 1
            if line.startswith("if ") and "__main__" in line:
                in_main = True
            if in_main:
                main += 1
    if main > 5:
        print("%i\t%i\t%s" % (main, total, filename))
    return main, total

for path in sys.argv[1:]:
    print("Walking %s" % path)
    for (dirpath, dirnames, filenames) in os.walk(path):
        for f in filenames:
            if f.endswith(".py"):
                count(os.path.join(dirpath, f))
print("Done")

peterjc avatar May 11 '16 14:05 peterjc

@peterjc you might want to label this easy.

vincentdavis avatar Sep 28 '16 02:09 vincentdavis

Good point Vincent - it will vary case by case, but a lot of these should be fairly easy jobs - and the task is well defined.

peterjc avatar Sep 28 '16 08:09 peterjc

For review: Considering coverage of SffIO docstests (56%) VS test_SffIO (57%) VS selftest(73%). Attached are coverage reports for each. Coverage_SffIO_docs.pdf Coverage_SffIO___main__.pdf Coverage_SffIO nostest.pdf

vincentdavis avatar Sep 29 '16 16:09 vincentdavis

Those figures are certainly strong motivation to move the SFF self-tests into our unit-test framework. Looking back it was a mistake to write the tests that way in the first place, but I've grown to appreciated automated testing more and more over the years.

peterjc avatar Sep 29 '16 20:09 peterjc

@peterjc I would like to work on the issue

souravsingh avatar Oct 05 '16 17:10 souravsingh

Hi @souravsingh - how about working on Bio/AlignIO/NexusIO.py moving the self-test into Tests/test_Nexus.py? Are you clear about the task, and how to submit a pull request? Thanks!

peterjc avatar Oct 05 '16 19:10 peterjc

#948 from @vincentdavis did this for Bio/SeqIO/SffIO.py

peterjc avatar Oct 06 '16 16:10 peterjc

Updated list. (excludes not yet KGML_parser, Maxentropy, SVDSuperimposer/init.py fix)

__main__    Total   File
26          732     Bio/bgzf.py
246         574     Bio/AlignIO/FastaIO.py
65          179     Bio/AlignIO/NexusIO.py
25          535     Bio/Blast/NCBIXML.py
358         1746    Bio/GenBank/Scanner.py
11          180     Bio/Graphics/GenomeDiagram/_Colors.py
16          184     Bio/Graphics/GenomeDiagram/_FeatureSet.py
8           172     Bio/Graphics/GenomeDiagram/_GraphSet.py
26          302     Bio/Graphics/GenomeDiagram/_Track.py
29          217     Bio/KDTree/KDTree.py
57          1761    Bio/Nexus/Nexus.py
6           61      Bio/PDB/Dice.py
15          384     Bio/PDB/DSSP.py
10          276     Bio/PDB/FragmentMapper.py
26          301     Bio/PDB/HSExposure.py
25          91      Bio/PDB/MMCIF2Dict.py
13          377     Bio/PDB/MMCIFParser.py
9           163     Bio/PDB/NACCESS.py
12          118     Bio/PDB/NeighborSearch.py
12          244     Bio/PDB/parse_pdb_header.py
16          202     Bio/PDB/PDBIO.py
44          303     Bio/PDB/PDBList.py
19          280     Bio/PDB/PDBParser.py
24          408     Bio/PDB/Polypeptide.py
8           87      Bio/PDB/PSEA.py
9           138     Bio/PDB/ResidueDepth.py
27          117     Bio/PDB/StructureAlignment.py
17          66      Bio/PDB/Superimposer.py
55          319     Bio/PDB/Vector.py
42          156     Bio/PDB/QCPSuperimposer/__init__.py
11          372     Bio/UniProt/GOA.py

vincentdavis avatar Oct 07 '16 01:10 vincentdavis

KGML_parser, Maxentropy, SVDSuperimposer and UniProt/GOA done via #950, with a start on Nexus in #957.

peterjc avatar Oct 10 '16 11:10 peterjc

Heya, I'm giving Bio/AlignIO/FastaIO.py a go :smile:

amblina avatar Oct 30 '16 17:10 amblina

@amblina - great - when you're ready, open a pull request and mention #820 please :)

Note we're asking all new contributors to agree to using both the historic Biopython User Agreement AND the 3-clause BSD license, see #898

peterjc avatar Oct 31 '16 11:10 peterjc

Hi, I'm having a look at Bio/Blast/NCBIXML.py.

Ideally the class should be tested with every kind of blast XML output found in Test/Blast? Ex: blastn, blastp, etc?

Gasta88 avatar Feb 13 '17 21:02 Gasta88

@Gasta88 that's an interesting one, right now https://github.com/biopython/biopython/blob/master/Bio/Blast/NCBIXML.py#L646 (link points at current latest code) when run directly takes a filename via the command line, and tries parsing it. I think we/you could just delete that if __name__ == '__main__': block.

In terms of test coverage, other than this NCBIXML.py looks good - eg https://codecov.io/gh/biopython/biopython/src/06b6cd64d42662ef208e86be713e5c2b5865c5ca/Bio/Blast/NCBIXML.py shows just a couple of error conditions and special cases are not being tested.

If you wanted to look at that, adding tests to Tests/test_NCBIXML.py would be idea.

peterjc avatar Feb 13 '17 22:02 peterjc

Hello, I already wrote an email in the Biopythondev mailing list. I would like to write some unit tests and was told to have a look at issue #820. What would be a good Biopython module to start with?

MaxGreil avatar Mar 14 '17 17:03 MaxGreil

Current line counts from the script posted earlier,

  • lines after __main__ if statement
  • total number of lines
  • filename
26	732	Bio/bgzf.py
246	575	Bio/AlignIO/FastaIO.py
358	1851	Bio/GenBank/Scanner.py
16	184	Bio/Graphics/GenomeDiagram/_FeatureSet.py
8	172	Bio/Graphics/GenomeDiagram/_GraphSet.py
26	302	Bio/Graphics/GenomeDiagram/_Track.py
29	217	Bio/KDTree/KDTree.py
6	62	Bio/PDB/Dice.py
15	440	Bio/PDB/DSSP.py
10	276	Bio/PDB/FragmentMapper.py
26	301	Bio/PDB/HSExposure.py
25	91	Bio/PDB/MMCIF2Dict.py
13	384	Bio/PDB/MMCIFParser.py
9	163	Bio/PDB/NACCESS.py
12	118	Bio/PDB/NeighborSearch.py
12	244	Bio/PDB/parse_pdb_header.py
16	206	Bio/PDB/PDBIO.py
61	416	Bio/PDB/PDBList.py
19	280	Bio/PDB/PDBParser.py
24	408	Bio/PDB/Polypeptide.py
8	87	Bio/PDB/PSEA.py
9	138	Bio/PDB/ResidueDepth.py
27	117	Bio/PDB/StructureAlignment.py
17	66	Bio/PDB/Superimposer.py
55	308	Bio/PDB/Vector.py
42	155	Bio/PDB/QCPSuperimposer/__init__.py

Of these Bio/PDB/QCPSuperimposer/__init__.py looks like a good one for @MaxGreil to try converting for addition to Tests/test_QCPSuperimposer.py as this is quite a recent addition to the code so was can easily check with the author if needed.

Note that Tests/test_QCPSuperimposer.py is an old fashioned print-and-compare test (see the chapter in our Tutorial for this, expected output in Tests/output/test_QCPSuperimposer.py), so a related task would be to convert that to use the unittest framework instead.

peterjc avatar Mar 14 '17 22:03 peterjc

I'm having problem with increasing coverage for Bio/KDTree.

The C module _CKDTree can't be found when loaded from Bio/KDTree/KDTree.py or Tests/test_KDTree.py.

Is the file missing altogether or have I forgot to pre-compile something?

Gasta88 avatar Mar 17 '17 14:03 Gasta88

@Gasta88 have you definitely run python setup.py build or python setup.py install before trying to use test_KDTree.py? The compiled C code will not be available otherwise.

peterjc avatar Mar 17 '17 16:03 peterjc

@peterjc I didn't before, but now that I've run the command the _CKDTree module is still missing.

Both tests in the Tests/test_KDTree.py are unable to create 'C KDTree' instances.

I can see that there is a C file named after the class, but I'm not sure how it is compiled.

Gasta88 avatar Mar 20 '17 21:03 Gasta88

@Gasta88 I suggest you open a separate issue to discuss this (giving details of the machine setup, OS, version of Python, etc). Thanks.

peterjc avatar Mar 20 '17 21:03 peterjc

@peterjc I referenced my Unit test for QCPSuperimposer. Is my code valid or should I change/modify it? If it is valid, with which biopython module should I continue?

MaxGreil avatar Mar 21 '17 16:03 MaxGreil

@MaxGreil Hi Max, there are two things I'd like done - neither is urgent.

First of all, this GitHub issue (#820) is about moving test code out of files like Bio/PDB/QCPSuperimposer/__init__.py into our main test suite, in this case Tests/test_QCPSuperimposer.py so that the tests are run automatically (and are counted in our unit test coverage reports). I'm talking about this bit of your code:

https://github.com/biopython/biopython/blob/biopython-168/Bio/PDB/QCPSuperimposer/init.py#L140

Secondly, and this is even less urgent, it would be nice to update Tests/test_QCPSuperimposer.py (and Tests/test_SVDSuperimposer.py) to use the unittest framework rather than the current print-and-compare style testing.

peterjc avatar Mar 21 '17 16:03 peterjc

If it is ok, I now have a look at Tests/test_SVDSuperimposer.py.

MaxGreil avatar Mar 23 '17 19:03 MaxGreil

Beside my problem with Tests/test_KDTree.py, is there any other module that takes priority?

Gasta88 avatar Mar 23 '17 21:03 Gasta88

@Gasta88 how about looking at the self-test in Bio/PDB/Vector.py and checking if all the operations are covered already in Tests/test_PDB.py? If we'll lucky then you can just remove the self-test from Bio/PDB/Vector.py.

peterjc avatar Mar 23 '17 23:03 peterjc

If the new unittests for test_SVDSuperimposer.py are acceptable, for which biopython module should I write some unittests next?

MaxGreil avatar Mar 26 '17 18:03 MaxGreil

@MaxGreil Thanks Max. Browsing https://codecov.io/gh/biopython/biopython one example in need of more test coverage is Bio/MarkovModel.py - it has some tests but according to the coverage report visualised as https://codecov.io/gh/biopython/biopython/src/48ae539449dded4ff1ff1e9f4ff7b8d0b7a28256/Bio/MarkovModel.py large parts are not being tested, e.g. function train_bw, or the load and save functions.

peterjc avatar Mar 26 '17 22:03 peterjc

Hello Peter, thanks for the suggestion. I have a look at the unittests for Bio/MarkovModel.py.

MaxGreil avatar Mar 27 '17 11:03 MaxGreil

Are there still biopython modules in need of more test coverage? On https://codecov.io/gh/biopython/biopython/tree/master/Bio Cluster/__init__.py has a coverage of 5.38%. Should I have a look at this next? If yes, what is the name of the corresponding test file?

MaxGreil avatar Apr 03 '17 20:04 MaxGreil

@MaxGreil you're asking important questions, but not relevant to this issue which was specifically about existing "self-tests" in the Bio/*.py files using the __main__ trick. In the case of Bio.Cluster, I think the fact that much of that is written in C may be confusing the coverage reports. In any case, have a look at test_Cluster.py.

peterjc avatar Apr 03 '17 21:04 peterjc