Improve test coverage by moving self-tests
Many of the Python files in Biopython still have self-tests at the end which are executed if the file is run directly via the idiom:
if __name__ == "__main__":
print("Running a quick self-test")
...
This is bad for several reasons - these tests are not being run routinely via our buildbot [Update: We stopped using buildbot, but now have AppVeyor instead] or TravisCI (and therefore if the test breaks, we don't know about it), they are not counted on the coverage report https://codecov.io/github/biopython/biopython/ (and in fact actively reduce the coverage score [Update: We configured codecov to ignore the self tests]), and moreover with Python 3 making subtle changes to imports, some of these self tests have accidentally become Python 2 only.
This is a tracking issue to encourage more commits like these which move these self-tests under Tests/test_XXX.py using the unittest framework instead:
- eb0c7ab903882587f3ece06e51e15c6e68855cc9
- da5a414819fdaef1fc9eb3a713fb00f92030889e
Or, in some cases using a doctest instead makes sense:
- 9242f12a65fb1d6bdc82f8201531662fb63a101b
In those cases, we're left with a stub like this which is useful when working directly on a file's doctests:
if __name__ == "__main__":
from Bio._utils import run_doctest
run_doctest(verbose=0)
Here's an indication of some obvious targets to tackle next:
$ python count_main.py Bio BioSQL
Walking Bio
26 726 Bio/bgzf.py
48 258 Bio/MaxEntropy.py
32 194 Bio/NaiveBayes.py
44 622 Bio/Align/AlignInfo.py
246 574 Bio/AlignIO/FastaIO.py
65 179 Bio/AlignIO/NexusIO.py
25 535 Bio/Blast/NCBIXML.py
358 1730 Bio/GenBank/Scanner.py
11 180 Bio/Graphics/GenomeDiagram/_Colors.py
16 184 Bio/Graphics/GenomeDiagram/_FeatureSet.py
8 172 Bio/Graphics/GenomeDiagram/_GraphSet.py
26 302 Bio/Graphics/GenomeDiagram/_Track.py
29 217 Bio/KDTree/KDTree.py
29 161 Bio/KEGG/KGML/KGML_parser.py
57 1760 Bio/Nexus/Nexus.py
6 61 Bio/PDB/Dice.py
15 344 Bio/PDB/DSSP.py
10 276 Bio/PDB/FragmentMapper.py
26 301 Bio/PDB/HSExposure.py
25 91 Bio/PDB/MMCIF2Dict.py
13 196 Bio/PDB/MMCIFParser.py
9 163 Bio/PDB/NACCESS.py
12 118 Bio/PDB/NeighborSearch.py
12 245 Bio/PDB/parse_pdb_header.py
16 202 Bio/PDB/PDBIO.py
44 303 Bio/PDB/PDBList.py
19 280 Bio/PDB/PDBParser.py
24 408 Bio/PDB/Polypeptide.py
8 87 Bio/PDB/PSEA.py
9 138 Bio/PDB/ResidueDepth.py
27 117 Bio/PDB/StructureAlignment.py
17 66 Bio/PDB/Superimposer.py
55 319 Bio/PDB/Vector.py
42 160 Bio/PDB/QCPSuperimposer/__init__.py
15 329 Bio/SeqIO/SeqXmlIO.py
186 1366 Bio/SeqIO/SffIO.py
18 146 Bio/SeqIO/SwissIO.py
13 113 Bio/SeqUtils/CheckSum.py
30 136 Bio/SVDSuperimposer/__init__.py
16 536 Bio/SwissProt/__init__.py
11 351 Bio/UniProt/GOA.py
Walking BioSQL
Done
This was a quick script to count the number of (non-blank) lines after the __main__ trick (and the total number of non-blank lines) per file, reporting any over 5:
import sys
import os
def count(filename):
total = 0
in_main = False
main = 0
with open(filename) as in_handle:
for line in in_handle:
if not line.strip():
continue
total += 1
if line.startswith("if ") and "__main__" in line:
in_main = True
if in_main:
main += 1
if main > 5:
print("%i\t%i\t%s" % (main, total, filename))
return main, total
for path in sys.argv[1:]:
print("Walking %s" % path)
for (dirpath, dirnames, filenames) in os.walk(path):
for f in filenames:
if f.endswith(".py"):
count(os.path.join(dirpath, f))
print("Done")
@peterjc you might want to label this easy.
Good point Vincent - it will vary case by case, but a lot of these should be fairly easy jobs - and the task is well defined.
For review: Considering coverage of SffIO docstests (56%) VS test_SffIO (57%) VS selftest(73%). Attached are coverage reports for each. Coverage_SffIO_docs.pdf Coverage_SffIO___main__.pdf Coverage_SffIO nostest.pdf
Those figures are certainly strong motivation to move the SFF self-tests into our unit-test framework. Looking back it was a mistake to write the tests that way in the first place, but I've grown to appreciated automated testing more and more over the years.
@peterjc I would like to work on the issue
Hi @souravsingh - how about working on Bio/AlignIO/NexusIO.py moving the self-test into Tests/test_Nexus.py? Are you clear about the task, and how to submit a pull request? Thanks!
#948 from @vincentdavis did this for Bio/SeqIO/SffIO.py
Updated list. (excludes not yet KGML_parser, Maxentropy, SVDSuperimposer/init.py fix)
__main__ Total File
26 732 Bio/bgzf.py
246 574 Bio/AlignIO/FastaIO.py
65 179 Bio/AlignIO/NexusIO.py
25 535 Bio/Blast/NCBIXML.py
358 1746 Bio/GenBank/Scanner.py
11 180 Bio/Graphics/GenomeDiagram/_Colors.py
16 184 Bio/Graphics/GenomeDiagram/_FeatureSet.py
8 172 Bio/Graphics/GenomeDiagram/_GraphSet.py
26 302 Bio/Graphics/GenomeDiagram/_Track.py
29 217 Bio/KDTree/KDTree.py
57 1761 Bio/Nexus/Nexus.py
6 61 Bio/PDB/Dice.py
15 384 Bio/PDB/DSSP.py
10 276 Bio/PDB/FragmentMapper.py
26 301 Bio/PDB/HSExposure.py
25 91 Bio/PDB/MMCIF2Dict.py
13 377 Bio/PDB/MMCIFParser.py
9 163 Bio/PDB/NACCESS.py
12 118 Bio/PDB/NeighborSearch.py
12 244 Bio/PDB/parse_pdb_header.py
16 202 Bio/PDB/PDBIO.py
44 303 Bio/PDB/PDBList.py
19 280 Bio/PDB/PDBParser.py
24 408 Bio/PDB/Polypeptide.py
8 87 Bio/PDB/PSEA.py
9 138 Bio/PDB/ResidueDepth.py
27 117 Bio/PDB/StructureAlignment.py
17 66 Bio/PDB/Superimposer.py
55 319 Bio/PDB/Vector.py
42 156 Bio/PDB/QCPSuperimposer/__init__.py
11 372 Bio/UniProt/GOA.py
KGML_parser, Maxentropy, SVDSuperimposer and UniProt/GOA done via #950, with a start on Nexus in #957.
Heya, I'm giving Bio/AlignIO/FastaIO.py a go :smile:
@amblina - great - when you're ready, open a pull request and mention #820 please :)
Note we're asking all new contributors to agree to using both the historic Biopython User Agreement AND the 3-clause BSD license, see #898
Hi, I'm having a look at Bio/Blast/NCBIXML.py.
Ideally the class should be tested with every kind of blast XML output found in Test/Blast? Ex: blastn, blastp, etc?
@Gasta88 that's an interesting one, right now https://github.com/biopython/biopython/blob/master/Bio/Blast/NCBIXML.py#L646 (link points at current latest code) when run directly takes a filename via the command line, and tries parsing it. I think we/you could just delete that if __name__ == '__main__': block.
In terms of test coverage, other than this NCBIXML.py looks good - eg https://codecov.io/gh/biopython/biopython/src/06b6cd64d42662ef208e86be713e5c2b5865c5ca/Bio/Blast/NCBIXML.py shows just a couple of error conditions and special cases are not being tested.
If you wanted to look at that, adding tests to Tests/test_NCBIXML.py would be idea.
Hello, I already wrote an email in the Biopythondev mailing list. I would like to write some unit tests and was told to have a look at issue #820. What would be a good Biopython module to start with?
Current line counts from the script posted earlier,
- lines after
__main__if statement - total number of lines
- filename
26 732 Bio/bgzf.py
246 575 Bio/AlignIO/FastaIO.py
358 1851 Bio/GenBank/Scanner.py
16 184 Bio/Graphics/GenomeDiagram/_FeatureSet.py
8 172 Bio/Graphics/GenomeDiagram/_GraphSet.py
26 302 Bio/Graphics/GenomeDiagram/_Track.py
29 217 Bio/KDTree/KDTree.py
6 62 Bio/PDB/Dice.py
15 440 Bio/PDB/DSSP.py
10 276 Bio/PDB/FragmentMapper.py
26 301 Bio/PDB/HSExposure.py
25 91 Bio/PDB/MMCIF2Dict.py
13 384 Bio/PDB/MMCIFParser.py
9 163 Bio/PDB/NACCESS.py
12 118 Bio/PDB/NeighborSearch.py
12 244 Bio/PDB/parse_pdb_header.py
16 206 Bio/PDB/PDBIO.py
61 416 Bio/PDB/PDBList.py
19 280 Bio/PDB/PDBParser.py
24 408 Bio/PDB/Polypeptide.py
8 87 Bio/PDB/PSEA.py
9 138 Bio/PDB/ResidueDepth.py
27 117 Bio/PDB/StructureAlignment.py
17 66 Bio/PDB/Superimposer.py
55 308 Bio/PDB/Vector.py
42 155 Bio/PDB/QCPSuperimposer/__init__.py
Of these Bio/PDB/QCPSuperimposer/__init__.py looks like a good one for @MaxGreil to try converting for addition to Tests/test_QCPSuperimposer.py as this is quite a recent addition to the code so was can easily check with the author if needed.
Note that Tests/test_QCPSuperimposer.py is an old fashioned print-and-compare test (see the chapter in our Tutorial for this, expected output in Tests/output/test_QCPSuperimposer.py), so a related task would be to convert that to use the unittest framework instead.
I'm having problem with increasing coverage for Bio/KDTree.
The C module _CKDTree can't be found when loaded from Bio/KDTree/KDTree.py or Tests/test_KDTree.py.
Is the file missing altogether or have I forgot to pre-compile something?
@Gasta88 have you definitely run python setup.py build or python setup.py install before trying to use test_KDTree.py? The compiled C code will not be available otherwise.
@peterjc I didn't before, but now that I've run the command the _CKDTree module is still missing.
Both tests in the Tests/test_KDTree.py are unable to create 'C KDTree' instances.
I can see that there is a C file named after the class, but I'm not sure how it is compiled.
@Gasta88 I suggest you open a separate issue to discuss this (giving details of the machine setup, OS, version of Python, etc). Thanks.
@peterjc I referenced my Unit test for QCPSuperimposer. Is my code valid or should I change/modify it? If it is valid, with which biopython module should I continue?
@MaxGreil Hi Max, there are two things I'd like done - neither is urgent.
First of all, this GitHub issue (#820) is about moving test code out of files like Bio/PDB/QCPSuperimposer/__init__.py into our main test suite, in this case Tests/test_QCPSuperimposer.py so that the tests are run automatically (and are counted in our unit test coverage reports). I'm talking about this bit of your code:
https://github.com/biopython/biopython/blob/biopython-168/Bio/PDB/QCPSuperimposer/init.py#L140
Secondly, and this is even less urgent, it would be nice to update Tests/test_QCPSuperimposer.py (and Tests/test_SVDSuperimposer.py) to use the unittest framework rather than the current print-and-compare style testing.
If it is ok, I now have a look at Tests/test_SVDSuperimposer.py.
Beside my problem with Tests/test_KDTree.py, is there any other module that takes priority?
@Gasta88 how about looking at the self-test in Bio/PDB/Vector.py and checking if all the operations are covered already in Tests/test_PDB.py? If we'll lucky then you can just remove the self-test from Bio/PDB/Vector.py.
If the new unittests for test_SVDSuperimposer.py are acceptable, for which biopython module should I write some unittests next?
@MaxGreil Thanks Max. Browsing https://codecov.io/gh/biopython/biopython one example in need of more test coverage is Bio/MarkovModel.py - it has some tests but according to the coverage report visualised as https://codecov.io/gh/biopython/biopython/src/48ae539449dded4ff1ff1e9f4ff7b8d0b7a28256/Bio/MarkovModel.py large parts are not being tested, e.g. function train_bw, or the load and save functions.
Hello Peter, thanks for the suggestion. I have a look at the unittests for Bio/MarkovModel.py.
Are there still biopython modules in need of more test coverage? On https://codecov.io/gh/biopython/biopython/tree/master/Bio Cluster/__init__.py has a coverage of 5.38%. Should I have a look at this next? If yes, what is the name of the corresponding test file?
@MaxGreil you're asking important questions, but not relevant to this issue which was specifically about existing "self-tests" in the Bio/*.py files using the __main__ trick. In the case of Bio.Cluster, I think the fact that much of that is written in C may be confusing the coverage reports. In any case, have a look at test_Cluster.py.