biopython
biopython copied to clipboard
bugfix for writing genpept with long ids.
-
[X] I hereby agree to dual licence this and any previous contributions under both the Biopython License Agreement AND the BSD 3-Clause License.
-
[X] I have read the
CONTRIBUTING.rstfile, have runpre-commitlocally, and understand that AppVeyor and TravisCI will be used to confirm the Biopython unit tests and style checks pass with these changes. -
[X] I have added my name to the alphabetical contributors listings in the files
NEWS.rstandCONTRIB.rstas part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
Closes #3732
One line change as suggested in the issue post, plus a test.
Codecov Report
Base: 0.00% // Head: 82.07% // Increases project coverage by +82.07% :tada:
Coverage data is based on head (
5835d9b) compared to base (6f513f6). Patch has no changes to coverable lines.
:exclamation: Current head 5835d9b differs from pull request most recent head d59e80c. Consider uploading reports for the commit d59e80c to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## master #3733 +/- ##
===========================================
+ Coverage 0 82.07% +82.07%
===========================================
Files 0 294 +294
Lines 0 49392 +49392
===========================================
+ Hits 0 40539 +40539
- Misses 0 8853 +8853
| Impacted Files | Coverage Δ | |
|---|---|---|
| Bio/SeqIO/InsdcIO.py | 94.55% <ø> (ø) |
|
| Bio/SCOP/Cla.py | 96.61% <0.00%> (ø) |
|
| Bio/motifs/xms.py | 76.27% <0.00%> (ø) |
|
| Bio/Phylo/CDAOIO.py | 4.46% <0.00%> (ø) |
|
| Bio/Sequencing/Applications/_Novoalign.py | 100.00% <0.00%> (ø) |
|
| Bio/PDB/PDBIO.py | 83.44% <0.00%> (ø) |
|
| Bio/SearchIO/_legacy/__init__.py | 100.00% <0.00%> (ø) |
|
| Bio/NMR/xpktools.py | 66.08% <0.00%> (ø) |
|
| Bio/ExPASy/ScanProsite.py | 30.15% <0.00%> (ø) |
|
| Bio/SwissProt/__init__.py | 89.95% <0.00%> (ø) |
|
| ... and 285 more |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
Why are you removing all those warnings anyway?
Why are you removing all those warnings anyway?
That's a good question. Actually I didn't mean those last few commits to be part of the pull request (didn't realize, or maybe forgot, that Github updates pull requests with later commits to the same branch, I should have made a new branch for those).
But to answer your question, I was trying to optimize the speed of the genbank parser, so I did profiling and found that those lines cause major performance hits and check for very rare conditions.
I'll try to revert the branch back to the commit that I intended to include.
Yes, updating the branch updates the pull request too. Very useful once you know to expect this.
I wondered if the de-quoting change from regex to plain Python would be good or bad from a performance point of view. Avoiding the regex would be easier to read.
avoiding the regexes is helpful for speed. The changes in that commit are all the result of profiling I did. All together the changes almost halve the parsing time (9 seconds to 5 seconds for the example I was profiling on).
It has been years since I last profiled the GenBank parser, I guess we've built up some additional checks which come at a speed cost. Hopefully we can speed them up now you've flagged some hotspots.
I made a commit reverting changes irrelevant for this pull request (except the one changing the quotes regex to checking the first and last characters). Sorry for the confusion.
Note you've accidentally deleted the symlink LICENSE here.
Feel free to clean up the branch history (e.g. git rebase into a single commit), but if not we can do a squash-and-merge of this pull request instead.
Ok, I tried. The history looks even more complicated now, so not sure if I did it right.
The history looks cleaner now (no merges). If you're not familiar with git-rebase, it isn't immediately obvious how to use it to combine commits and re-write the commit messages. Don't worry about it.
We can squash-and-merge with a commit comment about speeding up the quote removal, and addressing the false positive warning for #3732.
Merged, thank you!