biopython icon indicating copy to clipboard operation
biopython copied to clipboard

bugfix for writing genpept with long ids.

Open seanrjohnson opened this issue 4 years ago • 1 comments

  • [X] I hereby agree to dual licence this and any previous contributions under both the Biopython License Agreement AND the BSD 3-Clause License.

  • [X] I have read the CONTRIBUTING.rst file, have run pre-commit locally, and understand that AppVeyor and TravisCI will be used to confirm the Biopython unit tests and style checks pass with these changes.

  • [X] I have added my name to the alphabetical contributors listings in the files NEWS.rst and CONTRIB.rst as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)

Closes #3732

One line change as suggested in the issue post, plus a test.

seanrjohnson avatar Sep 16 '21 16:09 seanrjohnson

Codecov Report

Base: 0.00% // Head: 82.07% // Increases project coverage by +82.07% :tada:

Coverage data is based on head (5835d9b) compared to base (6f513f6). Patch has no changes to coverable lines.

:exclamation: Current head 5835d9b differs from pull request most recent head d59e80c. Consider uploading reports for the commit d59e80c to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3733       +/-   ##
===========================================
+ Coverage        0   82.07%   +82.07%     
===========================================
  Files           0      294      +294     
  Lines           0    49392    +49392     
===========================================
+ Hits            0    40539    +40539     
- Misses          0     8853     +8853     
Impacted Files Coverage Δ
Bio/SeqIO/InsdcIO.py 94.55% <ø> (ø)
Bio/SCOP/Cla.py 96.61% <0.00%> (ø)
Bio/motifs/xms.py 76.27% <0.00%> (ø)
Bio/Phylo/CDAOIO.py 4.46% <0.00%> (ø)
Bio/Sequencing/Applications/_Novoalign.py 100.00% <0.00%> (ø)
Bio/PDB/PDBIO.py 83.44% <0.00%> (ø)
Bio/SearchIO/_legacy/__init__.py 100.00% <0.00%> (ø)
Bio/NMR/xpktools.py 66.08% <0.00%> (ø)
Bio/ExPASy/ScanProsite.py 30.15% <0.00%> (ø)
Bio/SwissProt/__init__.py 89.95% <0.00%> (ø)
... and 285 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Sep 16 '21 16:09 codecov[bot]

Why are you removing all those warnings anyway?

peterjc avatar Oct 18 '22 10:10 peterjc

Why are you removing all those warnings anyway?

That's a good question. Actually I didn't mean those last few commits to be part of the pull request (didn't realize, or maybe forgot, that Github updates pull requests with later commits to the same branch, I should have made a new branch for those).

But to answer your question, I was trying to optimize the speed of the genbank parser, so I did profiling and found that those lines cause major performance hits and check for very rare conditions.

I'll try to revert the branch back to the commit that I intended to include.

seanrjohnson avatar Oct 18 '22 13:10 seanrjohnson

Yes, updating the branch updates the pull request too. Very useful once you know to expect this.

I wondered if the de-quoting change from regex to plain Python would be good or bad from a performance point of view. Avoiding the regex would be easier to read.

peterjc avatar Oct 18 '22 13:10 peterjc

avoiding the regexes is helpful for speed. The changes in that commit are all the result of profiling I did. All together the changes almost halve the parsing time (9 seconds to 5 seconds for the example I was profiling on).

seanrjohnson avatar Oct 18 '22 13:10 seanrjohnson

It has been years since I last profiled the GenBank parser, I guess we've built up some additional checks which come at a speed cost. Hopefully we can speed them up now you've flagged some hotspots.

peterjc avatar Oct 18 '22 14:10 peterjc

I made a commit reverting changes irrelevant for this pull request (except the one changing the quotes regex to checking the first and last characters). Sorry for the confusion.

seanrjohnson avatar Oct 18 '22 14:10 seanrjohnson

Note you've accidentally deleted the symlink LICENSE here.

Feel free to clean up the branch history (e.g. git rebase into a single commit), but if not we can do a squash-and-merge of this pull request instead.

peterjc avatar Oct 18 '22 14:10 peterjc

Ok, I tried. The history looks even more complicated now, so not sure if I did it right.

seanrjohnson avatar Oct 18 '22 14:10 seanrjohnson

The history looks cleaner now (no merges). If you're not familiar with git-rebase, it isn't immediately obvious how to use it to combine commits and re-write the commit messages. Don't worry about it.

We can squash-and-merge with a commit comment about speeding up the quote removal, and addressing the false positive warning for #3732.

peterjc avatar Oct 18 '22 14:10 peterjc

Merged, thank you!

peterjc avatar Oct 21 '22 13:10 peterjc