pyfastx

Changes behavior of sequence objects so that they don't change their …

Open • sclamons opened this issue 2 years ago • 0 comments

Primary change

This fix changes the behavior of `Sequence` objects created inside `for` loops over `Fasta` objects so that they don't change unexpectedly when accessed repeatedly.

EXAMPLE:

  1. Create a `Sequence` object inside a loop:

     ```python
     import pyfastx

     genome = pyfastx.Fasta('my_genome_file.fa')
     for c in genome:
         seq = c
         break
     ```

  2. Examine the new `Sequence` object twice:

     ```python
     print(seq.seq)
     print(seq.seq)
     ```

OLD BEHAVIOR

The `Sequence` object behaves as if it were still inside the loop and keeps pulling sequence data from `genome`, so `seq.seq` returns a different value each time it is accessed. Worse, the new values are not read correctly, which can produce strange output: in the session below, the second value includes the next record's header (`>ERCC-00004`) concatenated with the start of that record's sequence. A real example:

```
In [77]: seq.seq
Out[77]: 'TCCAGATTACTTCCATTTCCGCCCAAGCTGCTCACAGTATACGGGCGTCGGCATCCAGACCGTCGGCTGATCGTGGTTTTACTAGGCTAGACTAGCGTACGAGCACTATGGTCAGTAATTCCTGGAGGAATAGGTACCAAGAAAAAAACGAACCTTTGGGTTCCAGAGCTGTACGGTCGCACTGAACTCGGATAGGTCTCAGAAAAACGAAATATAGGCTTACGGTAGGTCCGAATGGCACAAAGCTTGTTCCGTTAGCTGGCATAAGATTCCATGCCTAGATGTGATACACGTTTCTGGAAACTGCCTCGTCATGCGACTGTTCCCCGGGGTCAGGGCCGCTGGTATTTGCTGTAAAGAGGGGCGTTGAGTCCGTCCGACTTCACTGCCCCCTTTCAGCCTTTTGGGTCCTGTATCCCAATTCTCAGAGGTCCCGCCGTACGCTGAGGACCACCTGAAACGGGCATCGTCGCTCTTCGTTGTTCGTCGACTTCTAGTGTGGAGACGAATTGCCAGAATTATTAACTGCGCAGTTAGGGCAGCGTCTGAGGAAGTTTGCTGCGGTTTCGCCTTGACCGCGGGAAGGAGACATAACGATAGCGACTCTGTCTCAGGGGATCTGCATATGTTTGCAGCATACTTTAGGTGGGCCTTGGCTTCCTTCCGCAGTCAAAACCGCGCAATTATCCCCGTCCTGATTTACTGGACTCGCAACGTGGGTCCATCAGTTGTCCGTATACCAAGACGTCTAAGGGCGGTGTACACCCTTTTGAGCAATGATTGCACAACCTGCGATCACCTTATACAGAATTATCAATCAAGCTCCCCGAGGAGCGGACTTGTAAGGACCGCCGCTTTCGCTCGGGTCTGCGGGTTATAGCTTTTCAGTCTCGACGGGCTAGCACACATCTGGTTGACTAGGCGCATAGTCGCCATTCACAGATTTGCTCGGCAATCAGTACTGGTAGGCGTTAGACCCCGTGACTCGTGGCTGAACGGCCGTACAACTCGACAGCCGGTGCTTGCGTTTTACCCTTAAAAAAAAAAAAAAAAAAAAAAAA'

In [78]: seq.seq
Out[78]: 'CAGCAGCGATTAAGGCAGAGGCGTTTGTATCTGCCATTATAAAGAAGTTTCCTCCAGCAACTCCTTTCTTAATTCCAAACTTAGCTTCAGTTATAAATTCCCCTCCCATGATTGGGATTTTATAAACTTTTCTTCCATATAATTCATCTTTCTTCTCATAACCGTCTCCGAAAAACTTCAACTTAAATCCAACCTTTAACTGCTCATCAGCCATGTCTCCCACAGCATCAAAAATAGCAGTTGTTGGACATGTTAAGACACACTGCCCCAATCTCTCTAACATTTGATGCTCTAACTCTGACTTTTTAGGGTGGCATATCTGTATTATAAATCCTGGTCTTCCATCTGGTGTTTTTGATGGAGGGACATATTTCTCAATTCCTGCTTCTGCTGGACACATTATAACTGAACAACCAAAACCTGTTGCCTCTGTAGCTGCAATCTTAGCCCACTTCTTTGTAGCTGCTGTTATTAAAACTCTTGAAACCCATATTGGGAATGCTTCTGCAAATGTATCTTCAATATATACTCCATTTATTTCCATAGTTTCCCTCCATTAAGATTTTAACAATTATAGTTTATCTTAGGGGCTATTAATATCTTATCATTTGGTTTTTAATATTCGATAAATCCATAAATAAAAATATATCAACAATAATTTTAAATAATCTAAGTATAGGTAATATAACAATTAAAAAGATTTAGAGGGATAGAATTGAACGGCATTAGGAGAATTGTTTTAGATATATTGAAGCCGCATGAGCCAAAAATAACAGATATGGCATTAAAATTAACATCATTATCAAACATTGATGGGGTTAATATTACAGTCTATGAAATAGATAAAGAGACTGAGAATGTTAAAGTTACAATTGAAGGGAATAATTTAGATTTTGATGAGATTCAGGAAATTATTGAAAGTTTGGGAGGGACTATTCACAGTATAGATGAGGTTGTTGCAGGTAAAAAGATTATTGAAGAGTTAGAACACCACAAGATAAAAAAAAAAAAAAAAAAAAAAAA>ERCC-00004TCTTGCTTCAACAATAACGTCTCTTTC'
```
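To make the failure mode concrete, here is a toy model of it. This is a sketch, not pyfastx's actual implementation: `LazyRecord` and `iter_lazy` are hypothetical names, and the reader assumes single-line sequences. Each record keeps a reference to the iterator's shared stream and reads from the stream's *current* position whenever `.seq` is accessed, so repeated access returns different data and can leak the next header into the "sequence", much like the `>ERCC-00004` fragment in the session above.

```python
import io
from typing import Iterator


class LazyRecord:
    """Toy record whose sequence is read from a shared stream on every access."""

    def __init__(self, name: str, stream: io.StringIO) -> None:
        self.name = name
        self._stream = stream  # shared with the iterator that created this record

    @property
    def seq(self) -> str:
        # Reads the next line from wherever the shared stream currently is,
        # so each access consumes more of the file instead of re-reading
        # this record's own sequence.
        return self._stream.readline().rstrip("\n")


def iter_lazy(text: str) -> Iterator[LazyRecord]:
    """Yield one LazyRecord per header line; the record stores no file offset."""
    stream = io.StringIO(text)
    while True:
        header = stream.readline()
        if not header:
            return
        yield LazyRecord(header[1:].strip(), stream)


FASTA = ">rec1\nACGT\n>rec2\nTTGG\n"

for c in iter_lazy(FASTA):
    seq = c
    break

print(seq.seq)  # ACGT
print(seq.seq)  # >rec2  (the next header leaks into the "sequence")
```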

NEW BEHAVIOR

Every access to a `Sequence` object's `seq` attribute returns the same string.
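One way to get this guarantee is for each record to capture its own sequence when it is yielded, so later accesses no longer depend on the iterator's state. The sketch below shows that idea with a hypothetical toy reader (`SnapshotRecord` and `iter_snapshot` are illustrative names, not pyfastx code), and is not necessarily how this patch implements the fix.

```python
import io
from typing import Iterator, List, Optional


class SnapshotRecord:
    """Toy record that stores its own sequence when it is created."""

    def __init__(self, name: str, seq: str) -> None:
        self.name = name
        self._seq = seq  # fixed at construction; no link back to the iterator

    @property
    def seq(self) -> str:
        return self._seq


def iter_snapshot(text: str) -> Iterator[SnapshotRecord]:
    """Parse FASTA text, yielding each record only after its sequence is complete."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in io.StringIO(text):
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield SnapshotRecord(name, "".join(chunks))
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield SnapshotRecord(name, "".join(chunks))


FASTA = ">rec1\nACGT\nAC\n>rec2\nTTGG\n"

for c in iter_snapshot(FASTA):
    seq = c
    break

print(seq.seq)  # ACGTAC
print(seq.seq)  # ACGTAC
```

One trade-off of this design: each record holds its full sequence in memory, whereas reading lazily from the file avoids that cost for large records.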

Concerns

This patch definitely needs a review from someone who understands the code thoroughly! It runs and passes all tests (including two new ones that check for the old behavior), but I don't know whether there are other use cases where users would expect a `Sequence` object to keep iterating outside a loop.

Other Minor Changes

  1. Added tests to check for the old behavior.
  2. Fixed a test that was failing because pyfaidx.Fasta objects can't take negative indices.
  3. Edited the .gitignore file to automatically ignore everything in the `build/` and `tests/__pycache__/` folders.

sclamons, Jun 27 '22 18:06