Works with sequences sharing the same IDs
A need that popped up today from one of my colleagues.
Just looking at the code I don't understand the goal here - do you have a simplified example with the before/after behaviour?
Of course.
I have updated the tests accordingly.
Before this PR, with this kind of file:
>7
ATGC
>7
ATGC
the result was:
>7;7 representing 2 records
ATGC
>7;7 representing 2 records
ATGC
Now the result file is:
>7;7 representing 2 records
ATGC
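To make the before/after concrete, here is a minimal Python sketch of the merged behaviour. It is not the tool's actual code: `parse_fasta` and `merge_identical` are hypothetical helper names, and it assumes records are merged only when their sequences match exactly (case-sensitive), with the title format copied from the output above.

```python
from collections import OrderedDict


def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    identifier, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            parts = line[1:].split()
            identifier, chunks = (parts[0] if parts else ""), []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def merge_identical(text):
    """Write one record per distinct sequence, listing every identifier seen.

    Assumption: a repeated identifier is kept in the title each time it
    occurs, so two records named 7 give the title "7;7".
    """
    groups = OrderedDict()  # sequence -> identifiers, in input order
    for identifier, seq in parse_fasta(text):
        groups.setdefault(seq, []).append(identifier)
    out = []
    for seq, ids in groups.items():
        title = ";".join(ids)
        if len(ids) > 1:
            title += " representing %d records" % len(ids)
        out.append(">%s\n%s\n" % (title, seq))
    return "".join(out)


print(merge_identical(">7\nATGC\n>7\nATGC\n"), end="")
```

Keying the grouping on the sequence rather than on the identifier is what keeps the merged record from being written once per repeated ID.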
We often have this case, as we fetch sequences from various sources or people and try to merge them.
Is this clear to you?
The failed tests are not related to this PR, right?
So this is for input query files with repeated identifiers?
Repeated entries with the same identifier and sequence are one thing; repeated identifiers with different sequences are another. Personally I would make the latter an error condition, since they cause too many problems downstream.
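As a rough illustration of that error condition, the check could look like the sketch below. The function name `check_conflicting_ids` is hypothetical, and it assumes the records are already available as `(identifier, sequence)` pairs and that sequences are compared exactly.

```python
def check_conflicting_ids(records):
    """Raise ValueError if an identifier is reused with a different sequence.

    Exact duplicates (same identifier, same sequence) are allowed through.
    """
    seen = {}  # identifier -> first sequence seen for it
    for identifier, seq in records:
        if identifier in seen and seen[identifier] != seq:
            raise ValueError(
                "Identifier %r is used for different sequences" % identifier
            )
        seen.setdefault(identifier, seq)


# Exact duplicates pass silently.
check_conflicting_ids([("1", "A"), ("1", "A")])

# A reused identifier with a different sequence is rejected.
try:
    check_conflicting_ids([("1", "A"), ("1", "A"), ("1", "T")])
except ValueError as err:
    print("Rejected:", err)
```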
The tool's master branch is failing on TravisCI against the Galaxy dev branch, see #120
You're right, the output is not yet perfect in this case:
>1
A
>1
A
>1
T
Before this PR, the output was:
>1;1 representing 2 records
A
>1;1 representing 2 records
A
>1;1 representing 2 records
T
With this PR it is now as follows (which is not good either...):
>1;1 representing 2 records
A
@peterjc This new version works for what I needed. I'll let you review the code and merge if you want. Thank you for your comments.