seqan3
seqan3 copied to clipboard
[Search] Shapes are the wrong way around
I think the kmer hash view does not work with views, which have shapes in the form 1101 or 1011.
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/views/kmer_hash.hpp>
int main()
{
using namespace seqan3::literals;
std::vector<seqan3::dna4> const text{"ACGTC"_dna4};
auto const shape1011 = seqan3::shape{seqan3::bin_literal{0b1011}};
auto const shape1101 = seqan3::shape{seqan3::bin_literal{0b1101}};
seqan3::debug_stream << "shape1011\nExpected: [11,29]\nActual : "
<< (text | seqan3::views::kmer_hash(shape1011)) << "\n\n";
seqan3::debug_stream << "shape1101\nExpected: [7,25]\nActual : "
<< (text | seqan3::views::kmer_hash(shape1101)) << '\n';
}
shape1011
Expected: [11,29]
Actual : [7,25]
shape1101
Expected: [7,25]
Actual : [11,29]
My assumption is that the shift does not work correctly with these shapes, the only shapes we tested so far are shapes of the sort 1(any number of zeros)1.
The values seem correct to me?
1011
ACGTC
1011
------------------------------
16 4 1 16 4 1
A G T C T C
0 2 3 1 3 1
------------------------------
0 + 8 + 3 = 11 16 + 12 + 1 = 29
1101
ACGTC
1101
------------------------------
16 4 1 16 4 1
A C T C G T
0 1 3 1 2 1
------------------------------
0 + 4 + 3 = 7 16 + 8 + 1 = 25
@eseiler
Maybe, I misunderstand the shape, isn't 0b1101 1101 (so your second example) and we get the result of the first and vice visa?
Ok, I get it now that the snippet is correct :)
The problem is that this inherits from the dynamic_bitset (which is just a constexpr-capable version of std::bitset, and implements the same API).
This means that
seqan3::dynamic_bitset t1{0b1011'1000'1111};
seqan3::debug_stream << t1 << '\n'; // 1011'1000'1111
seqan3::debug_stream << t1[1] << '\n'; // 1
I.e., it prints left to right, but the indices go right to left (unless this is the actual bug, would need to check std::bitset).
So, we could just call this->flip() in the shape-ctor https://github.com/seqan/seqan3/blob/master/include/seqan3/search/kmer_index/shape.hpp#L105
but then the shape would be printed the wrong way around, except when we overload.
Sorry, for the wrong snippet, I got myself confused. :sweat_smile:
Maybe, we mention this in the documentation?
I reopen this because that shapes are "the wrong way around" might be a problem :)
Maybe just adding some documentation would be enough? #2981
Maybe just adding some documentation would be enough? #2981
Yes, at the very least 👍 . We can talk in the meeting about it. Either we find that this is enough, or we want some special behaviour that makes it behave like you would expect :D
Edit:
How do shapes work in SeqAn2?
Might be of interest when discussing this https://github.com/seqan/product_backlog/issues/388