seqan3 icon indicating copy to clipboard operation
seqan3 copied to clipboard

[Search] Shapes are the wrong way around

Open MitraDarja opened this issue 4 years ago • 8 comments

I think the kmer hash view does not work with views, which have shapes in the form 1101 or 1011.

#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/views/kmer_hash.hpp>

int main()
{
    using namespace seqan3::literals;

    std::vector<seqan3::dna4> const text{"ACGTC"_dna4};
    auto const shape1011 = seqan3::shape{seqan3::bin_literal{0b1011}};
    auto const shape1101 = seqan3::shape{seqan3::bin_literal{0b1101}};

    seqan3::debug_stream << "shape1011\nExpected: [11,29]\nActual  : "
                         << (text | seqan3::views::kmer_hash(shape1011)) << "\n\n";

    seqan3::debug_stream << "shape1101\nExpected: [7,25]\nActual  : "
                         << (text | seqan3::views::kmer_hash(shape1101)) << '\n';
}
shape1011
Expected: [11,29]
Actual  : [7,25]

shape1101
Expected: [7,25]
Actual  : [11,29]

My assumption is that the shift does not work correctly with these shapes, the only shapes we tested so far are shapes of the sort 1(any number of zeros)1.

MitraDarja avatar Sep 24 '21 10:09 MitraDarja

The values seem correct to me?

1011
ACGTC
 1011

------------------------------

16 4  1           16 4  1 
A  G  T           C  T  C
0  2  3           1  3  1

------------------------------

0 + 8 + 3 = 11    16 + 12 + 1 = 29
1101
ACGTC
 1101

------------------------------

16 4  1           16 4  1 
A  C  T           C  G  T
0  1  3           1  2  1

------------------------------

0 + 4 + 3 = 7     16 + 8 + 1 = 25

eseiler avatar Sep 24 '21 11:09 eseiler

@eseiler

Maybe, I misunderstand the shape, isn't 0b1101 1101 (so your second example) and we get the result of the first and vice visa?

MitraDarja avatar Sep 24 '21 11:09 MitraDarja

Ok, I get it now that the snippet is correct :)

The problem is that this inherits from the dynamic_bitset (which is just a constexpr-capable version of std::bitset, and implements the same API).

This means that

seqan3::dynamic_bitset t1{0b1011'1000'1111};
seqan3::debug_stream << t1 << '\n'; // 1011'1000'1111
seqan3::debug_stream << t1[1] << '\n'; // 1

I.e., it prints left to right, but the indices go right to left (unless this is the actual bug, would need to check std::bitset).

So, we could just call this->flip() in the shape-ctor https://github.com/seqan/seqan3/blob/master/include/seqan3/search/kmer_index/shape.hpp#L105 but then the shape would be printed the wrong way around, except when we overload.

eseiler avatar Sep 24 '21 11:09 eseiler

Sorry, for the wrong snippet, I got myself confused. :sweat_smile:

Maybe, we mention this in the documentation?

MitraDarja avatar Sep 24 '21 12:09 MitraDarja

I reopen this because that shapes are "the wrong way around" might be a problem :)

eseiler avatar Sep 28 '21 06:09 eseiler

Maybe just adding some documentation would be enough? #2981

SGSSGene avatar May 16 '22 09:05 SGSSGene

Maybe just adding some documentation would be enough? #2981

Yes, at the very least 👍 . We can talk in the meeting about it. Either we find that this is enough, or we want some special behaviour that makes it behave like you would expect :D

Edit:

How do shapes work in SeqAn2?

eseiler avatar May 16 '22 09:05 eseiler

Might be of interest when discussing this https://github.com/seqan/product_backlog/issues/388

smehringer avatar Oct 20 '22 10:10 smehringer