Running sequence alignment with alphabets with more than 256 characters
Platform
- SeqAn version: 3.4.0
- Operating system: Linux
- Compiler: g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Question
Hi, is it possible to compute sequence alignments using alphabets larger than 256 characters?
For instance, I tried running one of the tutorial examples with an alphabet of more than 256 characters, which I defined like this:
class example_alphabet : public seqan3::alphabet_base<example_alphabet, 1333, char16_t>
{
.
.
.
};
and when building this code:
// Invoke the pairwise alignment which returns a lazy range over alignment results.
auto results_example = seqan3::align_pairwise(std::tie(example_alphabet_vector_1, example_alphabet_vector_2), config);
auto & res_example = *results_example.begin();
seqan3::debug_stream << "Score: " << res_example.score() << '\n';
return 0;
I get errors like this:
/home/eclypsium/Workspace/v2d/seqan3/tutorial/seqan3/include/seqan3/alphabet/composite/alphabet_variant.hpp:134:25: error: static assertion failed: The alphabet_variant is currently only tested for alphabets with char_type char. Contact us on GitHub if you have a different use case: https://github.com/seqan/seqan3 .
134 | static_assert((std::is_same_v<alphabet_char_t<alternative_types>, char> && ...),
|
Looking at the code, it seems that char is hardcoded as char_type in many places. Is there any way to circumvent this?
Best regards, Andrés Tiraboschi
Hey there,
We had a similar issue in #3271.
Since your alphabet type would fit in a char16_t, it should be possible to modify alphabet_variant to allow for this.
I will try it out tomorrow.
When I apply this patch
diff --git a/include/seqan3/alphabet/composite/alphabet_variant.hpp b/include/seqan3/alphabet/composite/alphabet_variant.hpp
index 82b035a99..df411d921 100644
--- a/include/seqan3/alphabet/composite/alphabet_variant.hpp
+++ b/include/seqan3/alphabet/composite/alphabet_variant.hpp
@@ -121,18 +121,22 @@ template <typename... alternative_types>
requires (detail::writable_constexpr_alphabet<alternative_types> && ...) && (std::regular<alternative_types> && ...)
&& (sizeof...(alternative_types) >= 2)
class alphabet_variant :
- public alphabet_base<alphabet_variant<alternative_types...>,
- (static_cast<size_t>(alphabet_size<alternative_types>) + ...),
- char>
+ public alphabet_base<
+ alphabet_variant<alternative_types...>,
+ (static_cast<size_t>(alphabet_size<alternative_types>) + ...),
+ std::conditional_t<(std::same_as<alphabet_char_t<alternative_types>, char> && ...), char, char16_t>>
{
private:
//!\brief The base type.
- using base_t = alphabet_base<alphabet_variant<alternative_types...>,
- (static_cast<size_t>(alphabet_size<alternative_types>) + ...),
- char>;
-
- static_assert((std::is_same_v<alphabet_char_t<alternative_types>, char> && ...),
- "The alphabet_variant is currently only tested for alphabets with char_type char. "
+ using base_t = alphabet_base<
+ alphabet_variant<alternative_types...>,
+ (static_cast<size_t>(alphabet_size<alternative_types>) + ...),
+ std::conditional_t<(std::same_as<alphabet_char_t<alternative_types>, char> && ...), char, char16_t>>;
+
+ static_assert(((std::is_same_v<alphabet_char_t<alternative_types>, char>
+ || std::is_same_v<alphabet_char_t<alternative_types>, char16_t>)
+ && ...),
+ "The alphabet_variant is currently only tested for alphabets with char_type char or char16_t. "
"Contact us on GitHub if you have a different use case: https://github.com/seqan/seqan3 .");
//!\brief Befriend the base type.
It seems to work just fine
#include <ranges>
#include <tuple>
#include <vector>
#include <seqan3/alignment/pairwise/align_pairwise.hpp>
#include <seqan3/alphabet/alphabet_base.hpp>
#include <seqan3/core/debug_stream.hpp>
namespace example
{
class example_alphabet : public seqan3::alphabet_base<example_alphabet, 1333, char16_t>
{
using base_t = seqan3::alphabet_base<example_alphabet, 1333, char16_t>;
public:
using base_t::base_t;
static constexpr char16_t rank_to_char(rank_type const rank)
{
return static_cast<char16_t>(rank);
}
static constexpr rank_type char_to_rank(char16_t const chr)
{
return static_cast<rank_type>(chr);
}
};
inline namespace literals
{
constexpr example_alphabet operator""_example(char const c) noexcept
{
return example_alphabet{}.assign_char(c);
}
constexpr std::vector<example_alphabet> operator""_example(char const * const s, size_t const n)
{
std::vector<example_alphabet> r;
r.resize(n);
for (size_t i = 0; i < n; ++i)
r[i].assign_char(s[i]);
return r;
}
} // namespace literals
} // namespace example
int main()
{
using namespace example::literals;
std::vector<example::example_alphabet> seq1 = "ACGTGATG!!@@++"_example;
std::vector<example::example_alphabet> seq2 = "AGTGATACT!!@@++"_example;
seqan3::configuration cfg = seqan3::align_cfg::method_global{} | seqan3::align_cfg::edit_scheme;
auto results_example = seqan3::align_pairwise(std::tie(seq1, seq2), cfg);
auto & res_example = *results_example.begin();
seqan3::debug_stream << "Score: " << res_example.score() << '\n';
// char16_t cannot be printed directly, so we need to convert it to char.
auto adaptor = std::views::transform(
[](auto const & in)
{
auto letter = seqan3::to_char(in);
return static_cast<char>(letter);
});
auto && [p1, p2] = res_example.alignment();
seqan3::debug_stream << adaptor(p1) << '\n';
seqan3::debug_stream << adaptor(p2) << '\n';
// Score: -4
// ACGTGATG--!!@@++
// A-GTGATACT!!@@++
}
Seems like alphabet_variant is the only gatekeeper. All other parts are generic and use the rank/char type of the alphabet.
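In case it helps to see the char type selection from the patch in isolation, here is a minimal standalone sketch that does not need SeqAn. The names narrow_alphabet, wide_alphabet, wider_alphabet, variant_char_t and supported_char_types are hypothetical stand-ins made up for this illustration, and the nested char_type member is a simplification of what alphabet_char_t does in the library. It assumes C++17 or newer.

```cpp
#include <type_traits>

// Hypothetical stand-ins for alphabets; only the nested char_type matters here.
struct narrow_alphabet { using char_type = char; };
struct wide_alphabet   { using char_type = char16_t; };
struct wider_alphabet  { using char_type = char32_t; };

// Same idea as the patch: pick char if every alternative uses char,
// otherwise fall back to char16_t.
template <typename... alternative_types>
using variant_char_t =
    std::conditional_t<(std::is_same_v<typename alternative_types::char_type, char> && ...),
                       char,
                       char16_t>;

// Same idea as the relaxed static_assert: every alternative must use
// either char or char16_t.
template <typename... alternative_types>
inline constexpr bool supported_char_types =
    ((std::is_same_v<typename alternative_types::char_type, char>
      || std::is_same_v<typename alternative_types::char_type, char16_t>)
     && ...);

// All-char alternatives keep char, so existing variants are unaffected.
static_assert(std::is_same_v<variant_char_t<narrow_alphabet, narrow_alphabet>, char>);
// A single char16_t alternative widens the variant's char type.
static_assert(std::is_same_v<variant_char_t<narrow_alphabet, wide_alphabet>, char16_t>);
// char and char16_t pass the check, char32_t is still rejected.
static_assert(supported_char_types<narrow_alphabet, wide_alphabet>);
static_assert(!supported_char_types<narrow_alphabet, wider_alphabet>);
```

Everything is checked at compile time, so the file only needs to compile. The fold over `&&` is what keeps the old behaviour for variants whose alternatives all use char.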
Cool! Thanks, I'll give it a try.