seqan3
seqan3 copied to clipboard
Change how seqan3::field::alignment works with SAM/BAM files.
Currently, seqan3::field::alignment is a default field of SAM/BAM files. However, it requires a reference sequence in addition to the SAM/BAM input (if I'm understanding correctly). For the following case, this results in an error:
template <typename traits_type, typename fields_type, typename format_type>
inline auto do_something(seqan3::sam_file_input<traits_type, fields_type, format_type> & input)
{
...
auto results_list = input | std::views::take_while([file_position](auto & rec) {return file_position != -1;})
| std::views::take_while([end](auto & rec) {return std::make_tuple(rec.reference_id().value(), rec.reference_position().value()) < end;})
| std::views::filter([](auto & rec) {return !unmapped(rec);})
| seqan3::views::to<std::vector>;
seqan3::debug_stream << "Results: " << results_list << '\n';
return results_list;
}
There is an error when passing an input file of type seqan3::sam_file_input with default fields.
The issue is that the default fields includes the seqan3::field::alignment field. So the debug_stream will try and print out the alignment for the records. However, there is no alignment actually stored.
Solution: Remove the seqan3::field::alignment field from the default fields for sam_file_input/output types.
Additionally, it would be nice if there were a way to enforce that seqan3::field::alignment can only be provided alongside a reference sequence file. I don't think I see anything in the documentation about how that actually works. Or a "default" empty alignment should be stored, to not cause an error when accessing an empty alignment.
Minimal example:
seqan3::sam_file_input input_file{input_path};
for (auto & record : input_file)
{
seqan3::debug_stream << record << '\n';
}
Will cause an error.
This issue will be automatically solved once we remove the seqan3::field::alignment from the sam_file.
See related PRs:
- #3057 (Adds a function
seqan3::alignment_from_cigarthat lets you create the alignment after reading the sam file) - #3058 (Removes the field
seqan3::field::alignmentfrom theseqan3::sam_file_input/output
Minimal example:
seqan3::sam_file_input input_file{input_path}; for (auto & record : input_file) { seqan3::debug_stream << record << '\n'; }Will cause an error.
is there a solve for this? I am trying to read a bam file and exctract its cigar information using this function, is there an alternative way to do it for now?
this is returning the following error in my case:
terminate called after throwing an instance of 'seqan3::file_open_error'
what(): Trying to read from a bgzf file, but no ZLIB available.
Aborted (core dumped)
this is my script, which is identical to the one mentioned in the documentation provided by the authors:
#include <filesystem>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/io/sam_file/all.hpp>
int main()
{
auto filename = std::filesystem::current_path() / "mybam_file.bam";
std::cout << filename << std::endl;
seqan3::sam_file_input fin{filename}; // default fields
for (auto & record : fin){
seqan3::debug_stream << record.cigar_sequence() << '\n'; // access cigar vector
}
}
the bamfile is tested to be working
Hi @Lionward,
your issue seems to something else. The error says that you are trying to read BAM, but there is no ZLIB available. Check that Zlib is installed on the system your have build seqan3.
- If not, install it and rebuild the seqan3 script from scratch.
- If it is, please provide more information on which seqan3 version and compiler your are using and how your build steps are.
Best, Svenja