
Support for FASTQ

Open magicDGS opened this issue 7 years ago • 5 comments

The current Support Matrix does not mention the status of FASTQ files (currently unsupported). Hadoop-BAM supports them, but it is not widely used, and I would like to see support for FASTQ here.

We could port the Hadoop-BAM implementation, or we could consider other solutions such as Fastdoop. I previously contacted @umbfer about the availability of his library (before it was on GitHub) and also told him about this project, so maybe he can chime in if he is interested in contributing a port of his code.

In any case, I think the library would benefit from this support, as FASTQ is still the most common file format for unmapped reads.

magicDGS · Sep 10 '18 19:09

ADAM has support for FASTQ files. My main goal with the Disq library is to reduce duplication of effort among all projects using Apache Spark for genomics.

heuermh · Sep 10 '18 20:09

@heuermh, are there plans to port some of that functionality from ADAM?

magicDGS · Sep 10 '18 20:09

Yes, I am in favor of bringing things from downstream projects (e.g. GATK4, ADAM) up into Disq as necessary.

heuermh · Sep 10 '18 20:09

Perfect! In that case, let's keep this issue open until it is ported (or we decide not to port it).

magicDGS · Sep 10 '18 20:09

Hi everybody, we are focused on developing a new version of Fastdoop and, unfortunately, we lack the time required to port our code into your framework. However, we could provide some support if one of you is willing to undertake this task.

umbfer · Sep 12 '18 09:09