atacseq
Make the presence of a GTF file optional
Hi everyone,
would it be possible to make the specification of a GTF file optional? I often work with non-model systems that have an assembled genome but no gene annotation available. Without a GTF file, I cannot use the pipeline at all. However, many of the steps in the pipeline could still work without gene annotation, and only the steps that absolutely require it would need to be skipped. Unfortunately I lack the coding knowledge to make the GTF file optional in this pipeline, but maybe one of you could consider doing so.
Thank you very much for your time! Temperche
Hi @Temperche. That sounds like a sensible proposition. It may not be too difficult to add this, but I suspect it will take a bit of testing to make sure it is implemented properly. I'll see if I can squeeze it into the next release (imminent), but if not, it will definitely be in the one after.
Thank you. As I need this pipeline to analyse preliminary data for a research grant application (deadline in April), I would be glad if it could be added sooner rather than later.
You could in theory create a dummy gtf file with several intervals and give that to the pipeline? I haven't tried it before, but the worst-case scenario is that you just don't use the annotated files for anything useful.
Hi @Temperche, looking at this in a bit more detail, it appears that it's going to take quite a bit of work to fully implement and test making the gtf file optional. As it stands, the pipeline either uses the gtf directly for annotating peaks or uses it to create bed files of gene features/TSS intervals that are supplied to other processes. I'm afraid I won't have the time to add this right now, but I could definitely give it a go in the future.
Alternatively, as I suggested in the comment above, you could create a fake gtf file for your non-model reference and then provide that to the pipeline. It doesn't matter what the intervals in the gtf file are as long as it's in a valid format.
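For anyone who wants to try the fake gtf route, here is a minimal sketch in Python of how such a file could be generated. It is not part of the pipeline and has not been tested against it. The function names (`dummy_gtf_lines`, `read_fai`), the `dummy_gene_N`/`dummy_tx_N` identifiers, the contig names and the choice of one 1 kb placeholder gene per contig are all assumptions made for illustration. It writes the standard nine tab-separated GTF columns with `gene_id` and `transcript_id` attributes; whether the pipeline needs further attributes or feature types is not confirmed here.

```python
from typing import Dict, List


def dummy_gtf_lines(contigs: Dict[str, int], gene_len: int = 1000) -> List[str]:
    """Build gene/transcript/exon records, one placeholder gene per contig."""
    lines = []
    for i, (name, length) in enumerate(contigs.items(), start=1):
        start = 1                    # GTF coordinates are 1-based and inclusive
        end = min(gene_len, length)  # never run past the end of the contig
        gene_id = f"dummy_gene_{i}"  # hypothetical identifiers
        tx_id = f"dummy_tx_{i}"
        gene_attr = f'gene_id "{gene_id}"; gene_name "{gene_id}";'
        tx_attr = (
            f'gene_id "{gene_id}"; transcript_id "{tx_id}"; '
            f'gene_name "{gene_id}";'
        )
        for feature, attr in (
            ("gene", gene_attr),
            ("transcript", tx_attr),
            ("exon", tx_attr),
        ):
            # Columns: seqname, source, feature, start, end,
            # score, strand, frame, attributes
            fields = [name, "dummy", feature, str(start), str(end),
                      ".", "+", ".", attr]
            lines.append("\t".join(fields))
    return lines


def read_fai(path: str) -> Dict[str, int]:
    """Read contig names and lengths from a samtools faidx index (.fai)."""
    contigs = {}
    with open(path) as handle:
        for line in handle:
            if line.strip():
                name, length = line.split("\t")[:2]
                contigs[name] = int(length)
    return contigs


if __name__ == "__main__":
    # Hypothetical contigs; for a real genome use read_fai("genome.fa.fai").
    example = {"scaffold_1": 250000, "scaffold_2": 800}
    print("\n".join(dummy_gtf_lines(example)))
```

Redirect the output to a file (for example `dummy.gtf`) and pass that file to the pipeline as the gtf. The contig names must match those in your genome FASTA, which is why reading them from the `.fai` index is the safer option.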
Hi @Temperche! Apologies for the delay in responding! We are about to release a much updated version of the pipeline that has been completely refactored in Nextflow DSL2. It will also support GFF files as input, in addition to GTF.
You should be able to skip most of the steps requiring an annotation with --skip_peak_annotation, so it might be worth trying that. You may need to specify a dummy GTF as I mentioned in the above comment, but I don't think it will actually be used.
Please re-open if you observe any other problems related to this issue.