atacseq

Make the presence of a GTF file optional

Open Temperche opened this issue 4 years ago • 4 comments

Hi everyone,

would it be possible to make the GTF file optional? I often work with non-model systems that have an assembled genome but no gene annotation. Without a GTF file, I cannot use this pipeline at all. However, many of the pipeline's steps could still run without gene annotation, and only the steps that strictly require it would need to be skipped. Unfortunately, I lack the coding knowledge to make the GTF file optional myself, but maybe one of you could consider doing so.

Thank you very much for your time! Temperche

Temperche avatar Feb 28 '20 14:02 Temperche

Hi @Temperche. That sounds like a sensible proposition. It may not be too difficult to add, but I suspect it will take a bit of testing to make sure it is implemented properly. I'll see if I can squeeze it into the next release (imminent); if not, it will definitely be in the one after.

drpatelh avatar Feb 28 '20 14:02 drpatelh

Thank you. As I need this pipeline to analyse preliminary data for a research grant application (deadline in April), I would be glad if it could be added sooner rather than later.

Temperche avatar Feb 28 '20 15:02 Temperche

You could, in theory, create a dummy GTF file with several intervals and give that to the pipeline. I haven't tried it before, but the worst-case scenario is that you just don't use the annotated files for anything useful.

drpatelh avatar Feb 28 '20 15:02 drpatelh

Hi @Temperche, looking at this in a bit more detail, it appears that it's going to take quite a bit of work to fully implement and test making the GTF file optional. As it stands, the pipeline either uses the GTF directly for annotating peaks or uses it to create BED files for gene features/TSS intervals that are then supplied to other processes. I'm afraid I won't have the time to add this right now, but I could definitely give it a go in the future.

Alternatively, as I suggested in the comment above, you could create a fake GTF file for your non-model reference and provide that to the pipeline. It doesn't matter what the intervals in the GTF file are, as long as the file is in a valid format.
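As a rough illustration of the fake-GTF idea, here is a minimal Python sketch that writes one placeholder gene, transcript and exon per contig using the standard nine tab-separated GTF columns. The function name `write_dummy_gtf`, the `contigs` mapping and the `dummy` IDs are all hypothetical; the contig names and lengths are assumed to match your FASTA (for example, taken from its `.fai` index). It has not been tested against the pipeline, and some downstream tools may expect additional attributes.

```python
from pathlib import Path


def write_dummy_gtf(contigs, out_path, gene_len=1000):
    """Write one placeholder gene/transcript/exon per contig in GTF format.

    contigs: dict mapping contig name -> contig length (assumed to match the FASTA).
    Returns the list of GTF lines that were written.
    """
    lines = []
    for i, (name, length) in enumerate(contigs.items(), start=1):
        # Keep the interval inside the contig; GTF coordinates are 1-based, inclusive.
        end = min(gene_len, length)
        gene_attrs = f'gene_id "dummy{i}";'
        tx_attrs = f'gene_id "dummy{i}"; transcript_id "dummy{i}.1";'
        for feature, attrs in (
            ("gene", gene_attrs),
            ("transcript", tx_attrs),
            ("exon", tx_attrs),
        ):
            # Columns: seqname, source, feature, start, end, score, strand, frame, attributes
            fields = [name, "dummy", feature, "1", str(end), ".", "+", ".", attrs]
            lines.append("\t".join(fields))
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines
```

Calling, say, `write_dummy_gtf({"scaffold_1": 5000, "scaffold_2": 500}, "dummy.gtf")` would produce a six-line file that can then be supplied wherever the pipeline expects a GTF. The scaffold names here are placeholders only.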

drpatelh avatar Mar 01 '20 20:03 drpatelh

Hi @Temperche! Apologies for the delay in responding! We are about to release a heavily updated version of the pipeline that has been completely refactored in Nextflow DSL2. It will also support GFF files as input, in addition to GTF.

You should be able to skip most of the steps requiring an annotation with --skip_peak_annotation, so it might be worth trying that. You may need to specify a dummy GTF, as I mentioned in the comment above, but I don't think it will actually be used.

Please re-open if you observe any other problems related to this issue.

drpatelh avatar Nov 18 '22 12:11 drpatelh