Invert Filtering
Hello,
I am hoping to use grenedalf, but have some questions about filtering. I generally mask indels (via indel_filtering/identify-indel-regions.pl, which produces a .gtf file, then indel_filtering/filter-sync-by-gtf.pl in Popoolation2). I see that it is recommended to upload non-filtered data in bam/sam (rather than .sync). Therefore:
- Is it possible to filter the data by the .gtf file in grenedalf? Is it possible to invert "--filter-region-gff", to only keep sites NOT in the reference file?
- If this isn't possible, I suppose the best option would be to perform filtering beforehand and upload the data as a sync file?
Thank you for your time!
Cheerio, MK
Hey MK,
thanks for using grenedalf!
- You can indeed filter input via --filter-region-gff, but as you say, this will keep the sites in the GTF, not remove them. Unfortunately, there is currently no invert option for the region filters, but it is an obvious idea, and I'll add it as soon as I get to it!
- Yes, that's one way. Alternatively, you could invert the GTF file itself. A quick search did not reveal any tools for that, but it might be easier, as you can then still work with your full files, without having to create filtered copies of your data.
Let's keep this issue open to remind me to implement this. I'm currently writing grants (from which other open issues here are unfortunately also suffering), but I plan to circle back to grenedalf eventually.
Cheers and let me know if you have any further questions! Lucas
Of course, if all you need from the GTF is the regions to remove, you can also convert this into one of the other file formats, and use those as your region filter. For instance:
- Convert to bed: https://gffutils.readthedocs.io/en/latest/gtf2bed.html
- Complement the bed: https://bedops.readthedocs.io/en/latest/content/reference/set-operations/bedops.html#complement-c-complement
For the latter, it might work to simply provide the same file twice, and the result should be the complement. Not quite sure, but might be as easy as that. Then, use the bed file as your region filter.
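If those tools are not at hand, the same two steps (GTF to BED coordinates, then complement) can be sketched in a few lines of Python. This is only a minimal sketch, not part of grenedalf or Popoolation2: the function names are made up for illustration, and it assumes you can supply the chromosome lengths yourself (for example from the .fai index of your reference), that the GTF is tab-separated, and that all regions lie within those lengths.

```python
from collections import defaultdict

def gtf_to_intervals(lines):
    """Collect GTF regions per chromosome as 0-based, half-open (start, end)."""
    regions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        # GTF is 1-based and end-inclusive; BED is 0-based and end-exclusive.
        regions[chrom].append((start - 1, end))
    return regions

def complement(regions, chrom_lengths):
    """Return the parts of each chromosome NOT covered by any region."""
    result = {}
    for chrom, length in chrom_lengths.items():
        kept = []
        pos = 0  # first position not yet covered by a region
        for start, end in sorted(regions.get(chrom, [])):
            if start > pos:
                kept.append((pos, start))
            pos = max(pos, end)  # handles overlapping regions
        if pos < length:
            kept.append((pos, length))
        result[chrom] = kept
    return result

def to_bed(intervals):
    """Format {chrom: [(start, end), ...]} as BED text."""
    return "".join(
        f"{chrom}\t{start}\t{end}\n"
        for chrom, spans in intervals.items()
        for start, end in spans
    )

# Hypothetical example data: two overlapping indel regions on chr1.
example_gtf = [
    "chr1\tsrc\tindel\t11\t20\t.\t.\t.\tx",
    "chr1\tsrc\tindel\t16\t30\t.\t.\t.\tx",
]
example_lengths = {"chr1": 50, "chr2": 10}  # assumed chromosome lengths
bed_text = to_bed(complement(gtf_to_intervals(example_gtf), example_lengths))
```

Writing `bed_text` to a file then gives a BED file of the sites to keep. Chromosomes without any masked region (chr2 here) come out as one interval spanning their full length, so they are not dropped by the filter.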