Dark_and_Camouflaged_genes
Problems while running 05_CREATE_BED_FILE
Hello, I tried to use your script to detect the camo regions, but I encountered the following error when I ran 05_CREATE_BED_FILE (extract_camo_regions.py):
Wed Jul 26 20:08:19 CST 2023 python extract_camo_regions.py
Traceback (most recent call last):
  File "extract_camo_regions.py", line 113, in <module>
    main(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5])
  File "extract_camo_regions.py", line 94, in main
    group_pos = [regions[region_id]]
KeyError: 'DDX11L1_1::chr1:11869-12227'
I'm not sure what is causing this. Is there something wrong with the format of the key that the KeyError points to? Looking forward to your reply!
Hi @LiShuhang-gif ,
A couple of questions: which reference genome(s) are you working with? Is it human data?
It is possible that the format of your data and the format of the reference you are using are incompatible. This could be the case if, for example, your data is formatted like 1:11869-12227 and the reference genome is formatted like chr1:11869-12227. For the pipeline to work, the formats of certain files must match.
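One quick way to check for this kind of mismatch is to compare the contig naming style in a BED file against the sequence names in the reference FASTA. The following is a minimal sketch, not part of the pipeline: the function names (`contig_style`, `bed_contigs`, `fasta_contigs`) are hypothetical, and the example lines are made up using the coordinates quoted above. It only detects the presence or absence of a "chr" prefix and does not confirm that this is the cause of the KeyError.

```python
def contig_style(names):
    """Classify contig names as 'chr', 'no_chr', 'mixed', or 'empty'."""
    names = list(names)
    if not names:
        return "empty"
    with_prefix = sum(1 for n in names if n.startswith("chr"))
    if with_prefix == len(names):
        return "chr"
    if with_prefix == 0:
        return "no_chr"
    return "mixed"


def bed_contigs(lines):
    """Collect contig names (first column) from BED-formatted lines."""
    contigs = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contigs.add(line.split("\t")[0])
    return contigs


def fasta_contigs(lines):
    """Collect sequence names from FASTA headers (text up to the first whitespace)."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


# Hypothetical example data: a BED line without the "chr" prefix
# and a reference FASTA header with it.
bed_lines = ["1\t11868\t12227\tDDX11L1_1"]
ref_lines = [">chr1 example description", "ACGT"]

print(contig_style(bed_contigs(bed_lines)), contig_style(fasta_contigs(ref_lines)))
# prints: no_chr chr
```

If the two styles differ, or either one is "mixed", the files would need to be converted to a common naming scheme before running the pipeline. With real files, pass an open file handle in place of the example lists.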
We also have a good number of the .bed files already created. You can see if they match your data. They are located on the nextflow-pipeline branch (https://github.com/mebbert/Dark_and_Camouflaged_genes/tree/nextflow-pipeline/camo_bed_files)
This pipeline is not under active development or support, but we will try to help as much as we can.
Thank you! Maddy
Hi, I'm actually using the hg38 human genome as the reference. I will follow your suggestion and check my current format, and if there is any progress I will leave a message here. Thanks!
Hello, I have another question now. Do the dark and camouflaged regions vary greatly between populations? Can I merge the bed file I got with the bed file you provided to get a more complete set of dark regions? Or, to be more specific, can I use the bed file of the dark regions obtained from one population to screen for SNPs in another population? Thanks!
Hello @LiShuhang-gif,
That's a great question.
Since the camouflaged regions are mostly determined by the reference (and not the population), it shouldn't make a big difference, but we haven't systematically assessed this by population. There may be more variation in the dark-by-depth regions (e.g., if for some reason certain populations really don't have that gene/region present in their genome), but camouflaged regions occur when there are duplications in the reference genome (regardless of whether there are duplications in the individual's/population's genome).
This is why we say this method is really a band-aid solution for genomics. What we really need is to construct each individual's genome structure rather than imposing a single genome's (or even a pangenome's) structure on the individual(s).
I hope that helps. Conceptually, the idea is simple, but it gets a bit complicated as you get into the weeds.
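For the merge asked about above, a coordinate-level union of two BED files could be sketched as follows. This is an illustration only, with hypothetical function names (`parse_bed`, `merge_intervals`) and made-up intervals. It assumes both files use the same reference build and contig naming, and it drops any extra columns such as region names, so the result may not be usable as input to later pipeline steps that rely on those columns.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples; extra columns are ignored."""
    intervals = []
    for line in lines:
        fields = line.strip().split("\t")
        # Skip header lines and anything without the three required columns.
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def merge_intervals(intervals):
    """Union overlapping or directly adjacent intervals on each chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical intervals standing in for "my BED file" and "the provided BED file".
own = ["chr1\t100\t200", "chr1\t500\t600"]
provided = ["chr1\t150\t250", "chr2\t10\t20"]

for chrom, start, end in merge_intervals(parse_bed(own + provided)):
    print(f"{chrom}\t{start}\t{end}")
```

Chromosomes are sorted lexicographically here (so chr10 comes before chr2), which is fine for the merge itself but may need re-sorting for tools that expect a particular order.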