CAFE
cafetutorial_clade_and_size_filter.py error
Hi! When I checked cafetutorial_clade_and_size_filter.py at https://iu.app.box.com/v/cafetutorial-files/folder/22161186238?page=1 , I found that the script writes out gene families with only one gene copy among all species, which I think is not suitable for gene family analysis. The problem seems to be at lines 104 and 105, though maybe I just do not fully understand the code.
The code looks like:
elif line_n not in lines_to_separate_set and len(lines_to_keep_set) == 0:
    output_file.write(line)
I'm not sure about that code snippet, but I think using single-copy groups (one gene in all species) in these analyses is necessary. These groups are still informative when estimating rates of gene gain and loss, since they do tell us something about the amount of change over time (in this case, likely no change). So for estimating lambda these are useful, and for ancestral reconstructions it's likely all ancestral states will be inferred as 1, so they can just be ignored (unless they are your family of interest). The same goes for groups that contain 2 copies in all species, or 3 copies in all species, etc. Does that make sense?
-Gregg
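To make the point about constant-copy families concrete, here is a minimal Python sketch that flags families whose copy number is identical in every species. It is not taken from cafetutorial_clade_and_size_filter.py; the function name `is_invariant` and the toy counts are hypothetical and only illustrate the idea.

```python
def is_invariant(counts):
    """Return True if every species has the same copy number."""
    return len(set(counts)) == 1


# Hypothetical toy data: family ID -> copy counts, one entry per species.
families = {
    "fam1": [1, 1, 1, 1],  # single copy in all species
    "fam2": [2, 2, 2, 2],  # two copies in all species
    "fam3": [1, 3, 0, 2],  # copy number varies
}

# Families with no observed change; still informative for estimating lambda.
invariant = [fid for fid, counts in families.items() if is_invariant(counts)]
print(invariant)
```

Families flagged this way would stay in the input for estimating lambda and could simply be skipped when reading the ancestral reconstructions.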
Thanks, gwct! As I understand it, gene families are divided into 3 groups. 1: large_families, where more than 100 gene copies are found in one or more species. 2: filtered_families, where more than 1 gene copy is found in more than 2 species in any clade or in all species. 3: gene families with less than 1 gene copy in all species. However, lines 104 and 105 still look odd to me, and when I checked the tutorial results "large_filtered_cafe_input.txt", "filtered_cafe_input.txt" and "unfiltered_cafe_input.txt", it seemed that the code may be wrong.
I'm not sure about category 2. I think that should be families with 1 or more gene copies in more than one clade. But I'm not sure about it in the context of this script. I think someone who helped write this script will have to weigh in.
-Gregg
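For discussion, here is a rough Python sketch of the three-way split under the reading suggested above: a family is kept if it has 1 or more copies in more than one clade, and set aside as large if any species has more than 100 copies. This is an assumption based on the descriptions in this thread, not the actual logic of cafetutorial_clade_and_size_filter.py. The names `classify` and `LARGE_THRESHOLD`, the clade layout, and the species labels are all hypothetical.

```python
LARGE_THRESHOLD = 100  # assumption: from the "more than 100 gene copies" wording


def classify(counts_by_species, clades):
    """Return 'large', 'kept', or 'dropped' for one gene family.

    counts_by_species: dict mapping species name -> copy number
    clades: dict mapping clade name -> list of species names
    """
    # Very large families are separated out first.
    if any(n > LARGE_THRESHOLD for n in counts_by_species.values()):
        return "large"
    # Count the clades in which at least one species has >= 1 copy.
    clades_present = sum(
        1
        for members in clades.values()
        if any(counts_by_species.get(sp, 0) >= 1 for sp in members)
    )
    return "kept" if clades_present > 1 else "dropped"


# Hypothetical clades and species, for illustration only.
clades = {"cladeA": ["sp1", "sp2"], "cladeB": ["sp3", "sp4"]}

single_copy = {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 1}
huge = {"sp1": 150, "sp2": 3, "sp3": 2, "sp4": 1}
one_clade_only = {"sp1": 2, "sp2": 1, "sp3": 0, "sp4": 0}

print(classify(single_copy, clades))
print(classify(huge, clades))
print(classify(one_clade_only, clades))
```

Under this reading, a family with a single copy in every species is kept rather than filtered out, which matches the earlier point that such families are still useful for estimating lambda.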
@gwct thank you very much!