ValueError: invalid literal for int() with base 10: '5_42'

Open shiyi-pan opened this issue 3 years ago • 8 comments

Hi, I want to use CAFE for a gene family study, but when I run the cafetutorial_report_analysis.py script, I get an error.

Here is my command:

$PYTHON cafetutorial_report_analysis.py -i report_run2.cafe -o summary20220113

Here is my error:

[*-------------------------------------------------] 0.033% complete.
Traceback (most recent call last):
  File "cafetutorial_report_analysis.py", line 371, in <module>
    results_main, node_fams_main = cra(inlines_main, results_main, node_fams_main, linestart_main, ancfilename, sorted_nodes, 1);
  File "cafetutorial_report_analysis.py", line 193, in cra
    anccount = int(tlinfo[curanc][4]);
ValueError: invalid literal for int() with base 10: '5_42'

Here is my input file: report_run2.zip

Could you help me fix it? Thank you very much. By the way, is there an article to cite for CAFE?

shiyi-pan, Jan 13 '22 07:01

I ran into the same error. Do you have a solution?

yuanw-18, Mar 01 '22 14:03

I'm sorry, I can't help you. I haven't solved this problem yet.

shiyi-pan, Mar 03 '22 01:03

I have solved my problem. The cause was that my tree file had extra "." characters. I checked your input file. Maybe you can replace the species names with ones that contain no dots, and remove all other information from the tree except the time. Good luck~

yuanw-18, Mar 03 '22 02:03

Hi all, Sorry I missed this post. This is likely because your input tree has internal node labels, which CAFE can handle but the report analysis script cannot. This is a common issue and has been reported before (https://groups.google.com/g/hahnlabcafe/c/NboOVhcbZPk/m/JluqEyZVBAAJ). I'll copy the solutions I posted to that thread here too:

Until the report analysis script is fixed, there are two solutions:

  1. Using your favorite text editor or bash regex program, do a simple find and replace of all the node labels in the report file to replace them with empty strings. This might be difficult in your case because you have so many unique node labels, and this also runs the risk of accidentally replacing other important text within the report file.
  2. Remove the labels from your CAFE input tree and re-run CAFE to generate a report file without the node labels. I think this will be the easiest option.
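For option 2, the label removal can be scripted. The following is only a minimal sketch, assuming the tree is a plain Newick string in which internal node labels (bootstrap values or names) appear directly after a closing parenthesis and contain no quoted or special characters. The function name `strip_internal_labels` is hypothetical and is not part of CAFE or its tutorial scripts.

```python
import re


def strip_internal_labels(newick: str) -> str:
    """Remove whatever follows a ')' up to the next ':', ',', ')' or ';'.

    Assumption: labels are unquoted, so they cannot contain those characters.
    Tip names and branch lengths are left untouched.
    """
    return re.sub(r"\)[^:;,()]+", ")", newick)


if __name__ == "__main__":
    labelled = "((A:1,B:1)90:2,C:3)root;"
    print(strip_internal_labels(labelled))
```

This should be applied only to the species tree passed to the `tree` command. The `lambda -s -t` structure also has values after each closing parenthesis, and those must stay.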

gwct, Mar 03 '22 16:03

Thank you

Thank you for your reply. I will replace the species names with ones that contain no dots, for example replacing "Cicerarietinum.representative.pep" with "Cicerarietinum_pep". But I don't know how to "remove the other information except time". Could you give me an example? Thank you very much.

shiyi-pan, Mar 04 '22 01:03

tree (((((acch:74.8509,(((rhde:7.0997,rhwi:7.0997):3.6345,rhsi:10.7342):00.8280,rhov:11.5623):63.2887):8.2057,casi:83.0566):0.056636,dilo:88.7201):18.2580,prvu:106.9782):4.1897,soly:111.1678)

This is an example.

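Let me explain again: remove all spaces and bootstrap information, and keep only the time information. @gwct's comment above means the same thing. Calculating the number of expanded and contracted gene families works fine, but if you want to use the report summary script, you need to remove the extra label information from the tree.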
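The whitespace and dot cleanup can also be done programmatically. This is a rough sketch, assuming a plain Newick string in which every tip name is preceded by "(" or "," and followed by ":". The function name `clean_tree` is made up for illustration. Dots are removed from tip names only, so the decimal points in the times are preserved.

```python
import re


def clean_tree(newick: str) -> str:
    """Drop all whitespace, then remove dots from tip names only."""
    no_space = re.sub(r"\s+", "", newick)

    def fix_name(match: re.Match) -> str:
        # group(1) is the '(' or ',' before the name, group(2) the name itself
        return match.group(1) + match.group(2).replace(".", "") + ":"

    # A tip name sits between '(' or ',' and the ':' that starts its time.
    return re.sub(r"([(,])([^():,;]+):", fix_name, no_space)


if __name__ == "__main__":
    print(clean_tree("(Sp.one.pep:1.5, (Sp.two:0.5,C:0.5):1.0);"))
```

If names are changed in the tree, the same change has to be made to the column headers of the gene count table.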

yuanw-18, Mar 04 '22 01:03

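Thank you very much for your reply. Below are my revised input file filtered.cafe.input.tsv and cafetutorial_run1.sh. Please check whether there are still any formatting problems.

filtered.cafe.input.tsv is as follows: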

Desc Orthogroup ArabidopsisthalianaAraport11representativepep Cicerarietinumrepresentativepep Gmax275Wm82longestprotein Medicagotruncatularepresentativepep NN1138longestpep Nelumbonucifera11longestpep SoyL01longestProtein SoyW01longestProtein ZH13longestprotein araduV14167gnm1ann1cxSMprotein
(null) OG0000014 1 0 21 4 3 1 97 94 42 21
(null) OG0000018 1 6 52 60 12 15 20 42 29 4
(null) OG0000020 0 3 42 96 14 0 13 21 34 9
(null) OG0000021 10 5 33 4 22 52 28 31 30 9

cafetutorial_run1.sh is as follows:

#!/gss1/home/hjb20181119/panyongpeng/NN1138-2/03.orthofinder_data/04.between_species_cafe/CAFE-4.2/CAFE/release/cafe

load -i filtered.cafe.input.tsv -t 4 -l ./reports/log_run1.txt -p 0.05
tree (((((Cicerarietinumrepresentativepep:21.121085,Medicagotruncatularepresentativepep:21.121085):19.000163,((SoyL01longestProtein:2.259435,SoyW01longestProtein:2.259435):0.301212,((NN1138longestpep:2.374908,Gmax275Wm82longestprotein:2.374908):0.097247,ZH13longestprotein:2.472155):0.088492):37.560601):8.591100,araduV14167gnm1ann1cxSMprotein:48.712348):43.714107,arabidopsisthalianaaraport11representativepep:92.426456):29.573544,Nelumbonucifera11longestpep:122.000000);
lambda -s -t (((((1,1)1,((1,1)1,((1,1)1,1)1)1)1,1)1,1)1,1);
report ./reports/report_run1
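One formatting check that can be automated is whether the tip names in the tree agree with the species columns of the count table. The sketch below assumes the first two columns of the table are `Desc` and the family ID, and that names are expected to match exactly. The helper names `tip_names` and `compare_names` are hypothetical and not part of CAFE.

```python
import re


def tip_names(newick: str) -> list[str]:
    """Return names that follow '(' or ',' and precede ':' (i.e. tips)."""
    return re.findall(r"[(,]\s*([^():,;\s]+)\s*:", newick)


def compare_names(header_line: str, newick: str) -> tuple[list[str], list[str]]:
    """Return (columns missing from the tree, tips missing from the table)."""
    columns = set(header_line.split()[2:])  # skip Desc and family ID columns
    tips = set(tip_names(newick))
    return sorted(columns - tips), sorted(tips - columns)


if __name__ == "__main__":
    header = "Desc Orthogroup Aaa Bbb Ccc"
    tree = "((Aaa:1,Bbb:1):2,ccc:3)"
    print(compare_names(header, tree))
```

On the header and tree shown here, an exact comparison like this would flag the Arabidopsis name, which is capitalized differently in the two places. Whether CAFE itself treats that as a mismatch is not confirmed in this thread.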

shiyi-pan, Mar 04 '22 01:03

The cafetutorial_report_analysis.py script now runs correctly. Thank you very much.

shiyi-pan, Mar 04 '22 01:03