metAMOS

Functional Annotation: Blast Output vs Krona Output

Open Kyxsune opened this issue 9 years ago • 2 comments

Hey Guys,

Thanks for helping me out with my previous issue. The current one comes after running the pipeline successfully (awesome, btw). I ran the functional annotation with BLAST and got a lot of annotated scaffolds, but only a fraction of those are reported in the Krona output. I was wondering why this is the case, and whether there is something I am doing wrong (as usual).

For comparison, the entire krona.ec.input file:

D1BS28 6.3.2.n2 0.0
Q59800 1.2.1.12 2e-132
Q5YTD8 4.-.-.- 1e-154
C5C5T3 4.2.3.5 3e-123
C5C687 6.3.5.5 0.0
C5C692 2.7.7.6 5e-34
B8H6T9 2.5.1.9 2e-34
Q6AF75 3.6.1.31 6e-26
A1RAW1 3.5.4.13 1e-91
P0CH00 1.17.4.1 0.0
Q9CBQ2 1.17.4.1 3e-166
A0R7G6 5.5.1.4 3e-166
C5C0K2 2.7.7.6 0.0
C5C0G1 2.7.7.6 5e-161
C5C0S6 1.6.99.5 2e-80
Q9X8I9 3.6.1.1 1e-62
C5C046 6.1.1.10 0.0
A6WDJ3 3.6.5.n1 0.0
C5C3D1 4.2.1.33 6e-100
C5C2I9 1.1.1.85 8e-139
C5C2I2 1.1.1.86 5e-27
C5C2I2 1.1.1.86 1e-94
A0JXZ9 4.2.1.9 0.0
C5BWR2 2.7.7.8 0.0
Q0SEI1 2.1.1.45 1e-129
C5C1U6 3.6.3.14 0.0
A6W7G9 3.6.3.14 0.0
A1SNY6 4.6.1.12 1e-07
C5C093 4.2.1.11 0.0
C5C1C4 2.7.1.30 0.0
A0JUS8 1.17.7.1 0.0
B2GKD7 6.3.4.5 0.0
A1R559 4.2.1.19 2e-75
P03631 3.1.21.-;6.5.1.1 2e-160

And a mere portion of the blast.out file:

scaffold_2_106306_106737_+ B7GQS6 74.02 127 33 0 16 142 10 136 4e-51 199
scaffold_2_136137_137282_+ O07147 55.28 360 154 4 1 358 18 372 6e-86 317
scaffold_2_138042_139004_+ Q5YTD8 90.72 291 27 0 29 319 16 306 1e-154 545
scaffold_2_139698_140882_+ O58489 34.76 374 238 4 1 370 17 388 5e-58 225
scaffold_2_140884_141510_+ A1SJA2 66.01 203 65 1 3 205 1 199 9e-71 266
scaffold_2_141595_142350_+ C5C5Q3 84.06 251 39 1 1 250 2 252 7e-113 406
scaffold_2_142357_143103_+ A4FBA6 68.39 155 49 0 59 213 1 155 2e-47 189
scaffold_2_143224_143832_+ C5C5Q7 70.62 177 47 1 1 177 6 177 8e-49 193
scaffold_2_143846_144850_+ Q47N43 76.85 311 70 1 23 333 32 340 1e-118 426
scaffold_2_144946_145359_+ P65026 43.24 74 39 1 4 77 10 80 4e-07 53.5
scaffold_2_145447_147390_+ Q53955 50.94 373 172 5 190 557 198 564 1e-74 281
scaffold_2_145447_147390_+ Q53955 41.76 91 50 2 39 126 8 98 7e-12 72.8
scaffold_2_147468_148493_+ Q53956 48.64 294 147 3 1 293 27 317 5e-61 234
scaffold_2_148514_149053_+ A5CSP7 58.60 157 64 1 21 177 20 175 8e-31 132
scaffold_2_149197_151521_+ P52560 73.44 768 186 4 16 765 77 844 0.0 1159
scaffold_2_154038_154814_- P46697 44.85 136 58 5 97 221 129 258 3e-22 105
scaffold_2_154966_155667_+ Q49649 35.06 231 130 5 3 232 6 217 2e-20 99.0
scaffold_2_155673_157043_+ Q6A8J7 60.22 455 172 4 1 451 2 451 6e-141 501
scaffold_2_157094_157618_- P30536 47.54 122 62 1 50 169 33 154 5e-17 87.0
scaffold_2_157724_158791_+ P65024 66.56 299 96 1 1 295 2 300 2e-88 325
scaffold_2_158800_160830_+ C5C5S1 78.92 593 115 4 66 648 1 593 0.0 867
scaffold_2_164023_164610_+ Q5NZT6 42.13 197 99 3 1 184 29 223 4e-28 124
scaffold_2_165763_168012_- O88022 49.33 223 113 0 524 746 466 688 1e-44 182
scaffold_2_165763_168012_- O88022 38.21 301 156 5 44 344 35 305 3e-21 103
scaffold_2_168117_169550_+ Q50739 67.63 207 67 0 260 466 241 447 3e-70 266
scaffold_2_168117_169550_+ Q50739 62.42 165 51 3 2 164 9 164 4e-41 169
scaffold_2_169757_170383_+ C5C5S5 78.26 207 45 0 1 207 2 208 5e-88 323
scaffold_2_171490_174186_+ A6WCF7 64.13 895 312 3 1 895 2 887 0.0 938
scaffold_2_174255_174818_+ Q9KXQ0 54.84 155 63 2 2 156 16 163 3e-25 114
scaffold_2_174953_175978_+ O34758 27.98 336 204 9 14 325 25 346 9e-25 114
scaffold_2_176002_176868_+ Q5SJF8 49.09 110 51 3 1 110 7 111 2e-16 86.3
scaffold_2_176986_178659_+ P45792 44.38 525 285 4 36 554 42 565 3e-122 439
scaffold_2_178964_180139_+ P24559 53.73 335 153 2 20 354 12 344 1e-97 356
scaffold_2_180190_181365_+ P22609 36.77 378 231 2 13 389 3 373 4e-65 248
scaffold_2_184930_185724_+ P45794 36.73 275 144 4 9 259 18 286 6e-23 107
scaffold_2_185809_186873_+ P57309 22.40 183 118 2 142 300 160 342 9e-06 51.6
scaffold_2_188478_189668_+ C5C5T3 80.51 395 76 1 1 395 2 395 3e-123 441
scaffold_2_189761_190237_+ Q47QY8 51.13 133 60 1 2 134 34 161 2e-20 98.2
scaffold_2_190288_191364_+ B8H8V5 68.12 345 106 1 1 345 22 362 3e-124 445
scaffold_2_192870_193520_+ A0JX80 79.68 187 38 0 29 215 1 187 2e-86 318
scaffold_2_193522_193938_+ A0JX79 62.04 137 50 1 1 137 2 136 6e-34 142
scaffold_2_194135_194728_+ Q9KXR1 66.28 172 55 1 20 191 14 182 5e-46 183
scaffold_2_194776_195792_+ C5C683 77.93 299 58 3 1 298 19 310 6e-116 417
scaffold_2_195846_197141_+ Q9KXR3 75.39 386 95 0 27 412 40 425 7e-166 583
scaffold_2_197782_198906_+ Q9KXR5 72.29 350 78 3 1 350 45 375 5e-142 504
scaffold_2_199011_202229_+ C5C687 80.53 1063 199 2 1 1058 37 1096 0.0 1605
scaffold_2_199011_202229_+ C5C687 33.66 407 240 11 521 914 9 398 2e-38 162
scaffold_2_202259_203083_+ Q9KXR8 68.21 151 48 0 2 152 14 164 2e-46 186
scaffold_2_203719_204294_+ Q47R17 65.93 182 61 1 6 187 14 194 6e-56 216
scaffold_2_204330_204596_+ C5C692 86.75 83 11 0 1 83 2 84 5e-34 142
scaffold_2_204613_205827_+ P67733 59.61 406 157 5 1 401 10 413 2e-71 269
scaffold_2_205857_207062_+ B1W470 74.23 388 96 2 5 389 4 390 9e-160 563
scaffold_2_207068_209266_+ Q9CCQ3 50.00 428 198 5 315 730 227 650 1e-46 188
scaffold_2_207068_209266_+ Q9CCQ3 59.62 104 42 0 53 156 2 105 1e-19 98.6
scaffold_2_209284_209883_+ Q8K370 32.65 98 59 3 82 176 136 229 1e-05 49.7
scaffold_2_209895_210836_+ Q827P7 65.70 309 104 2 1 308 2 309 4e-106 384
scaffold_2_210938_212320_+ P71675 49.89 459 205 8 1 459 24 457 2e-53 210
scaffold_2_212385_213074_+ Q9L0Z5 63.89 216 78 0 9 224 3 218 4e-64 244
scaffold_2_213340_214440_+ P71677 52.09 311 143 3 47 356 29 334 4e-53 208
scaffold_2_214455_215102_+ P65327 63.07 176 60 3 27 198 23 197 3e-37 154
scaffold_2_215129_216403_+ A8LY38 57.31 424 160 4 1 422 4 408 1e-122 439
scaffold_2_216487_216870_+ B8H6T9 84.09 88 12 1 22 109 58 143 2e-34 144
scaffold_2_217653_217916_+ Q6AF75 82.56 86 15 0 1 86 2 87 6e-26 115
scaffold_2_217954_218802_+ B8H6U1 68.31 284 86 2 1 281 2 284 1e-95 349
scaffold_2_218812_219309_+ P64848 35.90 78 45 1 64 141 59 131 6e-04 43.5
scaffold_2_220700_221743_+ P54076 44.60 213 113 1 8 215 7 219 1e-34 147
scaffold_2_221778_222716_+ P54075 35.69 283 165 8 1 274 15 289 2e-27 122
scaffold_2_223173_223502_+ C0ZZU1 59.46 37 15 0 21 57 20 56 2e-06 51.6
scaffold_2_223510_224325_+ P66895 33.57 283 165 6 3 269 15 290 5e-30 131
scaffold_2_224343_225248_+ O53526 36.60 235 132 6 67 297 85 306 2e-26 119
scaffold_2_225258_228137_+ Q10701 47.16 634 320 5 314 940 263 888 8e-113 409
scaffold_2_225258_228137_+ Q10701 69.20 224 66 2 42 264 14 235 5e-81 303
scaffold_2_230205_231857_+ Q8D124 27.52 505 320 17 36 517 10 491 1e-20 102
scaffold_2_231872_232636_+ O14466 37.56 221 124 6 2 216 6 218 3e-33 142
scaffold_2_233228_234325_- Q82PX1 51.23 367 166 5 1 362 2 360 3e-84 312
scaffold_2_234475_235035_+ P54570 25.71 175 121 4 4 176 7 174 2e-06 52.4
scaffold_2_236621_236980_+ P28267 43.14 51 29 0 48 98 48 98 3e-05 47.0
scaffold_2_238270_239031_+ O66489 37.25 255 156 3 1 252 3 256 6e-38 157
scaffold_2_240898_241617_- A3PSZ4 42.31 78 44 1 12 89 59 135 7e-05 47.8
scaffold_2_241915_242858_- O31020 23.68 321 213 10 3 309 3 305 2e-15 83.6
scaffold_2_243704_244156_+ P67748 44.09 127 71 0 6 132 2 128 2e-21 100
scaffold_2_244182_245135_+ P77735 32.22 329 168 12 6 309 10 308 1e-26 120
scaffold_2_245174_245539_+ O06008 42.34 111 62 1 7 115 4 114 1e-15 81.6
scaffold_2_259771_260640_- P64982 29.76 168 98 7 51 203 68 230 1e-04 47.8
scaffold_2_261187_261978_- P69167 49.38 243 112 4 3 245 4 235 4e-48 191
scaffold_2_262092_262639_+ P39897 40.98 61 36 0 23 83 23 83 7e-06 50.1
scaffold_2_263729_264412_+ P25150 27.59 87 59 1 49 131 78 164 1e-04 47.0
scaffold_2_264521_265990_+ P39886 30.07 439 277 5 23 448 25 446 2e-20 100
scaffold_2_265995_266948_+ P96662 37.63 287 176 2 28 313 3 287 1e-55 216
scaffold_2_267305_268558_+ P55183 25.48 208 108 2 200 386 199 380 4e-08 59.7
scaffold_2_268573_269214_+ P55184 38.86 211 123 3 1 211 12 216 2e-22 105
scaffold_2_271028_271573_- Q11063 30.64 173 83 6 16 178 19 164 4e-04 44.3
scaffold_2_271626_272462_+ O05730 34.83 201 131 0 1 201 8 208 5e-25 115
scaffold_2_273123_274223_+ Q8MZR6 29.14 405 237 15 1 361 43 441 7e-30 131
scaffold_2_274333_275829_+ P0A0J9 28.48 302 216 0 38 339 20 321 2e-08 61.2
scaffold_2_280147_281052_+ O52866 28.62 290 182 8 19 295 5 282 3e-13 75.9
scaffold_2_282402_283583_- O53656 54.24 59 27 0 315 373 291 349 2e-09 63.5
scaffold_2_284145_285140_+ B6EH86 37.25 298 166 7 5 289 3 292 5e-50 198
scaffold_2_285266_286543_+ Q0H904 33.33 168 99 4 93 252 75 237 7e-19 95.5
scaffold_2_287303_288142_+ Q60283 27.36 201 136 4 34 225 10 209 1e-13 77.4
scaffold_2_288399_289117_- P67672 74.42 129 33 0 34 162 64 192 2e-52 205
scaffold_2_289259_289822_- Q50604 43.86 114 63 1 54 167 50 162 2e-19 95.1

scaffold_3_10979_11410_+ P37424 37.70 122 70 3 1 119 13 131 1e-10 65.1
scaffold_3_11468_13114_- P54744 45.27 243 119 7 35 271 7 241 4e-40 166
scaffold_3_14611_15516_- Q8A1M1 43.50 200 113 0 54 253 2 201 8e-44 177
scaffold_3_18605_19567_- Q58619 28.07 171 108 4 79 242 15 177 5e-10 65.5
scaffold_3_20645_22375_+ P38569 60.33 552 207 3 20 571 4 543 3e-173 608
scaffold_3_22732_23409_+ Q88A30 30.87 230 138 5 6 219 13 237 1e-12 73.2
scaffold_3_29859_31631_+ P07003 47.06 561 285 4 22 582 21 569 2e-127 456
scaffold_3_31675_32421_- P39367 55.33 244 109 0 4 247 5 248 1e-75 283
scaffold_3_32639_34021_- Q46892 38.61 417 239 4 52 458 44 453 3e-47 189
scaffold_3_34071_34652_- P46859 45.95 148 78 2 31 176 10 157 3e-33 140
scaffold_3_34723_35436_+ P31460 33.65 211 124 6 12 215 12 213 3e-10 65.5
scaffold_3_37737_38447_+ P41780 33.04 115 67 3 130 234 225 339 4e-09 62.0
scaffold_3_38942_40348_+ Q797A7 26.74 344 224 7 10 335 18 351 2e-16 87.4
scaffold_3_43031_43843_+ Q04605 33.33 102 68 0 122 223 6 107 8e-13 74.3
scaffold_3_43922_44386_+ P94562 42.57 148 84 1 1 148 3 149 8e-16 82.8
scaffold_3_44966_46180_- P10482 22.01 368 247 10 21 351 54 418 1e-11 71.2
scaffold_3_46228_47028_- Q9KEE9 37.17 191 117 2 24 211 35 225 4e-29 128
scaffold_3_47106_48086_- O32155 35.58 267 168 3 62 324 26 292 3e-28 126
scaffold_3_48184_49443_- Q8FVS7 26.33 376 245 15 41 400 35 394 5e-16 85.9
scaffold_3_49592_50578_- P24242 29.76 289 188 6 1 281 12 293 2e-19 96.7
scaffold_3_50712_51731_+ P82594 57.14 287 111 3 4 288 52 328 1e-84 313
scaffold_3_51783_57656_- Q8NXX6 30.94 265 142 6 1537 1801 793 1016 1e-19 100
scaffold_3_51783_57656_- Q8NXX6 30.27 294 155 8 1527 1820 560 803 5e-19 98.6
scaffold_3_51783_57656_- Q8NXX6 27.68 271 137 8 1536 1797 902 1122 1e-12 77.0
scaffold_3_51783_57656_- Q8NXX6 28.85 260 143 8 1537 1795 682 900 2e-12 76.6
scaffold_3_61532_62455_- Q9KP71 32.84 204 129 3 85 286 7 204 3e-12 72.4
scaffold_3_62861_63664_- Q8XH28 33.82 68 45 0 54 121 54 121 2e-06 53.1
scaffold_3_66079_66972_- P77716 29.25 253 177 1 46 296 27 279 7e-15 81.3
scaffold_3_66996_67940_- O32155 35.51 245 151 4 46 288 30 269 2e-36 152
scaffold_3_68011_69276_- O32156 21.91 388 253 14 46 411 52 411 3e-07 56.6
scaffold_3_69465_70466_+ Q65TP0 29.48 329 226 4 2 326 3 329 1e-34 147
scaffold_3_70676_71749_+ Q9X1E2 42.75 138 58 1 200 316 372 509 1e-19 97.8
scaffold_3_73807_75186_- P05656 47.37 323 151 9 25 335 35 350 2e-78 293
scaffold_3_75188_76174_- O31520 31.80 239 143 4 94 326 72 296 6e-29 128
scaffold_3_76176_77123_- O32155 32.67 251 164 4 68 314 42 291 5e-21 101
scaffold_3_78663_79688_- Q87QW9 30.77 338 220 7 1 334 3 330 1e-26 120
scaffold_3_81051_81473_+ Q52996 32.89 76 47 1 22 97 26 97 1e-04 45.4
scaffold_3_81584_82759_+ Q52997 44.60 361 176 7 2 356 37 379 1e-40 167
scaffold_3_83962_84900_- P65050 33.33 306 126 9 25 309 17 265 5e-24 112
scaffold_3_84972_86210_- Q9P6J2 44.34 106 57 2 145 250 16 119 7e-05 48.9
scaffold_3_86809_87819_- P96253 33.53 334 211 8 2 329 5 333 1e-32 140
scaffold_3_87972_88835_- P39315 45.32 278 151 1 2 279 2 278 2e-49 196
scaffold_3_88962_89378_+ P0ACN3 52.43 103 49 0 24 126 8 110 2e-26 117
scaffold_3_89500_90435_+ P42458 40.52 116 68 1 195 309 3 118 8e-16 84.7
scaffold_3_90553_92640_+ A7NR66 36.61 691 424 8 8 692 10 692 8e-99 362
scaffold_3_92683_93186_- P44558 36.89 122 76 1 31 152 2 122 1e-15 82.4
scaffold_3_93950_94648_+ P0AFR6 47.32 205 108 0 26 230 1 205 1e-53 209
scaffold_3_94713_95540_+ P64786 57.08 219 84 3 55 271 54 264 2e-62 239
scaffold_3_95645_96538_- P0AG84 57.93 290 120 2 7 295 5 293 2e-90 332
scaffold_3_98452_99375_+ C6A3T5 30.77 260 151 8 1 236 15 269 3e-11 69.7
scaffold_3_99510_100367_+ O13963 30.26 152 93 4 99 242 163 309 3e-06 52.4
scaffold_3_101655_102629_+ Q9KWF6 57.37 319 135 1 3 321 33 350 1e-75 283
scaffold_3_103945_105069_+ P54550 43.51 370 170 7 3 370 5 337 3e-76 285
scaffold_3_105580_107223_+ P64778 37.55 229 129 5 267 487 26 248 2e-14 80.9
scaffold_3_110392_111720_+ O53522 29.02 224 132 7 174 379 110 324 2e-05 50.8
scaffold_3_111785_112648_- P0AEQ3 31.86 204 134 4 70 272 44 243 2e-17 90.1
scaffold_3_112705_113403_- P0AE36 33.87 186 100 2 30 192 42 227 3e-25 115
scaffold_3_113451_114248_- P54537 48.78 246 118 2 19 263 1 239 7e-64 243
etc.

Kyxsune, Jul 16 '15 16:07

For the functional annotation step, the following commands were run:

|2015-07-14 16:14:22|# [FUNCTIONALANNOTATION]
|2015-07-14 16:26:43| [Path to Directory]/metAMOS-1.5rc3/Utilities/cpp/Linux-x86_64/blastall -p blastp -i [Path to Directory]/Desktop/GenomeTK2/Mix2/FindORFS/out/proba.faa -d [Path to Directory]/Desktop/metAMOS-1.5rc3/Utilities/DB//uniprot_sprot.fasta -a 39 -e 0.001 -m 8 -b 1 > [Path to Directory]/Desktop/GenomeTK2/Mix2/FunctionalAnnotation/out/blast.out
|2015-07-14 16:26:44|[Path to Directory]/Desktop/metAMOS-1.5rc3/KronaTools/bin/ktImportEC [Path to Directory]/Desktop/GenomeTK2/Mix2/FunctionalAnnotation/out/krona.ec.input

Since these two commands ran without incident, I can only imagine that the bottleneck is in the krona.ec.input file. I do not know Perl, but I get the feeling that the ktImportEC command only pulls the annotations for the hits that have a parsed EC number?
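To check where the hits go missing, something like the following could list the BLAST subject accessions that never make it into krona.ec.input. This is only a rough sketch: the function names are made up for illustration, and it assumes blast.out has the 12 whitespace-separated columns of `blastall -m 8` output and that krona.ec.input has three columns (accession, EC number, e-value), as in the excerpts above.

```python
def blast_subject_ids(lines):
    """Collect subject accessions (column 2) from blastall -m 8 tabular lines."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) == 12:  # skip blank or truncated lines
            ids.add(fields[1])
    return ids


def krona_ids(lines):
    """Collect accessions (column 1) from krona.ec.input-style lines.

    Assumes three columns: accession, EC number, e-value.
    """
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            ids.add(fields[0])
    return ids


# A few lines copied from the excerpts in this thread.
blast_sample = [
    "scaffold_2_138042_139004_+ Q5YTD8 90.72 291 27 0 29 319 16 306 1e-154 545",
    "scaffold_2_139698_140882_+ O58489 34.76 374 238 4 1 370 17 388 5e-58 225",
    "scaffold_2_141595_142350_+ C5C5Q3 84.06 251 39 1 1 250 2 252 7e-113 406",
]
krona_sample = ["Q5YTD8 4.-.-.- 1e-154"]

# Accessions with a BLAST hit but no entry in the Krona input.
missing = sorted(blast_subject_ids(blast_sample) - krona_ids(krona_sample))
print(missing)
```

On the real files, the sample lists would be replaced by `open(path)` on blast.out and krona.ec.input.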

Kyxsune, Jul 16 '15 16:07

Found it (I think).

The bottleneck is inside the fannotate.py file (https://github.com/marbl/metAMOS/blob/v1.5rc3/src/fannotate.py). When parsing the data from the blast.out file, it only writes the hits that have a length over 50 and a percent identity over 80. I actually don't know why the cutoff was set at that level, but at least we know where it is now. If you could explain why, I would be grateful. I am quite new to bioinformatics after all ^^
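To illustrate the cutoff, here is a minimal sketch of that kind of filter. It is not the actual code from fannotate.py: the names are hypothetical, and it assumes that "length" means the alignment length (column 4 of the `-m 8` output), that percent identity is column 3, and that both comparisons are strict.

```python
MIN_ALIGN_LEN = 50   # assumed cutoff: alignment length must be over 50
MIN_PCT_ID = 80.0    # assumed cutoff: percent identity must be over 80


def passes_cutoffs(line, min_len=MIN_ALIGN_LEN, min_pid=MIN_PCT_ID):
    """Return True if a blastall -m 8 line clears both cutoffs."""
    fields = line.split()
    if len(fields) != 12:  # not a complete tabular record
        return False
    pct_id = float(fields[2])    # column 3: percent identity
    align_len = int(fields[3])   # column 4: alignment length
    return align_len > min_len and pct_id > min_pid


# Lines copied from the blast.out excerpt in this thread.
sample = [
    "scaffold_2_138042_139004_+ Q5YTD8 90.72 291 27 0 29 319 16 306 1e-154 545",
    "scaffold_2_139698_140882_+ O58489 34.76 374 238 4 1 370 17 388 5e-58 225",
    "scaffold_2_223173_223502_+ C0ZZU1 59.46 37 15 0 21 57 20 56 2e-06 51.6",
]

kept = [line.split()[1] for line in sample if passes_cutoffs(line)]
print(kept)
```

With cutoffs like these, a hit such as O58489 (34.76% identity over 374 positions) is dropped even though its e-value is 5e-58, which would fit with most of my blast.out lines not showing up in the Krona input.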

Kyxsune, Jul 16 '15 18:07