Add check for input genome file not having line return after last input genome (renamed by Mike)

Open Calvin2077 opened this issue 1 year ago • 5 comments

Hello!

I recently discovered GToTree and have found it super helpful for my master's project. Your clear installation instructions have been a huge help in getting it to work on my laptop, so thank you very much.

I did a practice run with my species and one was dropped due to the redundancy being greater than 10%. As I need to include all my species for my project, is there a way to increase the redundancy threshold when using amino acid fasta files?

Thanks

Calvin2077, Apr 13 '23 17:04

Hey there, @Calvin2077!

Thanks for the kind words :)

A genome shouldn’t be dropped due to the redundancy estimate; that’s just a notice. Are you sure it’s not in the final tree? If it really is missing, it may be getting dropped because not enough target genes were found, and that threshold is something we can adjust.

AstrobioMike, Apr 13 '23 18:04

Hello AstrobioMike,

You're welcome, and thank you for getting back to me so fast; it is much appreciated. I checked my tree and I am indeed missing a species.

Moreover, when I run the command "GToTree -f Untitled2.txt -o hope_new -H Archaea", it says it is only using 40 of my 41 species, despite my list (Untitled2.txt) containing all of them.

I don't know if it's related but the one that is missing is the last one on my list.

Calvin2077, Apr 13 '23 18:04

Hmm, strange. Any chance you’d be able to share the fasta files and the input Untitled2.txt file with me at MikeLee<at>bmsis.org so I can take a look? I’ll delete them right after testing, of course.

AstrobioMike, Apr 13 '23 18:04

@Calvin2077 and i tracked down the issue: the input file listing the paths to the genomes didn't have a line-return character at the end of the file, so the last genome in the list was being left off

i need to think about how to put in a check for this
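For anyone hitting the same thing, here is a minimal sketch of how a missing final line-return can make a shell script skip the last entry. It is not taken from GToTree's code: the file contents and variable names are made up for illustration, and the thread does not say which construct actually dropped the entry. Both `wc -l` and a plain `while read` loop are common candidates.

```shell
#!/usr/bin/env bash
# Hypothetical demo (not GToTree's code): a list of three entries whose
# last line has no trailing newline, like the input file in this issue.
tmp_list=$(mktemp)
printf 'genome_1.faa\ngenome_2.faa\ngenome_3.faa' > "$tmp_list"

# wc -l counts newline characters, so the unterminated last line is not counted.
newline_count=$(wc -l < "$tmp_list" | tr -d ' ')

# A plain read loop: read returns non-zero on the unterminated last line,
# so the loop body never runs for it.
naive_count=0
while IFS= read -r line; do
    naive_count=$((naive_count + 1))
done < "$tmp_list"

# Defensive variant: also process a final partial line if read left text in $line.
safe_count=0
while IFS= read -r line || [ -n "$line" ]; do
    safe_count=$((safe_count + 1))
done < "$tmp_list"

echo "wc -l: $newline_count, naive loop: $naive_count, safe loop: $safe_count"
# prints "wc -l: 2, naive loop: 2, safe loop: 3"

rm -f "$tmp_list"
```

The `|| [ -n "$line" ]` idiom is one way to make a read loop tolerant of such files; fixing the file itself before it is read is the other option.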

AstrobioMike, Apr 13 '23 20:04

Note for myself

I currently run a dos2unix/cmp check on each input file, e.g.:

https://github.com/AstrobioMike/GToTree/blob/39ce5f114391938612df1df49ee6cc759208bed9/bin/GToTree#L435

I can add the --add-eol argument to the dos2unix calls so they will auto-add an end-of-line at the end of the file if it's not there. That will address this. (Add it to the cmp checks too, so it's still only run if needed.)
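As a rough illustration of the same idea without relying on dos2unix, the sketch below checks whether a file ends with a newline and appends one only if needed. The function name `ensure_trailing_newline` and the demo file are hypothetical, and this is not the code in bin/GToTree; it only shows the kind of "fix it only if needed" check described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GToTree): append a newline to a file
# only when its last byte is not already a newline.
ensure_trailing_newline() {
    local file="$1"
    # Command substitution strips a trailing newline, so the result of
    # tail -c 1 is non-empty only when the last byte is something else.
    # The -s test skips empty files, which need no fix.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}

# Demo on a throwaway two-entry list with no final newline.
demo_list=$(mktemp)
printf 'genome_1.faa\ngenome_2.faa' > "$demo_list"

ensure_trailing_newline "$demo_list"
ensure_trailing_newline "$demo_list"   # second call is a no-op

fixed_count=$(wc -l < "$demo_list" | tr -d ' ')
echo "lines after fix: $fixed_count"
# prints "lines after fix: 2"

rm -f "$demo_list"
```

Note that this helper modifies the file in place; a tool that should leave user input untouched would run the check on a working copy instead.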

AstrobioMike, Sep 30 '24 01:09