
Formatting protein files

Status: Open. TEKEDAR opened this issue 8 years ago (4 comments).

Hello Thomas,

Our research group is working on a couple of comparative genomics papers and plans to use your software extensively, but I am having problems with the protein files for a couple of prokaryotic genomes because FindMyFriends gives a warning message every time. Your example protein files for the Mycobacteria genomes and mine are quite different. Is there any way to format the protein files? Or, if you don't mind, could you suggest where to download protein files (e.g. from NCBI)? That way I can download protein files formatted the same way as yours.

Any help is greatly appreciated!

Warning message:
In unzip(system.file("extdata", "Listeria.zip", package = "FindMyFriends"), :
  error 1 in extracting from zip file

TEKEDAR, Apr 16 '16

Thanks for looking into FindMyFriends.

First thing: system.file("extdata", "Listeria.zip", package = "FindMyFriends") looks for a file called Listeria.zip inside the package's installation directory, so unless you have gone to a lot of trouble to place it there, it will not find anything. Simply replace that function call with the path to your own zip file (or use file.choose() to get a dialog where you can navigate to it).

The best-supported input format is protein FASTA files as exported by Prodigal. This is convenient because Prodigal writes the chromosomal location of each gene into the FASTA header, so FindMyFriends can read it directly. I think most genomes on NCBI are run through Prodigal automatically, so you can probably fetch them from there. Still, when creating pangenomes it is generally good practice to make sure all genomes have been run through the exact same gene-finding algorithm, so this is usually done as a preliminary step (it's quite fast: Prodigal takes less than a minute for a standard-sized genome). If you don't want to use Prodigal (maybe you prefer Glimmer or something else), you can pass the location information to the pangenome() constructor as a data.frame; see its documentation for details.
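To make concrete what "location in the header" means: Prodigal's protein headers place the start coordinate, end coordinate and strand after the gene ID, each separated by " # ". The following is a minimal Python sketch, not FindMyFriends' own parser, that pulls those fields out of a header in that layout. The names ProdigalGene and parse_prodigal_header are hypothetical, and the regular expression assumes the standard Prodigal header layout.

```python
import re
from typing import NamedTuple


class ProdigalGene(NamedTuple):
    """Location information carried in one Prodigal protein FASTA header."""
    contig: str   # ID of the sequence Prodigal was run on
    gene: int     # running gene number Prodigal appended to that ID
    start: int    # start coordinate
    end: int      # end coordinate
    strand: int   # 1 for the forward strand, -1 for the reverse strand


# Assumed layout: "<sequence id>_<n> # <start> # <end> # <strand> # <attributes>"
PRODIGAL_HEADER = re.compile(
    r"^>?(?P<contig>\S+)_(?P<gene>\d+) # (?P<start>\d+) # (?P<end>\d+) # (?P<strand>-?1) # "
)


def parse_prodigal_header(header: str) -> ProdigalGene:
    """Extract sequence ID, gene number, coordinates and strand from a Prodigal header."""
    match = PRODIGAL_HEADER.match(header)
    if match is None:
        raise ValueError(f"not a Prodigal-style header: {header!r}")
    return ProdigalGene(
        contig=match["contig"],
        gene=int(match["gene"]),
        start=int(match["start"]),
        end=int(match["end"]),
        strand=int(match["strand"]),
    )


if __name__ == "__main__":
    example = ("gi|71851486|gb|AE017243.1|_1336 # 897262 # 897405 # -1 # "
               "ID=1_1336;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.417")
    print(parse_prodigal_header(example))
```

For the package example header quoted later in this thread, this yields sequence ID gi|71851486|gb|AE017243.1|, gene 1336, start 897262, end 897405 and strand -1. A header without those fields raises a ValueError instead of silently returning nothing.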

Lastly, I'd advise you to install the development version from Bioconductor or the one available on GitHub, as it includes a large number of improvements and features. It is substantially faster and handles edge cases more correctly.

thomasp85, Apr 16 '16

Hi again,

Thanks for the quick response. I have tried all your suggestions, but no luck! Apparently I have to reformat the downloaded protein files somehow. Do you have any suggestions for this? I downloaded all the protein files from the NCBI website and FTP site, but FindMyFriends does not accept them.

Thanks again.

TEKEDAR, Apr 18 '16

Can you provide the error message that you're getting? Also, several different protein files are available through NCBI; which one are you using? Have you tried running gene finding yourself with Prodigal on the genome sequences?
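Running gene finding yourself mostly means calling Prodigal once per genome with identical settings. Below is a minimal Python sketch of that batch step, assuming the prodigal executable is on the PATH and the genomes are nucleotide FASTA files. The helper names and the output file naming are hypothetical; only Prodigal's standard -i (input), -a (protein translations), -o (gene coordinate output) and -q (quiet) options are used.

```python
import subprocess
from pathlib import Path
from typing import List


def prodigal_command(genome: Path, out_dir: Path) -> List[str]:
    """Build one Prodigal call that writes protein translations for a genome."""
    stem = genome.stem  # e.g. "strainA" for "strainA.fna"
    return [
        "prodigal",
        "-i", str(genome),                   # input nucleotide FASTA
        "-a", str(out_dir / f"{stem}.faa"),  # protein FASTA with location in the headers
        "-o", str(out_dir / f"{stem}.gbk"),  # gene coordinates (Prodigal's default format)
        "-q",                                # suppress Prodigal's progress output
    ]


def run_prodigal_on_all(genomes: List[Path], out_dir: Path) -> None:
    """Run the same Prodigal settings on every genome so gene calls are comparable."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for genome in genomes:
        # check=True stops the batch as soon as one genome fails
        subprocess.run(prodigal_command(genome, out_dir), check=True)
```

Usage, with placeholder directory names: run_prodigal_on_all(sorted(Path("genomes").glob("*.fna")), Path("proteins")). Keeping the command construction separate from execution means the settings can be inspected (or tested) without Prodigal installed, and guarantees every genome gets the same flags.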

thomasp85, Apr 18 '16

Hello Dr. Pedersen:

I am working with DNA virus genomes. Since your program "only sees" protein sequences, comparing them against a similarity threshold to build presence/absence matrices, I wanted to take the risk of creating viral pangenomes.

I retrieved the protein sequences, but I have problems with the headers. For example, I have a protein header like this:

YP_009220652.1 | GeneID:26683547 | GI:973966925 | locus: AV955_gp002 | hypothetical protein

I noticed that the example format used in the package looks like this:

gi|71851486|gb|AE017243.1|_1336 # 897262 # 897405 # -1 # ID=1_1336;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.417

Will I run into compatibility problems later? Since I am new to R and also new to your program, I am wondering whether I can change the header format or should keep working with my files as they are.
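One way to see the difference between the two headers quoted above is to test them directly. Below is a minimal Python sketch, assuming the Prodigal layout "<id>_<n> # <start> # <end> # <strand> # ..." and that the viral header is pipe-delimited as shown; the function names are hypothetical. It shows that the NCBI-style header carries an accession, GeneID, GI number and locus tag but no genomic coordinates, which is the information the earlier reply says FindMyFriends takes from Prodigal headers or, otherwise, from a data.frame passed to pangenome().

```python
import re
from typing import Dict

# Assumed Prodigal location block: "<id>_<n> # <start> # <end> # <strand> # ..."
PRODIGAL_LOCATION = re.compile(r"^>?\S+_\d+ # \d+ # \d+ # -?1 # ")


def looks_like_prodigal(header: str) -> bool:
    """True if the header carries Prodigal's start/end/strand fields."""
    return PRODIGAL_LOCATION.match(header) is not None


def parse_pipe_header(header: str) -> Dict[str, str]:
    """Split 'accession | key:value | ... | product' into a field dictionary."""
    parts = [part.strip() for part in header.lstrip(">").split("|")]
    fields = {"accession": parts[0], "product": parts[-1]}
    for part in parts[1:-1]:
        key, _, value = part.partition(":")  # e.g. "locus: AV955_gp002"
        fields[key.strip()] = value.strip()
    return fields


viral = "YP_009220652.1 | GeneID:26683547 | GI:973966925 | locus: AV955_gp002 | hypothetical protein"
example = ("gi|71851486|gb|AE017243.1|_1336 # 897262 # 897405 # -1 # "
           "ID=1_1336;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.417")

print(looks_like_prodigal(example))  # True
print(looks_like_prodigal(viral))    # False
print(parse_pipe_header(viral))
```

The parsed viral header contains accession, product, GeneID, GI and locus fields, but nothing resembling a start, end or strand, so that information would have to come from another source.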

Thank you very much and greetings from Mexico.

PepeCampillo, Mar 22 '17