FindMyFriends

Add export API

Status: Open · thomasp85 opened this issue 9 years ago · 7 comments

Possible formats:

  • sqlite
  • xlsx
  • flatfile

thomasp85, Oct 30 '15 16:10

About FASTA input: I have some bacterial genomes in .seq, .fasta (sequence information only) and .gbk format. How can I convert these files into FASTA files like the ones in extdata, so that I can use them with FindMyFriends? For example:

>gi|71851486|gb|AE017243.1|_1 # 207 # 395 # 1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGA/GGAG/GAGG;rbs_spacer=11-12bp;gc_cont=0.275
MQTNKNNLKVRTQQIRQQIENLLNDRMLYNNFFSTIYVLNETETEIIIDFTDLIAKQEVISR*

Zbrel, Apr 15 '16 09:04

I'm not quite sure I understand your question. Do you have sequence information in multiple different formats? Generally you should try to have all your sequences annotated by the same algorithm, to avoid differences in gene-detection bias between genomes.

thomasp85, Apr 15 '16 11:04

I get it. I should first use Prodigal to predict the protein-coding genes in my prokaryotic genomes.

Zbrel, Apr 15 '16 12:04

Yep, or Glimmer, or something else. Currently automatic gene-location detection is only supported for files created by Prodigal, but I'm working on a GFF parser in a fork that should be more broadly applicable.

thomasp85, Apr 15 '16 12:04
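Prodigal's protein FASTA headers, like the extdata example quoted earlier in this thread, pack the gene location into fields separated by " # ": sequence ID, start, end, strand (1 or -1) and a semicolon-separated list of key=value attributes. As a rough illustration of what automatic location detection has to read from such a header, here is a minimal base-R sketch. `parseProdigalHeader` is a hypothetical helper written for this example, not a FindMyFriends function, and it assumes well-formed, single-line Prodigal headers.

```r
# Hypothetical helper (not part of FindMyFriends): split a Prodigal-style
# FASTA header into its gene-location fields.
parseProdigalHeader <- function(header) {
  # Drop the leading ">" and split on Prodigal's " # " field separator:
  # id, start, end, strand, attributes
  fields <- strsplit(sub("^>", "", header), " # ", fixed = TRUE)[[1]]
  if (length(fields) != 5) stop("not a Prodigal-style header: ", header)

  # Attributes are ";"-separated key=value pairs; split each pair on its
  # first "=" only, so values containing "=" would stay intact
  pairs <- strsplit(fields[5], ";", fixed = TRUE)[[1]]
  attrs <- setNames(sub("^[^=]*=", "", pairs), sub("=.*$", "", pairs))

  list(
    id = fields[1],
    start = as.integer(fields[2]),
    end = as.integer(fields[3]),
    strand = as.integer(fields[4]),  # 1 = forward, -1 = reverse
    attributes = attrs
  )
}

# Usage with the header from the question above
hdr <- ">gi|71851486|gb|AE017243.1|_1 # 207 # 395 # 1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=AGGA/GGAG/GAGG;rbs_spacer=11-12bp;gc_cont=0.275"
loc <- parseProdigalHeader(hdr)
str(loc)
```

Genes predicted by other tools usually lack these header fields, which is why a separate parser (such as the GFF parser mentioned above) is needed for them.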

I have 260,779 genes from 39 organisms. How long would it take to run "mycoSim <- kmerSimilarity(mycoPan, lowerLimit=0.8, rescale=FALSE)"?

Zbrel, Apr 16 '16 08:04

A lot of things factor in. First of all, I don't know your hardware. Second, kmerSimilarity is the least advisable approach to calculating pangenomes in FindMyFriends, as it is the most computationally heavy. If you have installed the development version (which I'd advise, as it contains numerous improvements), use the cdhitGrouping function followed by neighborhoodSplit. With this approach I've successfully calculated pangenomes from thousands of genomes within a day.

Those are some big genomes you're working with, by the way: roughly 6,700 genes each on average.

thomasp85, Apr 16 '16 09:04
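For a rough sense of scale with the numbers from this thread: assuming, for illustration only, that a similarity step compares every gene with every other gene (an all-vs-all comparison), the number of gene pairs grows quadratically with the number of genes. The base-R sketch below only does that arithmetic; it makes no claim about how kmerSimilarity is actually implemented or how long it runs on any given hardware.

```r
# Back-of-the-envelope scale estimate using the numbers from this thread.
# Assumption (illustrative only): an all-vs-all comparison of genes.
nGenes <- 260779
nGenomes <- 39

genesPerGenome <- nGenes / nGenomes  # average number of genes per genome
genePairs <- choose(nGenes, 2)       # number of unordered gene pairs

round(genesPerGenome)
format(genePairs, big.mark = ",", scientific = FALSE)
```

Under this assumption, doubling the number of genes roughly quadruples the number of pairs, which is one plausible reason why grouping approaches that avoid a full all-vs-all comparison scale better to many genomes.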

As none of this concerns adding an export API, I would prefer that you open a new issue for further questions (which you are welcome to do; I'm just trying to keep issues separate).

thomasp85, Apr 16 '16 10:04