Augustus
Augustus CGP output stays empty if a species name in the sequence file list contains "-"
I am trying to run Augustus in CGP mode. One "species" contains a "-" in its name. It looks like Augustus does not parse this correctly when reading "speciesfilenames". With the test data I get something like this:
Warning: Species gal is not included in the target list of species. These alignment lines are ingored.
Note: for this test I renamed galGal4 to gal-Gal4, so the species name is being cut off at the "-".
I think this is where things are going wrong:
map<string,string> getFileNames(string listfile){
    map<string,string> filenames;
    ifstream ifstrm(listfile.c_str());
    if (ifstrm.is_open()){
        char buf[256];
        // each line: <species name> <whitespace> <path to genome file>
        while (ifstrm.getline(buf, 255)){
            stringstream stm(buf);
            string species, dir;
            if (stm >> species >> dir){
                dir = expandHome(dir);
                filenames[species] = dir;
            }
            else
                throw ProjectError(listfile + " has wrong format in line " + buf + ". correct format:\n\n" +
                                   "Homo sapiens <TAB> /dir/to/genome/human.fa\n" +
                                   "Mus musculus <TAB> /dir/to/genome/mouse.fa\n" +
                                   "...\n");
        }
        ifstrm.close();
    }
    return filenames;
}
Update: the code above is actually fine; the issue is somewhere else.
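A quick way to see why getFileNames() is not the culprit: operator>> on a stringstream splits only on whitespace, so a "-" inside the first token stays part of the species name. The sketch below mirrors the loop body above; parseSpeciesLine is a hypothetical helper written only for this illustration, not an Augustus function.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper mirroring the loop body of getFileNames():
// operator>> reads whitespace-delimited tokens, so '-' is kept in the name.
std::pair<std::string, std::string> parseSpeciesLine(const std::string& line) {
    std::stringstream stm(line);
    std::string species, dir;
    if (stm >> species >> dir) {
        return {species, dir};
    }
    return {"", ""};  // wrong format; getFileNames() throws ProjectError here instead
}
```

For example, the line "gal-Gal4<TAB>/dir/to/genome/chicken.fa" yields the species "gal-Gal4" unchanged, so the truncation to "gal" must happen later, when the alignment is read.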
My guess is that the problem is this check in genomicMSA.cc:
if ((completeName[i] == '-') || (completeName[i] == '.'))
Changing it to:
if (completeName[i] == '.')
I am running a test with this change now.
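To illustrate the effect of that condition: if the alignment reader takes everything before the first separator as the species name, a MAF source name like gal-Gal4.chr1 (an illustrative value) is reduced to "gal", which matches the warning above. The function below is a hypothetical simplification written for this illustration, not the actual genomicMSA.cc code; only the two conditions come from the snippets above.

```cpp
#include <string>

// Hypothetical simplification (not Augustus code): return everything before
// the first separator. dashIsSeparator == true mirrors the original condition
//   (completeName[i] == '-') || (completeName[i] == '.')
// and false mirrors the patched condition (completeName[i] == '.').
std::string speciesPrefix(const std::string& completeName, bool dashIsSeparator) {
    for (std::string::size_type i = 0; i < completeName.size(); ++i) {
        char c = completeName[i];
        if (c == '.' || (dashIsSeparator && c == '-')) {
            return completeName.substr(0, i);
        }
    }
    return completeName;  // no separator found: the whole string is the name
}
```

With the original condition, speciesPrefix("gal-Gal4.chr1", true) gives "gal"; with the patched one it gives "gal-Gal4". The trade-off is that any input relying on "-" as the species/sequence separator would then no longer be split there.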
@MarioStanke Removing '-' fixes this, but I am not sure whether '-' is there for a good reason.
I suspect that we once had MAF alignments as input that used "-" to separate the species name from the sequence name. I don't remember any such examples recently, though. I will open a (low-priority) issue to allow a minus in the species name. Thanks!
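One possible way to allow "-" in species names while still accepting "-" as a separator would be to match the MAF source name against the species listed in the speciesfilenames file instead of cutting at the first separator. This is a hypothetical sketch, not Augustus code; matchSpecies and its signature are assumptions made for illustration.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch (not Augustus code): a species matches if the source
// name starts with it and the next character is '.' or '-'. The longest
// match wins, so "gal-Gal4" beats a shorter species name such as "gal".
std::optional<std::string> matchSpecies(const std::string& src,
                                        const std::vector<std::string>& species) {
    std::optional<std::string> best;
    for (const std::string& s : species) {
        bool isPrefix = src.size() > s.size() && src.compare(0, s.size(), s) == 0;
        if (!isPrefix) continue;
        char next = src[s.size()];
        if ((next == '.' || next == '-') && (!best || s.size() > best->size())) {
            best = s;
        }
    }
    return best;  // empty if no listed species matches
}
```

With species {"hg19", "gal-Gal4"}, the source "gal-Gal4.chr1" resolves to "gal-Gal4" and the older dash-separated form "hg19-chr21" still resolves to "hg19". The longest-match rule is a design choice to resolve ambiguity when one species name is a prefix of another.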