Augustus CGP output stays empty if a species name in the sequence file list contains "-"

Status: Open · fbemm opened this issue 4 years ago · 4 comments

I am trying to run Augustus in CGP mode. One "species" contains a "-" in its name, and it looks like Augustus does not parse this correctly when reading "speciesfilename". With the test data I get something like this:

Warning: Species gal is not included in the target list of species. These alignment lines are ingored.

For reference, I renamed galGal4 to gal-Gal4 in the test data.

I think this is where things are going wrong:

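#include <fstream>
#include <map>
#include <sstream>
#include <string>
using namespace std;
// expandHome() and ProjectError are Augustus helpers defined elsewhere.

// Reads the species list file: one "<species> <path to genome FASTA>" pair per line.
map<string,string> getFileNames (string listfile){
    map<string,string> filenames;
    ifstream ifstrm(listfile.c_str());
    if (ifstrm.is_open()){
        char buf[256];
        while(ifstrm.getline(buf,255)){
            // operator>> splits on whitespace only, so a '-' in the species
            // name is kept; a name containing a space (like the example in
            // the error message below) would be split into the wrong fields
            stringstream stm(buf);
            string species, dir;
            if(stm >> species >> dir){
                dir = expandHome(dir); // Augustus helper: expands '~' in the path
                filenames[species] = dir;
            }
            else
                throw ProjectError(listfile + " has wrong format in line " + buf + ". correct format:\n\n" +
                                   "Homo sapiens <TAB> /dir/to/genome/human.fa\n" +
                                   "Mus musculus <TAB> /dir/to/genome/mouse.fa\n" +
                                   "...\n");
        }
        ifstrm.close();
    }
    return filenames; // assumed: the snippet in the issue stops before the end of the function
}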

fbemm · Apr 24 '20 08:04

The code above is actually fine; the issue must be somewhere else.

fbemm · Apr 24 '20 11:04

My guess is that the problem is in genomicMSA.cc:

if ((completeName[i] == '-') || (completeName[i] == '.'))

Changing it to:

if (completeName[i] == '.')

Running a test now.

fbemm · Apr 24 '20 11:04

@MarioStanke Removing '-' fixes this, but I am not sure whether '-' is there for a good reason.

fbemm · Apr 24 '20 18:04

I suspect that we once had MAF alignments as input that used a "-" to separate the species name from the sequence name, although I don't remember any such examples recently. I will open a (low-priority) issue to allow a minus in the species name. Thanks!

MarioStanke · Apr 25 '20 11:04