
ffindex_order giving many fewer entries in output .ff{data,index} files than in input files

Open hrp1000 opened this issue 1 year ago • 4 comments

I'm trying to create a custom database, but after running ffindex_order I get many fewer entries in the _a3m.ff{data,index} and _hhm.ffindex outputs than are present in the input files.

I don't have the MPI versions installed.

My normal DB will have ~70K entries, but the behaviour can already be seen with just 100 entries in the initial _???.ff{data,index} files:

Running hh-suite/3.3.0 on Intel hardware, CentOS Linux release 7.9.2009 (Core):

```shell
cstranslate -f -x 0.3 -c 4 -I a3m -i full_a3m -o full_cs219
sort -k3 -n full_cs219.ffindex | cut -f1 > sorting.dat
# Need this step because ffindex_order does not work for the .hhm files
# when given the .a3m extensions from the original sorting.dat file
sed 's/\.a3m$/.hhm/' sorting.dat > sorting.hhm
ffindex_order sorting.hhm full_hhm.ff{data,index} full_hhm_ordered.ff{data,index}
ffindex_order sorting.dat full_a3m.ff{data,index} full_a3m_ordered.ff{data,index}
```

```
$ wc -l full*index
  100 full_a3m.ffindex
    6 full_a3m_ordered.ffindex
  100 full_cs219.ffindex
  100 full_hhm.ffindex
    6 full_hhm_ordered.ffindex
  312 total
```
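One way to narrow down where the entries are lost is to compare each sorting file with the index it is applied to. Below is a minimal sketch as a hypothetical Bash helper (`check_order_inputs` is not an hh-suite tool). It assumes the usual ffindex layout of one `name<TAB>offset<TAB>length` line per entry. It counts how many names in the sorting file have no exact match in column 1 of the index, and it reports whether that column is in byte order. The idea that `ffindex_order` needs a name-sorted index (as `ffindex_build -s` is assumed to produce) is an assumption worth checking, not something established here.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper, not part of hh-suite.
# Assumes each .ffindex line is: name<TAB>offset<TAB>length

# Usage: check_order_inputs SORTING_FILE INDEX_FILE
check_order_inputs() {
  local sorting_file=$1 index_file=$2
  local requested missing sorted

  # Number of names the sorting file asks ffindex_order to look up.
  requested=$(awk 'END { print NR }' "$sorting_file")

  # Names in the sorting file with no exact match in column 1 of the index.
  missing=$(awk -F'\t' '
    FILENAME == ARGV[1] { seen[$1] = 1; next }
    !($1 in seen) { n++ }
    END { print n + 0 }' "$index_file" "$sorting_file")

  # Is column 1 of the index in byte (C locale) order?
  # (Assumption: this is the order a name-sorted ffindex uses.)
  if cut -f1 "$index_file" | LC_ALL=C sort -c 2>/dev/null; then
    sorted=yes
  else
    sorted=no
  fi

  printf 'requested=%s missing=%s index_sorted=%s\n' \
    "$requested" "$missing" "$sorted"
}
```

With the commands above, it would be run as `check_order_inputs sorting.dat full_a3m.ffindex` and `check_order_inputs sorting.hhm full_hhm.ffindex`. A nonzero `missing` count would suggest a name mismatch between the sorting file and the index. `index_sorted=no` would suggest looking at how the input index was built.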

My assumption is that I've screwed up somewhere, so any indication of where would be most useful!

hrp1000 · Mar 24 '23 16:03