libgaba icon indicating copy to clipboard operation
libgaba copied to clipboard

Scores are possibly off

Open giuliaguidi opened this issue 5 years ago • 0 comments

Hi,

I ran libgaba using a linear gap penalty and regardless of the sequences similarity and scoring matrix, the scores are way off with respect to the ground truth.

Could you please let me know if I'm doing something wrong?

Here's the program I'm using:

int gabaAlign(int mat, int mis, int gap, int k, int xdrop, 
    std::string& targetSeg, std::string& querySeg)
{
gaba_t *ctx = gaba_init(GABA_PARAMS(
		// match award, mismatch penalty, gap open penalty (G_i), and gap extension penalty (G_e)
		GABA_SCORE_SIMPLE(mat, -mis, 0, -gap),
		gfa : 0,
		gfb : 0,
		xdrop : (int8_t)xdrop,
		filter_thresh : 0,
	));

	char const t[64] = {0};	// tail array
	gaba_section_t asec = gaba_build_section(0, (uint8_t const *)targetSeg.c_str(), (uint32_t)targetSeg.size());
	gaba_section_t bsec = gaba_build_section(2, (uint8_t const *)querySeg.c_str(),  (uint32_t)querySeg.size());
	gaba_section_t tail = gaba_build_section(4, t, 64);

	// create thread-local object
	gaba_dp_t *dp = gaba_dp_init(ctx);	// dp[0] holds a 64-cell-wide context

	// init section pointers
	gaba_section_t const *ap = &asec, *bp = &bsec;
	gaba_fill_t const *f = gaba_dp_fill_root(dp, // dp -> &dp[_dp_ctx_index(band_width)] makes the band width selectable
		ap, 0,					// a-side (reference side) sequence and start position
		bp, 0,					// b-side (query)
		UINT32_MAX				// max extension length
	);

	// until x-drop condition is detected
	gaba_fill_t const *m = f;

	while((f->status & GABA_TERM) == 0) {
		// substitute the pointer by the tail section's if it reached the end
		if(f->status & GABA_UPDATE_A) { ap = &tail; }
		if(f->status & GABA_UPDATE_B) { bp = &tail; }

		f = gaba_dp_fill(dp, f, ap, bp, UINT32_MAX);	// extend the banded matrix
		m = f->max > m->max ? f : m;					// swap if maximum score was updated
	}

	// alignment path
	gaba_alignment_t *r = gaba_dp_trace(dp,
		m,		// section with the max
		NULL	// custom allocator: see struct gaba_alloc_s in gaba.h
	);

    int score = r->score;
    
	// clean up
	gaba_dp_res_free(dp, r); gaba_dp_clean(dp);
	gaba_clean(ctx);    

    return score;
}

For example, using seq1.seq and seq2.seq as input sequences, x = 50, match = 1, mismatch = -1, and gap = -1, the score should be 3105, while libgaba return 4197.

You can find the benchmark code here: https://github.com/giuliaguidi/xavier/tree/master/benchmark. It can be compiled simply via:

make

and run as:

./test-score-exe <x-drop> <seq1-file> <seq2-file>

The ground truth can be generated using PyAligner and running it in the https://github.com/giuliaguidi/xavier/tree/master/benchmark folder as:

python3 run_pyaligner.py <seq1-file> <seq2-file> -x <x-drop>

Thank you very much!

giuliaguidi avatar Feb 24 '20 20:02 giuliaguidi