ID order lost with LinkDatabaseMatchListener

Open spv3 opened this issue 10 years ago • 1 comments

Steps to reproduce: record-link two tables (table1 and table2) that both use sequential IDs, and write the results to a links table using LinkDatabaseMatchListener.

In the resulting links table there is no way to determine which ID (id1 or id2) belongs to which table (table1 or table2).

This is caused by the following code in the Link class constructor, which rearranges id1 and id2:

    if (id1.compareTo(id2) < 0) {
      this.id1 = id1;
      this.id2 = id2;
    } else {
      this.id1 = id2;
      this.id2 = id1;
    }
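To make the effect concrete, here is a minimal standalone sketch of the same ordering logic. It does not use Duke itself: the class name `LinkOrderDemo` and the helper `normalize` are hypothetical, and it assumes the IDs are plain strings.

```java
// Hypothetical standalone demo; not part of Duke.
class LinkOrderDemo {
    // Mirrors the ordering logic quoted from the Link constructor:
    // the smaller ID always ends up in slot 0 (id1), the larger in slot 1 (id2).
    static String[] normalize(String id1, String id2) {
        if (id1.compareTo(id2) < 0) {
            return new String[] {id1, id2};
        }
        return new String[] {id2, id1};
    }

    public static void main(String[] args) {
        // Case A: table1 record "7" linked to table2 record "3".
        String[] a = normalize("7", "3");
        // Case B: table1 record "3" linked to table2 record "7".
        String[] b = normalize("3", "7");

        // Both cases are stored as id1=3, id2=7, so the stored row
        // no longer says which table each ID came from.
        System.out.println("A: id1=" + a[0] + ", id2=" + a[1]);
        System.out.println("B: id1=" + b[0] + ", id2=" + b[1]);
    }
}
```

Two different links (table1 record 7 with table2 record 3, and table1 record 3 with table2 record 7) collapse into the same stored pair, which is exactly the ambiguity described above.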

Would it be possible to make this behavior configurable?

spv3 avatar Mar 05 '15 00:03 spv3

Yes, this is the behaviour. The code swaps the IDs to avoid duplicate links in deduplication mode. Of course, that doesn't work so well in record linkage mode.

We can work on making it predictable which ID goes in what slot (id1 or id2). However, the easiest solution is to just add a prefix to the IDs, so that you know which one comes from which source.
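A rough sketch of the prefix workaround, outside of Duke: the prefixes `t1:` and `t2:`, the class name `PrefixedIds`, and the helpers are all hypothetical, and how the prefix actually gets attached (in the source query, a cleaner, or elsewhere) is left open. The point is only that the prefix survives the swap, so the source can be read back from the stored ID.

```java
// Hypothetical standalone demo; not part of Duke.
class PrefixedIds {
    static final String TABLE1_PREFIX = "t1:"; // assumed prefix for table1 IDs
    static final String TABLE2_PREFIX = "t2:"; // assumed prefix for table2 IDs

    // Same ordering logic as quoted from the Link constructor.
    static String[] normalize(String id1, String id2) {
        if (id1.compareTo(id2) < 0) {
            return new String[] {id1, id2};
        }
        return new String[] {id2, id1};
    }

    // Recovers the source table from a prefixed ID.
    static String sourceOf(String prefixedId) {
        if (prefixedId.startsWith(TABLE1_PREFIX)) return "table1";
        if (prefixedId.startsWith(TABLE2_PREFIX)) return "table2";
        throw new IllegalArgumentException("unprefixed id: " + prefixedId);
    }

    // Strips the prefix to get the original ID back.
    static String rawId(String prefixedId) {
        return prefixedId.substring(prefixedId.indexOf(':') + 1);
    }

    public static void main(String[] args) {
        // The table2 ID is passed first, but the swap does not matter:
        // each stored ID still carries its own source.
        String[] link = normalize(TABLE2_PREFIX + "3", TABLE1_PREFIX + "7");
        for (String id : link) {
            System.out.println(id + " -> " + sourceOf(id) + ", raw id " + rawId(id));
        }
    }
}
```

With these particular prefixes, every `t1:` ID also sorts before every `t2:` ID, so id1 would always hold the table1 record after the swap. That is a side effect of the chosen prefixes rather than something to rely on; reading the prefix is the safer check.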

larsga avatar Mar 05 '15 08:03 larsga