Undetected duplicate person in persons_finish_unfinished.php
@SAuroux posted some results after I merged #3127, and he noticed something unexpected in the php script.
The new CompetitionResultsValidator would report a possible duplicate person for "Hoàng Đức Tài" (2017TAIH01), which is right, but the persons_finish_unfinished.php misses that and reports no possible duplicate.
Maybe there is something wrong with the encoding in the php script?
It would be nice to either fix it in the php script, or port the newcomers script to rails!
Since Sébastien couldn't select the existing person in the php script, he worked around it by creating a new WCA ID and merging the two persons afterward. Interestingly, at that point rails refused to merge the two persons because they had different names. The two strings are "Hoàng Đức Tài" (2017TAIH01) and "Hoàng Đức Tài". I haven't yet taken the time to investigate this.
I've investigated this a bit more. It turns out the two scripts behave differently because of how they get the list of similar persons.
The rails script does a Person.where(name: var), whereas the php script does a SQL select for all persons, then relies on the php function similar_text to rank the candidates.
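To make the php side more concrete, here is a rough Ruby approximation of the kind of ranking similar_text allows. It is only a sketch: the helper names (common_byte_count, similarity_percent) and the candidate names are made up for illustration, and this is not the code of persons_finish_unfinished.php. The one thing it tries to stay faithful to is that similar_text compares raw bytes, not characters.

```ruby
# Rough approximation of PHP's similar_text, working on raw bytes.
# Hypothetical helper, not taken from the php script.
def common_byte_count(a, b)
  return 0 if a.empty? || b.empty?

  # Find the longest run of bytes shared by both arrays.
  best_len = 0
  best_i = 0
  best_j = 0
  a.each_index do |i|
    b.each_index do |j|
      k = 0
      k += 1 while i + k < a.size && j + k < b.size && a[i + k] == b[j + k]
      if k > best_len
        best_len = k
        best_i = i
        best_j = j
      end
    end
  end
  return 0 if best_len.zero?

  # Recurse on what is left of and right of the shared run.
  left = common_byte_count(a.take(best_i), b.take(best_j))
  right = common_byte_count(a.drop(best_i + best_len), b.drop(best_j + best_len))
  best_len + left + right
end

# Similarity as a percentage of the combined byte length.
def similarity_percent(s1, s2)
  a = s1.bytes
  b = s2.bytes
  total = a.size + b.size
  return 0.0 if total.zero?

  common_byte_count(a, b) * 2 * 100.0 / total
end

# Made-up candidates, only to show the ranking step.
query = "Hoang Duc Tai"
candidates = ["John Smith", "Hoang Duc Tam", "Hoang Duc Tai"]
ranked = candidates.sort_by { |name| -similarity_percent(query, name) }
ranked.each { |name| puts format("%6.2f  %s", similarity_percent(query, name), name) }
```

Because the score is computed on bytes, two names that look identical on screen can still score below 100 if they are stored as different byte sequences.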
A few facts:
- the two names in the opening post match both in php and in rails.
- Person.find_by(wca_id: "2017TAIH01").name == "Hoàng Đức Tài" fails in rails
- Person.where(name: "Hoàng Đức Tài") does return the correct Person despite the previous failure
The bytes differ:
irb(main):036:0> Person.find_by(wca_id: "2017TAIH01").name.each_byte { |l| puts l }
72
111
97
204
128
110
103
32
196
144
198
176
204
129
99
32
84
97
204
128
105
irb(main):037:0> "Hoàng Đức Tài".each_byte { |l| puts l }
72
111
195
160
110
103
32
196
144
225
187
169
99
32
84
195
160
105
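Reading the two dumps side by side, the database value stores "à" as "a" followed by a combining grave accent (204 128) and "ứ" as "ư" followed by a combining acute accent (204 129), while the typed literal uses precomposed characters (195 160 and 225 187 169). A minimal sketch, using Ruby's String#unicode_normalize and rebuilding both strings from the bytes above so that nothing depends on how the text was pasted:

```ruby
# Bytes copied from the two irb dumps above.
db_bytes = [72, 111, 97, 204, 128, 110, 103, 32, 196, 144, 198, 176,
            204, 129, 99, 32, 84, 97, 204, 128, 105]
literal_bytes = [72, 111, 195, 160, 110, 103, 32, 196, 144, 225, 187, 169,
                 99, 32, 84, 195, 160, 105]

db_name = db_bytes.pack("C*").force_encoding("UTF-8")
literal = literal_bytes.pack("C*").force_encoding("UTF-8")

# Plain == compares the underlying bytes, so the strings differ.
puts db_name == literal                          # => false

# After normalizing to NFC (precomposed form) they are equal.
puts db_name.unicode_normalize(:nfc) == literal  # => true
```

This only shows that the two values are different encodings of the same visible text. It does not say what Person.where does on the database side, and whether names should be normalized before saving or before comparing is still an open question.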
I have no idea what happens under the hood of Person.where, nor how we should handle this issue :(
I'm adding this to https://github.com/thewca/worldcubeassociation.org/projects/4.