
Soundex Support

Open · harrislineage opened this issue 3 years ago · 8 comments

Following a discussion during WikiTree Day: The Future of Genealogy Discussion Panel II, look into adding support for, and display of, Soundex encoding.

harrislineage · Nov 05 '22 13:11

It would be better to use this variant: https://en.m.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex

Standard Soundex is unusable for non-English names.

MichalVasut · Nov 07 '22 16:11

I am unfamiliar with Soundex usage in other countries, but Daitch–Mokotoff Soundex would be unusable for US purposes (i.e., the US Census), so it could not be the sole encoding algorithm. Perhaps I can add a set of options to show:

  • Soundex
  • Daitch–Mokotoff Soundex
  • Cologne phonetics
  • any combination of the above

Do you think this would suffice?
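To give a feel for what the third option involves, here is a minimal sketch of Cologne phonetics (Kölner Phonetik) following the commonly published rule table: each letter is mapped to a digit depending on its neighbours, runs of the same digit are collapsed, and every 0 except a leading one is dropped. The function name `cologne` is hypothetical and this is not the extension's code; it assumes umlauts can be folded to their base vowels and that any character outside A-Z can simply be ignored.

```python
# Minimal sketch of Cologne phonetics (Kölner Phonetik). Hypothetical helper,
# not the extension's implementation. Umlauts are folded to base vowels and
# any character outside A-Z is dropped before coding.
VOWELS = set("AEIJOUY")
C_ONSET_HARD = set("AHKLOQRUX")  # C at word start before these -> 4
C_HARD = set("AHKOQUX")          # C elsewhere before these -> 4
SIMPLE = {"B": "1", "F": "3", "V": "3", "W": "3", "G": "4", "K": "4",
          "Q": "4", "L": "5", "M": "6", "N": "6", "R": "7", "S": "8", "Z": "8"}
FOLD = str.maketrans({"Ä": "A", "Ö": "O", "Ü": "U"})


def cologne(word: str) -> str:
    s = "".join(c for c in word.upper().translate(FOLD) if "A" <= c <= "Z")
    raw = []
    for i, c in enumerate(s):
        prev = s[i - 1] if i > 0 else ""
        nxt = s[i + 1] if i + 1 < len(s) else ""
        if c in VOWELS:
            raw.append("0")
        elif c == "H":
            continue  # H is not coded at all
        elif c in SIMPLE:
            raw.append(SIMPLE[c])
        elif c == "P":
            raw.append("3" if nxt == "H" else "1")
        elif c in "DT":
            raw.append("8" if nxt in {"C", "S", "Z"} else "2")
        elif c == "C":
            if i == 0:
                raw.append("4" if nxt in C_ONSET_HARD else "8")
            elif prev in {"S", "Z"}:
                raw.append("8")
            else:
                raw.append("4" if nxt in C_HARD else "8")
        elif c == "X":
            raw.append("8" if prev in {"C", "K", "Q"} else "48")
    # Collapse runs of the same digit, then drop every 0 except a leading one.
    collapsed = []
    for d in "".join(raw):
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    if not collapsed:
        return ""
    return collapsed[0] + "".join(d for d in collapsed[1:] if d != "0")


if __name__ == "__main__":
    for name in ["Müller-Lüdenscheidt", "Meier", "Mayer", "Schmidt", "Schmitt"]:
        print(name, cologne(name))
```

Under these rules Meier and Mayer both encode to 67, and Schmidt and Schmitt both encode to 862, which is the kind of German spelling variation plain Soundex was not designed around. Offering this alongside Soundex as a selectable option would let each user pick the encoding that suits the records they are searching.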

harrislineage · Nov 07 '22 17:11

Okay, I don't really care. 😅 I'd just read about this for the first time, so I checked the Wikipedia article to see what it actually is. It mentioned that the standard/original version works only for English words and that there are several improvements, including the one I mentioned. I originally thought it was an improvement (enhancement) on top of the original one, not a variant specialized for different languages. 🤔 But if that's the case, just use the original one...

MichalVasut · Nov 07 '22 17:11

No worries! I may add the options later... The original Soundex encoding was used by the US Government for the US Census. In many cases, you can search using the Soundex code to find the microfilm publication number and roll number for the results. Obviously, this can't work for every language, hence the different algorithms mentioned.

harrislineage · Nov 07 '22 17:11

Stephen Morse's website also discusses various Soundex and similar algorithms. See https://stevemorse.org/#phonetic

I did use Soundex variants when I was looking for possible variants of my great-grandfather's surname. But then again, I was familiar with Soundex because of its use by the US Government (both for the census and within other mapping organizations).

I think adding Soundex is a great idea. Of course, we might need to explain it to many users.

ke4tch · Nov 08 '22 16:11

Oh yes, this would be a very specialized feature for more advanced genealogists, not just advanced WikiTreers. I may need to write up some information on a free-space page and link to it.

harrislineage · Nov 08 '22 16:11

If I understand correctly, these algorithms are meant for searching for similar words or names based on how they sound. Each name is encoded (depending on the chosen algorithm), and the search term's code is then matched against the database, which returns all entries with the same encoding.

But how would you use it in a client-side browser extension? Does it have another use?

MichalVasut · Nov 08 '22 16:11

Right now, the extension just performs the encoding and makes the Soundex code available to members. The code can then be used in record searches, such as the New York, New York, Soundex to Passenger and Crew Lists, 1887-1921.

If you search for Harris, you get only three results. But using the Soundex code (in this case, H620), you can see other possible matches, such as the record of Charles F. Harss, who has the same Soundex code. I haven't researched this individual and only pulled him up as a quick example, but this could be a misspelling of Harris, and I wouldn't have known if I had searched only for Harris as above. If you scroll one record in each direction from Charles, you will see that his Soundex code has him sandwiched between two Harrises :)
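The encode-then-match idea from this exchange can be sketched as follows: a minimal American Soundex encoder (first letter plus three digits, with the usual rule that H and W do not separate equal digits while vowels do) and a hypothetical in-memory index that groups names under their code. The names `soundex`, `build_index`, and `lookup`, and the sample name list, are illustrative assumptions, not the extension's code; in practice the matching happens in the record provider's search, and the extension only supplies the code. Letters outside A-Z are ignored.

```python
from collections import defaultdict

# American Soundex digit groups; vowels, Y, H and W get no digit.
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {ch: digit for letters, digit in GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Minimal American Soundex sketch: first letter + 3 digits, zero-padded."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    digits = []
    prev = CODES.get(letters[0], "")  # first letter's digit suppresses a repeat
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code:
            if code != prev:
                digits.append(code)
            prev = code
        elif c not in "HW":
            prev = ""  # a vowel separates equal digits; H and W do not
    return (letters[0] + "".join(digits) + "000")[:4]


def build_index(names):
    """Hypothetical helper: group names under their Soundex code."""
    index = defaultdict(list)
    for n in names:
        index[soundex(n)].append(n)
    return index


def lookup(index, query):
    """Return every indexed name that shares the query's Soundex code."""
    return index.get(soundex(query), [])


if __name__ == "__main__":
    names = ["Harris", "Harss", "Haris", "Harrison", "Morris"]  # made-up sample
    idx = build_index(names)
    print(soundex("Harris"), soundex("Harss"))  # H620 H620
    print(lookup(idx, "Harris"))  # ['Harris', 'Harss', 'Haris']
```

Under these rules Harrison encodes to H625 and Morris to M620, so neither is returned for a Harris search; only names that collapse to exactly H620 match.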

harrislineage · Nov 08 '22 16:11