
MDEV-9473 Enhance Soundex Functionality to Properly Handle Multi-word Inputs

Open bardiaHSZD opened this issue 9 months ago • 4 comments

Description

This merge request fixes the SOUNDEX() function's handling of multi-word inputs to comply with standard specifications and improve SQL compatibility.

The current implementation incorrectly processes strings with multiple words:

  • Words are concatenated instead of being processed separately
  • The output can exceed the standard 4-character length
  • The output has no delimiter between words

The implementation comes in two flavours, selected by an optional second argument:

  • Flavour 0: Example: SOUNDEX('Hello World') → 'H4643', or SOUNDEX('Hello World', '0') → 'H4643'

  • Flavour 1: Example: SOUNDEX('Hello World', '1') → 'H400 W643'
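
To make the difference between the two flavours concrete, here is a minimal Python sketch. It is not the server code: `soundex_code` and `soundex` are hypothetical helpers, the encoding is a simplified Soundex, and truncating each word to 4 characters in flavour 1 is an assumption based on the "standard 4-character length" mentioned above. It only reproduces the two examples from this description.

```python
# Letter-to-digit table for A..Z; "0" marks letters that emit no digit
# (vowels plus H, W and Y).
CODES = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "01230120022455012623010202"))


def soundex_code(text, max_len=None):
    """Encode all ASCII letters in `text` as one simplified Soundex code."""
    letters = [c for c in text.upper() if c in CODES]
    if not letters:
        return ""
    out = [letters[0]]          # the first letter is kept as-is
    last = CODES[letters[0]]    # code of the last letter that was kept
    for c in letters[1:]:
        code = CODES[c]
        # Skip letters without a digit and repeats of the last emitted code.
        if code != "0" and code != last:
            out.append(code)
            last = code
    result = "".join(out).ljust(4, "0")   # pad short codes with zeros
    return result[:max_len] if max_len else result


def soundex(text, flavour="0"):
    """Flavour '0': whole input as one code. Flavour '1': one code per word."""
    if flavour == "1":
        # Assumption: each word is limited to the standard 4 characters.
        codes = [soundex_code(word, max_len=4) for word in text.split()]
        return " ".join(code for code in codes if code)
    return soundex_code(text)


print(soundex("Hello World"))        # H4643
print(soundex("Hello World", "1"))   # H400 W643
```

In flavour 0 the space is simply ignored, so the letters of both words feed a single code. In flavour 1 each whitespace-separated word is encoded on its own and the codes are joined with a space.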

Changes:

  1. Code Changes:

    • item_strfunc.h: Isolate the single-word logic in Item_func_soundex::soundex
    • item_strfunc.cc:
      • Refactor single-word processing
      • Add multi-word support in Item_func_soundex::val_str
      • Improve formatting
      • Use the server's own String type instead of std::string
  2. Test Updates:

    Modified expected results for multi-word scenarios in:

    • ctype_ucs
    • ctype_utf16
    • ctype_utf16le
    • ctype_utf32
    • func_str

How can this PR be tested?

The following MTR (mysql-test-run) tests pass with the code changes:

  • ctype_ucs
  • ctype_utf16
  • ctype_utf16le
  • ctype_utf32
  • func_str

Basing the PR against the correct MariaDB version

Backward compatibility

Copyright

All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

bardiaHSZD Feb 11 '25 22:02