TileDB icon indicating copy to clipboard operation
TileDB copied to clipboard

TileDB library is not locale safe

Open jjimenezshaw opened this issue 1 year ago • 1 comments

(Fully explained in https://github.com/PDAL/PDAL/pull/3873 )

Many C/C++ functions like atof, strtod, to_string, stringstream, etc do take into consideration the C locale (in both directions: number to string and string to number). The C locale is a global setting that can be set with a function like std::locale::global(std::locale("de_DE.utf8")); (in this example for the German locale... that has to be installed in the OS at execution time). https://en.cppreference.com/w/cpp/locale/locale/global

If the application that uses tiledb is setting a global locale (for any reason), then maybe TileDB is not working as expected. Just add std::locale::global(std::locale("de_DE.utf8")); at the beginning of test/src/unit.cc, and you will see that tests are failing. Please, notice that you need the German locale de_DE.utf8 installed in your system at run time.

That happens with many locales of languages that use , (comma) as decimal separator and . as thousand separator, like German, Spanish, French, Dutch, etc.

See that the intention is not making TileDB locale aware (that is, to use the locale of the system), but the other way around: regardless the locale configured, the library should work the same way (probably reading and writing numbers in the classic locale). Unfortunately many C/C++ formatting functions do not help on that :(

./tiledb/test/tiledb_unit 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tiledb_unit is a Catch v2.13.8 host application.
Run with -? for options

-------------------------------------------------------------------------------
Backwards compatibility: Write to an array of older version
-------------------------------------------------------------------------------
/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:678
...............................................................................

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:720: FAILED:
  REQUIRE( a_read[0] == 100 )
with expansion:
  1 == 100

-------------------------------------------------------------------------------
Backwards compatibility: Upgrades an array of older version and write/read it
-------------------------------------------------------------------------------
/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:1.220
...............................................................................

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:1.279: FAILED:
  CHECK( fragment_info.version(0) == 1 )
with expansion:
  16 == 1

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:1.281: FAILED:
  CHECK( fragment_info.version(1) == tiledb::sm::constants::format_version )
with expansion:
  1 == 16

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:1.304: FAILED:
  REQUIRE( a_read2[i] == i + 11 )
with expansion:
  1 == 11

-------------------------------------------------------------------------------
C API: Test opening array at timestamp, reads
-------------------------------------------------------------------------------
/home/jshaw/jjimenezshaw/TileDB/test/src/unit-capi-array.cc:865
...............................................................................

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-capi-array.cc:997: FAILED:
  CHECK( rc == 0 )
with expansion:
  -1 == 0

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-capi-array.cc:1.115: FAILED:
  {Unknown expression after the reported line}
due to a fatal error condition:
  SIGSEGV - Segmentation violation signal

===============================================================================
test cases:  18 |  15 passed | 3 failed
assertions: 670 | 664 passed | 6 failed

Segmentation fault (core dumped)

Those tests worked perfectly in my computer (checkout of dev in Ubuntu 20.04) without setting the global locale to German.

See the PR I made in PDAL to see a clear example and a solution.

Unfortunately I was not able to fix it in TileDB. I do not know how the format to and from string is done in TileDB.

IMHO the proper solution is to use a dedicated parser that does not consider the locale (as many JSON parsers are doing), or use the imbue function in C++ streams with std::locale::classic(), as I did in PDAL. Changing the C locale in a library is not a good idea (it is a global variable), as it is not threadsafe, and can seriously impact the application threads (even when TileDB is run singlethreaded, there can be other threads from the main application).

jjimenezshaw avatar Oct 09 '22 10:10 jjimenezshaw