node icon indicating copy to clipboard operation
node copied to clipboard

`new Intl.Segmenter().segment()` causes SEGFAULT with `--with-intl=small-icu` and no runtime ICU data

Open septatrix opened this issue 1 year ago • 3 comments

Version

v20.10.0

Platform

Linux acfbcf435a24 6.7.4-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Feb 5 22:21:14 UTC 2024 x86_64 GNU/Linux

Subsystem

Intl

What steps will reproduce the bug?

  1. Obtain a nodejs build compiled with --with-intl=small-icu.*
  2. Execute new Intl.Segmenter().segment() in the REPL.

*In my case that was the nodejs package from Fedora 39 without nodejs-full-i18n. This can be obtained as follows:

  1. Create a docker/podman container with Fedora 39: podman run --rm -it fedora:39
  2. dnf install -y --setopt=install_weak_deps=False nodejs

How often does it reproduce? Is there a required condition?

Always, as long as no runtime ICU data is present.

What is the expected behavior? Why is that the expected behavior?

NodeJS should fallback to a locale-unaware string separator, not provide Intl.Segmenter at all, or raise a JS exception. The status quo offers no possibility to know if segmentation is safe and calling it results in a segmentation fault and therefore instant termination of the process.

What do you see instead?

A segmentation fault

Additional information

Backtrace
Core was generated by `node'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  v8::internal::JSSegments::Create () at ../../deps/v8/src/objects/js-segments.cc:33
33            segmenter->icu_break_iterator().raw()->clone();                                                                                                                           
[Current thread is 1 (Thread 0x7f0039bc7c40 (LWP 49))]
(gdb) bt
#0  v8::internal::JSSegments::Create () at ../../deps/v8/src/objects/js-segments.cc:33
#1  0x00007f003c910b24 in Builtin_Impl_SegmenterPrototypeSegment () at ../../deps/v8/src/builtins/builtins-intl.cc:1170
#2  v8::internal::Builtin_SegmenterPrototypeSegment () at ../../deps/v8/src/builtins/builtins-intl.cc:1160
#3  0x00007f003c755df6 in Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit () from /lib64/libnode.so.115
#4  0x00007f003c6c7d1c in Builtins_InterpreterEntryTrampoline () from /lib64/libnode.so.115
#5  0x0000167f10a804e9 in ?? ()
#6  0x000003439a820159 in ?? ()
#7  0x0000000500000000 in ?? ()
#8  0x0000167f10a805b9 in ?? ()
#9  0x000027cb1169d0c9 in ?? ()
#10 0x000027cb1169d0c9 in ?? ()
#11 0x000003439a820159 in ?? ()
#12 0x0000167f10a804e9 in ?? ()
#13 0x0000004900000000 in ?? ()
#14 0x000000e7305257a1 in ?? ()
#15 0x0000000000000002 in ?? ()
#16 0x000000e730525f49 in ?? ()
#17 0x000003439a80ef49 in ?? ()
#18 0x00007ffefb915ea8 in ?? ()
#19 0x00007f003c6c60dc in Builtins_JSEntryTrampoline () from /lib64/libnode.so.115
#20 0x000003439a80eee1 in ?? ()
#21 0x000027cb1169c439 in ?? ()
#22 0x000000e730525f49 in ?? ()
#23 0x000000000000002c in ?? ()
#24 0x00007ffefb915f10 in ?? ()
#25 0x00007f003c6c5e03 in Builtins_JSEntry () from /lib64/libnode.so.115
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

On a Fedora 39 system one should be able to open the coredump with: DEBUGINFOD_URLS=https://debuginfod.fedoraproject.org/ gdb /usr/bin/node-20 coredump

coredump.zip

septatrix avatar Feb 13 '24 21:02 septatrix

Backtrace:

#0  0x0000fffff68cba28 in v8::internal::JSSegments::Create(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSSegmenter>, v8::internal::Handle<v8::internal::String>) () from /lib64/libnode.so.115
#1  0x0000fffff64bfee0 [PAC] in v8::internal::Builtin_SegmenterPrototypeSegment(int, unsigned long*, v8::internal::Isolate*) ()
   from /lib64/libnode.so.115
#2  0x0000fffff6305964 [PAC] in Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit () from /lib64/libnode.so.115

I don't know if this is a V8 bug or if we don't initialize it correctly.

/cc @nodejs/i18n-api

targos avatar Feb 14 '24 07:02 targos

FWIW we strip out ICU break iterator data with small-icu https://github.com/nodejs/node/blob/bf3971673593ebd3f1f65b2d15a717f0784379ff/tools/icu/icu_small.json#L19 https://github.com/nodejs/node/blob/bf3971673593ebd3f1f65b2d15a717f0784379ff/tools/icu/icu_small.json#L34-L40

Also https://bugs.chromium.org/p/v8/issues/detail?id=3345 referenced https://github.com/nodejs/node/blob/bf3971673593ebd3f1f65b2d15a717f0784379ff/tools/icu/icu-generic.gyp#L38-L41

cc @srl295

richardlau avatar Feb 14 '24 16:02 richardlau

FWIW we strip out ICU break iterator data with small-icu

https://github.com/nodejs/node/blob/bf3971673593ebd3f1f65b2d15a717f0784379ff/tools/icu/icu_small.json#L19

https://github.com/nodejs/node/blob/bf3971673593ebd3f1f65b2d15a717f0784379ff/tools/icu/icu_small.json#L34-L40

Also https://bugs.chromium.org/p/v8/issues/detail?id=3345 referenced

https://github.com/nodejs/node/blob/bf3971673593ebd3f1f65b2d15a717f0784379ff/tools/icu/icu-generic.gyp#L38-L41

cc @srl295

In 2014 there wasn't an Intl.Segmenter.. probably these should be included.

However, v8 shouldn't crash, that should be fixed in v8.

We do have some code (IIRC) that detects small-icu and monkeypatched Intl with a warning about small icu.

srl295 avatar Feb 14 '24 16:02 srl295

Apart from the PR for a regression test is someone also working towards fixing this? Will this be solved by adjusting the ICU configuration in NodeJS to include a locale-unaware fallback, must this be fixed in V8, or both? (In the case that actions are required upstream it would be great to link a ticket)

septatrix avatar Mar 09 '24 17:03 septatrix

Apart from the PR for a regression test is someone also working towards fixing this? Will this be solved by adjusting the ICU configuration in NodeJS to include a locale-unaware fallback, must this be fixed in V8, or both? (In the case that actions are required upstream it would be great to link a ticket)

sorry i did not take a look at this more yet

a couple of different concerns:

  1. fixing v8 to not crash- that needs to be filed with v8.
  2. fixing node to include some segmentation data even with small-icu: Not clear we should do this. small icu wont' be small if it includes all segmentation data.
  3. remove the "TODO reenable UCONFIG_NO_BREAK_ITERATION=1" - this comment should be removed.

srl295 avatar Mar 10 '24 02:03 srl295

a couple of different concerns:

  1. fixing v8 to not crash- that needs to be filed with v8.

If you point me to where this can be done (monorail?) I could do this though I am not sure which information to provide them so it might make more sense for some of you to do this.

  1. fixing node to include some segmentation data even with small-icu: Not clear we should do this. small icu wont' be small if it includes all segmentation data.

I was thinking of a dummy implementation similar to the other i18n functionality (which to my knowledge uses either very simplistic/naive implementatios or ships only with the data required for the english locale. I.e. simply split grapheme s at utf8 boundaries, words at spaces/punctuation and sentences at a full stop, question/exclamation mark etc. Alternatively the object should not exist on Intl which would at least allow detection but once browser support is ubiquitous, libraries would drop that which (or newer ones never check it) which will result in errors.

  1. remove the "TODO reenable UCONFIG_NO_BREAK_ITERATION=1" - this comment should be removed.

I have no clue about this

septatrix avatar Mar 10 '24 02:03 septatrix

Alternatively the object should not exist on Intl

we remove(d) it in some circumstances. Again i need to dig that out.

srl295 avatar Mar 10 '24 02:03 srl295

Are you able to provide more information on where this currently sits in the backlog and when this might get resolved?

septatrix avatar Aug 12 '24 22:08 septatrix

I have not been able to look further into this myself.

srl295 avatar Aug 12 '24 23:08 srl295