hermes icon indicating copy to clipboard operation
hermes copied to clipboard

[JSI] Support non-copy `jsi::String` constructors

Open mrousavy opened this issue 11 months ago • 8 comments

Problem

Currently, all jsi::String factory methods (e.g. createFromAscii(...)) copy the given std::string/char*. This could cause unnecessary performance hits if the strings are large, or constexpr/statically known at compile-time

Solution

It would be cool to have a constructor/factory method that doesn't copy the given std::string/std::string_view/char*/char[N]/jsi::ArrayBuffer.

This would be beneficial in two situations:

  • For static/constexpr strings that do not change
  • For huge strings (like JSON strings) that are memory-managed elsewhere (aka "unsafely") to be passed to JS directly and synchronously, and then freed later. This avoids large copies in situations where performance could matter.

Additional Context

I don't know how this works internally in the Runtime, but is it even possible for the Hermes/JSC runtime to not have ownership on a string? I'm guessing strings are not represented using the C++ STL internally, so a conversion from std::string will always happen - but what about char*/char[N]?

mrousavy avatar Jan 21 '25 14:01 mrousavy

Such functionality doesn't currently exist, but we have discussed in the past. As you've described, the string provided through JSI is currently always copied.

One way this could be provided is to add a constructor of jsi::String that accepts a std::shared_ptr<jsi::Buffer> similar to what we have for jsi::ArrayBuffer. The jsi::Buffer implementation can then consume the string from any source you choose.

I'd be curious to hear more about your use case for such an API. For instance, what are the encodings that you would need it to support? Hermes internally only supports ASCII and UTF-16 strings, so supporting something like this for UTF-8 would be much more challenging.

neildhar avatar Jan 21 '25 22:01 neildhar

Internally we support taking ownership of std::string and std::u16string, but this functionality isn't exposed through JSI. We could expose it, possibly using the Buffer API, as @neildhar mentioned. It would only make sense for very large strings (I haven't quantified it, but certainly over a 1 KB, perhaps even 16KB). Also note that for char strings there would be cost to validate that the contents are ASCII when "adopting" the string into the VM.

is it even possible for the Hermes/JSC runtime to not have ownership on a string

I can't speak for JSC, but in Hermes the question is moot, since "ownership" doesn't actually mean anything in C++. You control the pointer and the destructor of the jsi::Buffer or the deleter of std::shared_ptr<>. The only requirement for correctness is that the contents of the buffer must not change while the VM "believes" that it owns it.

tmikov avatar Jan 22 '25 01:01 tmikov

Hermes internally only supports ASCII and UTF-16 strings, so supporting something like this for UTF-8 would be much more challenging.

In my case I have a few ASCII constants that I'd love to micro-optimize.

I also had a scenario where I parsed a huge (around 10 KB) JSON file and passed it around between native and JS - I found a way to use ArrayBuffer for this instead, but it made the API a bit more annoying on the JS side.

mrousavy avatar Jan 22 '25 09:01 mrousavy

In my case I have a few ASCII constants that I'd love to micro-optimize.

If they are constants they are likely small? Wouldn't it make more sense to add them to the VM once? The heap allocation of the Buffer and the shared_ptr is likely more expensive than copying of a small string.

I also had a scenario where I parsed a huge (around 10 KB) JSON file and passed it around between native and JS - I found a way to use ArrayBuffer for this instead, but it made the API a bit more annoying on the JS side.

Hmm. I have two questions here. First, 10KB doesn't sound that large. Are you passing many of these to the VM with high frequency? Second, how certain are you that it is ASCII? Usually it is risky to assume that a file is ASCII.

tmikov avatar Jan 22 '25 20:01 tmikov

Hm yea good point the strings aren't as big.

I thought about the use-case of embedding translation strings, but maybe it's better to keep those separate assets anyways.

And no, the JSON is not ASCII - I'm guessing it's UTF8.

I think you're right tho this is probably not a very big performance gain anyways, might've been a nice to have feature with the added risk of the unsafe ownership

mrousavy avatar Jan 22 '25 23:01 mrousavy

Hey - so I've been playing around with the jsi::String::getStringData(…) API you guys have added and it's quite a powerful API: https://github.com/mrousavy/react-native-mmkv/pull/806

Avoiding a copy here gave me a 84% performance improvement, which was between 4%-46% overall, depending on the string's format (UTF8 vs UTF16) (incl. the function calls).

I was wondering; since my use-case is to simply store the string data to a file, and later read it back again - would it be possible to have the inverse as well?

// Copies the `data` into a new `jsi::String`, either UTF8 or UTF16 (`ascii`)
jsi::String::createFromData(bool ascii, const void* data, size_t length)

This would allow me to read string data as is, dump it into storage, and then convert it back to a string without caring about it's format (no UTF8 or UTF16 conversions) and leaving it up to the JS runtime to handle that.

The only case where I would consider this unsafe is if a string consists of both UTF8 (ascii=true) and UTF16 (ascii=false) parts, but I don't think that can ever be the case, can it? In other words; jsi::String::getStringData(..., Cb) being called multiple times with different values for bool ascii.

mrousavy avatar Feb 20 '25 12:02 mrousavy

I was wondering; since my use-case is to simply store the string data to a file, and later read it back again - would it be possible to have the inverse as well?

// Copies the data into a new jsi::String, either UTF8 or UTF16 (ascii) jsi::String::createFromData(bool ascii, const void* data, size_t length)

This would allow me to read string data as is, dump it into storage, and then convert it back to a string without caring about it's format (no UTF8 or UTF16 conversions) and leaving it up to the JS runtime to handle that.

How is this different from using the existing APIs:

static String createFromAscii(Runtime& runtime, const char* str, size_t length);
static String createFromUtf16(Runtime& runtime, const char16_t* utf16, size_t length);

tmikov avatar Feb 22 '25 20:02 tmikov

Ah my bad - I was on react-native 0.78, which does have getStringData(...), but does not have createFromUtf16(...) yet.

It is on react-native main though, so I'll just wait for react-native 0.79 then.

mrousavy avatar Feb 24 '25 21:02 mrousavy