ClearScript
Add a way to transfer JavaScript data to .NET faster than via JSON
Hello,
Great library!
But I make calls to APIs and build data in the form of a table that I return to the .NET host.
The format of the data is something like {[key: string]: string | number | boolean | null | undefined}[].
This format is straightforward to convert into a DataTable, but the performance is abysmal due to the slow per-property reads.
I guess it is due to the following code in the GetProperty call:
engine.MarshalToHost(engine.ScriptInvoke(() => target.GetProperty(index)), false);
Is there a way to tell the engine to return something like an Array of Dictionary objects?
If not, are there workarounds or plans to solve this?
Hi @AllNamesRTaken,
Thanks for your kind words!
If we understand correctly, you're building an array of dictionaries in script code and passing it to a .NET method for processing. The .NET method uses a separate dynamic call to retrieve each dictionary entry. This results in poor performance due to a large number of hops across the host-script boundary.
If that's correct, our first suggestion would be as follows. In script code, once you've constructed your data, transform it into an array of JSON strings using JSON.stringify, and pass the result to your .NET method. On the .NET side, use a library like Newtonsoft.Json to deserialize each JSON string into a .NET dictionary.
If performance is still not up to par – for example, if your array is so large that retrieving each JSON string separately is a problem – you should be able to convert all the data into one JSON string and pass it to .NET in a single hop.
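As a sketch of the script side of both variants (the data here is made up for illustration):

```javascript
// Hypothetical sample data shaped like the table rows described above.
const rows = [
  { id: 1, name: "alpha", score: 0.5, note: null },
  { id: 2, name: "beta", score: 2.25, note: "ok" },
];

// Option 1: one JSON string per row — one host hop per row.
const rowJson = rows.map((row) => JSON.stringify(row));

// Option 2: the whole table as a single JSON string — one host hop total.
const tableJson = JSON.stringify(rows);
```

The host would then deserialize `rowJson` entries (or `tableJson`) with Json.NET.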
It's also possible that your data is so large that converting it all into a JSON string is impossible. In that case, you might have to get more creative. One possibility might be to use a JavaScript typed array as your data transfer medium. ClearScript offers fast copying to and from typed arrays.
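To illustrate the typed-array idea: purely numeric values can be packed into a flat buffer in script code, which ClearScript can then copy to a .NET array in bulk. Note that this only works for numeric columns; strings would have to travel separately. A minimal sketch, with hypothetical data:

```javascript
// Pack a table of numeric rows into a flat Float64Array, row-major order.
// ClearScript can copy a typed array like this to a .NET double[] in one shot.
const rows = [
  { x: 1, y: 2 },
  { x: 3, y: 4 },
];
const columns = ["x", "y"];
const buffer = new Float64Array(rows.length * columns.length);
rows.forEach((row, r) => {
  columns.forEach((col, c) => {
    buffer[r * columns.length + c] = row[col];
  });
});
```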
Thoughts?
Thank you for responding so quickly. I believe you hit the nail on its head. The JSON.stringify + Json.NET conversion is the workaround I use. But for large tables it is still a costly solution when compared to the speed of V8, which is why I had hoped for a solution where, for example, leaf object properties with basic data types could have their values automatically converted. That way the marshaling would only happen per row of the JS object array.
Is that of interest, or a bad idea? [EDIT] Words are difficult. :)
Hi @AllNamesRTaken,
> That way the marshaling would only happen per row
Please clarify. Our understanding is that each dictionary in the source data ends up being a row in the final table. If that's correct, and if you're transferring each dictionary as a JSON string, you're already marshaling only once per row.
> But for large tables it is still a costly solution when compared to the speed of V8
Since you're starting with a pure JavaScript data structure, marshaling is unavoidable.
As a general principle, ClearScript favors proxy marshaling over automatic conversion. The goal is to avoid data loss, but as you've noticed, accessing data across the host-script boundary is expensive.
In performance-sensitive scenarios such as yours, where that expense is unacceptable, the best solution is to alter your data access patterns in order to minimize hops across that boundary. By switching to JSON strings for the dictionaries, you've taken a big step in that direction (incidentally, we'd love to get an idea of the improvement it yielded over your original approach).
Beyond that, here are some ideas for further gains:
- Transfer your entire array as a single JSON string. The performance gain is likely to depend on the array size.
- Can you think of any way to optimize your data pre-transfer? For example, do the rows in your final table use a common schema? If so, it may be more efficient to transfer the data as a list of value arrays rather than dictionaries.
- Use a format that's more efficient than JSON. Standard JavaScript only supports JSON, but there are libraries out there for things like UBJSON and MessagePack. If you serialized your data to a binary buffer (Uint8Array et al.), ClearScript could transfer it to a .NET array very efficiently. On the other hand, it's possible that V8's native JSON serializer is faster than anything one could code in JavaScript, even if the resulting data is larger.
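The common-schema idea can be sketched as follows: send the column names once, and each row as a plain value array, so the keys aren't repeated in every object (the data here is hypothetical):

```javascript
// Convert an array of same-shaped objects into a compact
// { columns, values } payload before a single JSON.stringify hop.
const rows = [
  { id: 1, name: "alpha" },
  { id: 2, name: "beta" },
];
const columns = Object.keys(rows[0]);
const payload = JSON.stringify({
  columns,
  values: rows.map((row) => columns.map((col) => row[col])),
});
```

The .NET side would read `columns` once and map each value array positionally into a DataRow.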
Finally, as you suggested, ClearScript could natively implement some form of fast structured data transfer. That's something we've had on our backlog for a while, and we'll definitely look into it for the next release.
Please send any additional thoughts or findings!
Thanks!
Hi! The data was about 2400 rows long and 19 columns wide. We returned it as a JS array of objects, with each row being one JS object with 19 keys. Values were a mix of number and string types.
The original solution was to loop over the dynamic object representing the array, creating one DataRow in a DataTable per dynamic representing a row object, then looping over the keys per row and filling the cells with values. It was obvious from VS performance metrics that property access was the performance drain. Total time on my machine while debugging, though fluctuating, was about 200–300 ms for this method.
The current method is to JSON.stringify the whole array of objects in V8 and then deserialize to a DataTable with Json.NET. Total time in debug mode for the Json.NET approach was about 50 ms, pretty deterministic.
For 2400×19 cells this still feels slow. Optimally, I would iterate over the data as a pure array of dictionaries or something similar, but I can understand the goal of not marshaling automatically to avoid type-conversion errors. Many use cases, however, are not in the risk zone for these errors.
The middle ground I was pondering was to automatically translate basic value properties such as string, number, null, and bool on objects. This would then reduce the need to marshal on every property access when transforming into a DataTable. If you wish to still allow pure access to V8 types, there could be a separate function on the ClearScript object that accesses the pre-translated data and falls back to marshaling if no value was found.
This would reduce the time in my example by close to 20 times, for a more reasonable total of 10–15 ms. The dream, of course, would be a way to ask for automatic translation of arrays as well, which would allow for very large data sets at low overhead.
Regards Joel
Hi Joel,
Thanks for providing that information! We'll run some experiments and get back to you. Hopefully your current JSON-based solution is good enough in the short term.
> The middle ground I was pondering was to automatically translate basic value properties such as string, number, null, and bool on objects. This would then reduce the need to marshal on every property access when transforming into a DataTable.
It would appear that, by "translate", you mean transfer all primitive-valued properties during the initial proxy handshake, so that retrieving them no longer requires a round trip to the script engine. Is that correct? If so, it's an interesting idea; the problem is that access to those properties would no longer be "live", and the proxy could diverge from the original JavaScript object unless some coherency protocol were in place.
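The coherency problem can be shown with a trivial sketch (plain JavaScript standing in for the proxy mechanics):

```javascript
// A one-time copy of primitive properties diverges as soon as the
// script mutates the original object — the copy is no longer "live".
const live = { count: 1 };
const snapshot = { ...live }; // primitives captured at "handshake" time
live.count = 2;               // later script-side mutation
// snapshot.count is still 1 — a host proxy built this way would be stale.
```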
Quick question: We'd like to provide some kind of fast structured data transfer, but for practical reasons we're thinking of limiting it to JSON-compatible data. In your case, that would exclude undefined as a legal property value. Would that be a big problem for you?
Thanks again!
Yes, you understood me right, and yes, I see the issue with stale access to the underlying data. I just believe the use case is common where you evaluate script to retrieve some object value and continue processing in .NET. Perhaps a parameter to the evaluate function, or a separate call that does not provide live access, would be feasible?
Yes, that would be enough for me, with non-undefined basic types. My use case is based around retrieving and processing larger data loads over REST APIs, which return JSON structures without the undefined type anyhow.
JSON serialization is doable for me in the short term, but I would gladly test any improvements that come at a later stage.
Many thanks for a positive discussion. If I can be of any assistance with development or testing please let me know.
Regards Joel
Hi again @AllNamesRTaken,
We've now run a bunch of tests with randomly generated JavaScript data similar in "shape" to the data in your scenario. Our goal was to find a way to transfer it more rapidly than your current solution involving JSON.stringify and JsonConvert.DeserializeObject.
Unfortunately, we've had to abandon this effort. The fundamental problem is that V8's public C++ API incurs enough overhead to offset any performance gain we can get from using a better format than JSON. In fact, in our tests, JSON.stringify consistently produced its results in less time than it took ClearScript to simply iterate the data via the public API.
Interestingly, we then found that JavaScript code could iterate the data much faster than C++, but script-based serialization was much slower. So we tried a hybrid solution, where a JavaScript function iterated the data and called out to C++ for serialization. That approach managed to pull out a slight win over JSON.stringify, but only as long as the data was all numeric. When we added strings to the mix, JSON.stringify again won easily.
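The script side of that hybrid idea might look like the sketch below, where `hostSink` and its `write` method are hypothetical stand-ins for a ClearScript host object that does the C++-side serialization:

```javascript
// JavaScript walks the data structure (fast) and hands each primitive
// to a host-supplied sink (hostSink is a hypothetical host object).
function serializeRows(rows, hostSink) {
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      hostSink.write(key, row[key]);
    }
  }
}
```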
We'll keep this issue open as a future enhancement, and we'll watch for new V8 APIs that might help. In the meantime, please don't hesitate to send any additional thoughts or findings our way.
Thank you!
This is such an interesting discussion! Have you tried https://github.com/lucagez/slow-json-stringify or https://github.com/fastify/fast-json-stringify?
Did you also try Protobuf?
Thanks @promontis! We haven't tested any external serialization libraries on the JavaScript side, but we certainly encourage such experimentation, and we welcome any findings.