rbx-dom
rbx-dom copied to clipboard
Improved error messages for instance names
Currently, rbx-dom throws a pretty nondescript error when an instance has a name containing invalid utf-8:
[TRACE rbx_binary::deserializer::state] Known prop, canonical name C0 and type CFrame
[TRACE rbx_binary::chunk] Chunk "PROP" (compressed: 24, len: 37, reserved: 0)
[TRACE rbx_binary::deserializer::state] PROP chunk (RotateP.C1, instance type 59, prop type 59
[TRACE rbx_binary::deserializer::state] Known prop, canonical name C1 and type CFrame
[TRACE rbx_binary::chunk] Chunk "PROP" (compressed: 20, len: 18, reserved: 0)
[TRACE rbx_binary::deserializer::state] PROP chunk (RotateP.Enabled, instance type 59, prop type 59
[TRACE rbx_binary::deserializer::state] Known prop, canonical name Enabled and type Bool
[TRACE rbx_binary::chunk] Chunk "PROP" (compressed: 30, len: 50, reserved: 0)
[TRACE rbx_binary::chunk] Chunk "PROP" (compressed: 69, len: 103, reserved: 0)
[TRACE rbx_binary::deserializer::state] PROP chunk (RotateP.Name, instance type 59, prop type 59
stream did not contain valid UTF-8
This particular instance is more common in Roblox than one might think and is known as a "virus" instance, one created by a malicious script or plugin to be intentionally annoying or disruptive. Developers that get this error generally don't know that they have these, and the instances are also generally harmless, but make deserialization in rbx-dom fail with the above error message.
It would be nice to throw a better error here (and maybe in other cases for invalid utf-8), maybe even mentioning the exact instance and where it is so developers can deal with it accordingly.
Looked into this more, and it's going to be difficult (or impossible) to give much useful info (e.g. the instance's full name) in these errors because the DOM is necessarily incomplete in the middle of a deserialization operation.
Maybe instead of more specific errors, we could just do a lossy conversion for non UTF-8 Variant::String properties? The main drawback is just the fact the original data would be altered, and there may be legitimate uses for non UTF-8 string properties!
Alternatively, we could start reading Variant::String props into a Vec<u8> instead of a String. This would be more correct, but it would be a breaking change and makes string properties more complicated to work with for consumers.
d��������������ng…i got owned…
Alternatively, we could start reading Variant::String props into a Vec
instead of a String. This would be more correct, but it would be a breaking change and makes string properties more complicated to work with for consumers.
This actually sounds great for Lune as a consumer since we could, in many cases, skip the entire conversion step between dom string -> rust string -> lua (c) string. We could also unify some code we have for differences in handling with normal strings / binary strings. I can't really speak for Rojo or any other tools (are there any other than Remodel/Lune?) though.