plaso
bodyfile parser: handle unpaired surrogates
NTFS file names can contain unpaired surrogates. It is currently unclear how these should be represented in the bodyfile format (see https://github.com/sleuthkit/sleuthkit/issues/2837). However, Python's Unicode codecs, such as UTF-8 in strict mode, reject unpaired surrogates as invalid Unicode.
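A minimal sketch of the failure. A Python `str` can hold a lone surrogate code point, but encoding it with the strict UTF-8 codec raises `UnicodeEncodeError`. The helper name `is_utf8_encodable` and the sample file names are hypothetical and used only for illustration.

```python
def is_utf8_encodable(name: str) -> bool:
    """Returns True if the string can be encoded as strict UTF-8."""
    try:
        name.encode('utf-8')
    except UnicodeEncodeError:
        # Raised for surrogate code points U+D800 to U+DFFF
        # ("surrogates not allowed").
        return False
    return True


# Hypothetical file name containing an unpaired high surrogate.
print(is_utf8_encodable('file\ud800.txt'))  # False
print(is_utf8_encodable('file.txt'))  # True
```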
Maybe the best way is to escape them, for example as "\ud800".
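A sketch of the escaping idea, assuming the raw NTFS name is available as UTF-16-LE bytes. Decoding with the standard `surrogatepass` error handler combines valid surrogate pairs into single code points and keeps unpaired surrogates as lone code points. Every remaining surrogate can then be replaced with a `\uXXXX` escape. The function names `decode_ntfs_name` and `escape_unpaired_surrogates` are hypothetical and not part of plaso; the lowercase `\uXXXX` form follows the example above.

```python
def decode_ntfs_name(raw_name: bytes) -> str:
    """Decodes a UTF-16-LE NTFS name, keeping unpaired surrogates.

    Valid surrogate pairs are combined into one code point by the decoder;
    'surrogatepass' lets lone surrogates through instead of raising.
    """
    return raw_name.decode('utf-16-le', 'surrogatepass')


def escape_unpaired_surrogates(name: str) -> str:
    """Replaces surrogate code points with a backslash-u escape.

    Assumes the name was decoded as above, so any surrogate left in the
    string is unpaired.
    """
    parts = []
    for char in name:
        code_point = ord(char)
        if 0xD800 <= code_point <= 0xDFFF:
            parts.append('\\u{0:04x}'.format(code_point))
        else:
            parts.append(char)
    return ''.join(parts)


# 'a', lone high surrogate U+D800, 'b' encoded as UTF-16-LE.
name = decode_ntfs_name(b'a\x00\x00\xd8b\x00')
escaped = escape_unpaired_surrogates(name)
print(escaped)  # a\ud800b
print(escaped.encode('utf-8'))  # now encodes without error
```

One design caveat: this escaping is ambiguous if a file name already contains a literal backslash followed by `u` and four hex digits, so a lossless scheme would also need to escape backslashes. For comparison, `name.encode('utf-8', 'backslashreplace')` produces the same escapes for lone surrogates, but returns bytes.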