AndBug
vm unpackString does not tolerate invalid unicode
Because Python decodes unicode strictly by default, vm.unpackString can fail with decoding errors when invalid codepoints are received from Dalvik. For forensic use, this is terrible behavior: one malformed string aborts the operation that was reading it.
To fix this, a wrapper class should be written that preserves the received data as a bytestring and presents it as a Python unicode string with the invalid codepoints omitted, so that dependent functions do not throw exceptions when operating on derived data.
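A minimal sketch of such a wrapper follows. It is not AndBug code: the class name `LossyString` and its attributes are hypothetical. It also assumes Python 3 `bytes`/`str` semantics and that the incoming data is meant to be UTF-8.

```python
class LossyString(object):
    """Hypothetical wrapper: keeps the raw bytes, exposes a tolerant decode."""

    def __init__(self, raw, encoding="utf-8"):
        # The untouched bytes as received, kept for forensic inspection.
        self.raw = bytes(raw)
        self.encoding = encoding
        # errors="ignore" drops undecodable bytes instead of raising.
        self.text = self.raw.decode(encoding, errors="ignore")

    @property
    def valid(self):
        """True if the raw bytes decode cleanly under strict rules."""
        try:
            self.raw.decode(self.encoding)
            return True
        except UnicodeDecodeError:
            return False

    def __str__(self):
        # Dependent code that only needs text gets the lossy form.
        return self.text

    def __bytes__(self):
        # Code that needs the evidence gets the original bytes.
        return self.raw

    def __repr__(self):
        return "LossyString(%r)" % (self.raw,)


s = LossyString(b"ab\xffcd")
print(str(s), s.valid, bytes(s))
```

Callers that only display or compare text would use `str(s)`, while anything that needs to know what was dropped can check `s.valid` and read `s.raw`.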
See also how vm unpacks chr's:
https://github.com/swdunlop/AndBug/commit/9ae4bd28ec8bc1f4e0bd60fd988c8931072ffa6c#commitcomment-1426109