bstr
Write escaped string into a buffer
Hi @BurntSushi,
I'm using bstr for turning a Vec<u8>-like structure into debug strings and error messages. Specifically, I'm working on a Ruby implementation. In Ruby, String is a Vec<u8> with a default UTF-8 encoding and no guarantee that the bytes are actually valid UTF-8. bstr is the means by which I interpret these byte vectors as UTF-8 as best I can.
The fmt::Debug implementation on &BStr is very close to what I'd like, but I cannot use it because it wraps the escaped string in quotes. I need control of the output since these strings are being put into error messages.
I've put together this function for writing the escaped representation to an arbitrary fmt::Write (cribbing heavily from the fmt::Debug impl on &BStr).
pub fn escape_unicode<T>(mut f: T, string: &[u8]) -> Result<(), WriteError>
where
    T: fmt::Write,
{
    let buf = bstr::B(string);
    for (start, end, ch) in buf.char_indices() {
        if ch == '\u{FFFD}' {
            // Invalid UTF-8 decodes to U+FFFD; print the raw bytes as hex.
            for byte in buf[start..end].as_bytes() {
                write!(f, r"\x{:X}", byte)?;
            }
        } else {
            write!(f, "{}", ch.escape_debug())?;
        }
    }
    Ok(())
}
Here's an example usage:
let mut message = String::from("undefined group name reference: \"");
string::escape_unicode(&mut message, name)?;
message.push('"');
Err(Exception::from(IndexError::new(interp, message)))
I'm trying to generate a message like this:
$ ruby -e 'm = /(.)/.match("a"); m["abc-\xFF"]'
Traceback (most recent call last):
1: from -e:1:in `<main>'
-e:1:in `[]': undefined group name reference: "abc-\xFF" (IndexError)
Is this patch something you would consider upstreaming?
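For readers without bstr at hand, the same idea can be roughly sketched with only the standard library. This is an approximation, not bstr's implementation: write_escaped is a hypothetical name, it returns fmt::Result rather than the WriteError used above, and it uses std's Utf8Error to locate invalid bytes. One behavioral difference is that a validly encoded U+FFFD in the input is printed as a character here, while the function above hex-escapes it.

```rust
use std::fmt::{self, Write};
use std::str;

/// Hypothetical std-only helper (not part of bstr): writes `bytes` to `out`,
/// escaping valid chars with `char::escape_debug` and invalid bytes as `\xNN`.
pub fn write_escaped<W: Write>(out: &mut W, mut bytes: &[u8]) -> fmt::Result {
    while !bytes.is_empty() {
        // Split the input into a valid UTF-8 prefix and whatever follows it.
        let (valid, bad_len) = match str::from_utf8(bytes) {
            Ok(s) => (s, 0),
            Err(err) => {
                let prefix = &bytes[..err.valid_up_to()];
                let remaining = bytes.len() - prefix.len();
                // `error_len` is `None` when the input ends mid-sequence.
                let bad_len = err.error_len().unwrap_or(remaining);
                // The prefix was just validated, so this cannot fail.
                (str::from_utf8(prefix).expect("validated prefix"), bad_len)
            }
        };
        for ch in valid.chars() {
            write!(out, "{}", ch.escape_debug())?;
        }
        let bad_start = valid.len();
        for byte in &bytes[bad_start..bad_start + bad_len] {
            write!(out, "\\x{:02X}", byte)?;
        }
        bytes = &bytes[bad_start + bad_len..];
    }
    Ok(())
}
```

Because the target is any fmt::Write, a String works as the buffer, which is what the error-message use case above needs.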
This looks reasonableish, yes. I'd like to see its API cleaned up a bit. Namely:
- It looks like it should be named escape_debug instead of escape_unicode? Namely, escape_unicode in std converts everything to Unicode escapes.
- I think it should be named escape_debug_to since it writes to a fmt::Write. This leaves the door open to adding escape_debug implementations that mirror std, but this doesn't need to be in the initial PR.
- Add docs along with an example, consistent with the rest of the API. :-)
Thanks for the good idea!
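The naming point can be seen with std's own str methods. The two wrappers below are hypothetical names used only for illustration; they call str::escape_unicode and str::escape_debug directly.

```rust
/// Every char becomes a `\u{...}` escape, printable or not.
fn unicode_escaped(s: &str) -> String {
    s.escape_unicode().to_string()
}

/// Only chars that need it are escaped; printable text passes through.
fn debug_escaped(s: &str) -> String {
    s.escape_debug().to_string()
}
```

For the input "a\n", the first yields \u{61}\u{a} and the second yields a\n, which is why escape_debug is the closer match for the function proposed above.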
Thanks. I’ll work on a PR tonight.
Apologies for leading you down the wrong path here, but as noted in #37, I think we should add APIs that mirror std for this as closely as possible. In particular, we should be able to have an escape_debug method that returns an iterator of char values corresponding to the escaped output. The iterator itself can implement fmt::Write for ergonomics.
This is harder to implement, but I think looking at std should give some inspiration. Note that there is an important difference between bstr and std here. std has an escape_debug impl for char, and since a str is just a sequence of encoded chars, its str::escape_debug method can simply defer to the char implementation. We can't really do that in bstr, so the implementation will need to be a bit different.
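As a rough illustration of what such an iterator could look like, here is a std-only sketch. Everything in it is an assumption made for the example: EscapeBytes, Pending, decode_first and hex_escape are hypothetical names, not bstr APIs. It decodes one char at a time, defers to char::escape_debug for valid chars, and falls back to \xNN for each byte that does not start a valid sequence. It implements fmt::Display, which is what std's own escape iterators do.

```rust
use std::fmt::{self, Write};
use std::str;

/// The escape sequence currently being emitted (hypothetical helper).
#[derive(Clone)]
enum Pending {
    Nothing,
    Escaped(std::char::EscapeDebug),
    Hex(std::array::IntoIter<char, 4>),
}

/// Hypothetical iterator over the escaped chars of a byte string.
#[derive(Clone)]
pub struct EscapeBytes<'a> {
    rest: &'a [u8],
    pending: Pending,
}

impl<'a> EscapeBytes<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        EscapeBytes { rest: bytes, pending: Pending::Nothing }
    }
}

/// Decodes the first char of `bytes`, or returns `None` if `bytes` is empty
/// or starts with an invalid or truncated UTF-8 sequence.
fn decode_first(bytes: &[u8]) -> Option<char> {
    // An encoded char is at most 4 bytes, so a 4-byte window is enough.
    let window = &bytes[..bytes.len().min(4)];
    let valid = match str::from_utf8(window) {
        Ok(s) => s,
        Err(e) => str::from_utf8(&window[..e.valid_up_to()]).ok()?,
    };
    valid.chars().next()
}

/// Yields the four chars of `\xNN` for one byte.
fn hex_escape(byte: u8) -> std::array::IntoIter<char, 4> {
    const HEX: &[u8; 16] = b"0123456789ABCDEF";
    let hi = HEX[usize::from(byte >> 4)] as char;
    let lo = HEX[usize::from(byte & 0x0F)] as char;
    IntoIterator::into_iter(['\\', 'x', hi, lo])
}

impl<'a> Iterator for EscapeBytes<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        loop {
            // Drain the escape sequence in progress first.
            let next = match &mut self.pending {
                Pending::Nothing => None,
                Pending::Escaped(it) => it.next(),
                Pending::Hex(it) => it.next(),
            };
            if next.is_some() {
                return next;
            }
            // Then start the next one, or stop when the input is exhausted.
            let (&first, _) = self.rest.split_first()?;
            match decode_first(self.rest) {
                Some(ch) => {
                    self.pending = Pending::Escaped(ch.escape_debug());
                    self.rest = &self.rest[ch.len_utf8()..];
                }
                None => {
                    self.pending = Pending::Hex(hex_escape(first));
                    self.rest = &self.rest[1..];
                }
            }
        }
    }
}

impl fmt::Display for EscapeBytes<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.clone().try_for_each(|ch| f.write_char(ch))
    }
}
```

With Display in place, the escaped form can be dropped into format! or write! without surrounding quotes, and collect::<String>() works because the items are chars.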
I'm sharing this because I believe it's a step towards @BurntSushi's proposed solution (it just needs a mapping from DebugItem -> Iterator<Item=char>), but it is also useful for those that want a non-escaped debug string.
enum DebugItem<'a> {
    NullByte,
    Escaped(core::char::EscapeDebug),
    HexedChar(char),
    HexedBytes(&'a [u8]),
}

impl<'a> fmt::Display for DebugItem<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DebugItem::NullByte => write!(f, "\\0"),
            DebugItem::Escaped(escaped) => write!(f, "{}", escaped),
            DebugItem::HexedBytes(bytes) => {
                for &b in bytes.as_bytes() {
                    write!(f, r"\x{:02X}", b)?;
                }
                Ok(())
            },
            DebugItem::HexedChar(ch) => write!(f, "\\x{:02x}", *ch as u32),
        }
    }
}

fn iter_debug_items<'a>(debug_str: &'a BStr) -> impl Iterator<Item = DebugItem<'a>> {
    debug_str.char_indices()
        .map(move |(s, e, ch)| {
            match ch {
                '\0' => DebugItem::NullByte,
                '\u{FFFD}' => {
                    let bytes = debug_str[s..e].as_bytes();
                    // A literal U+FFFD in the input is valid UTF-8; any other
                    // bytes that decode to U+FFFD are an invalid sequence.
                    if bytes == b"\xEF\xBF\xBD" {
                        DebugItem::Escaped(ch.escape_debug())
                    } else {
                        DebugItem::HexedBytes(bytes)
                    }
                }
                // ASCII control characters (0x01-0x1F and 0x7F) except \n, \r, \t
                '\x01'..='\x08'
                | '\x0b'
                | '\x0c'
                | '\x0e'..='\x1f'
                | '\x7f' => {
                    DebugItem::HexedChar(ch)
                }
                // Everything else, including \n, \r and \t
                _ => {
                    DebugItem::Escaped(ch.escape_debug())
                }
            }
        })
}

impl fmt::Debug for BStr {
    #[inline]
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "\"")?;
        for item in iter_debug_items(self) {
            write!(f, "{}", item)?;
        }
        write!(f, "\"")?;
        Ok(())
    }
}
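
A small std-only check may help explain why the control-character arm exists at all. As far as I can tell, the reason is that char::escape_debug renders non-printable control characters as \u{...} escapes, while the code above prefers the \xNN form. The two helper names below are hypothetical and exist only for this comparison.

```rust
/// What std's `char::escape_debug` produces for `ch`.
fn std_escape(ch: char) -> String {
    ch.escape_debug().to_string()
}

/// The `\xNN` form used by the `HexedChar` arm above.
fn hex_escape(ch: char) -> String {
    format!("\\x{:02x}", ch as u32)
}
```

For '\x01', std_escape gives \u{1} and hex_escape gives \x01. For '\n' std already produces the short form \n, which is why \n, \r and \t are left to escape_debug.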