'Safe printing' format is ambiguous and undocumented, and it should not be used on NUL-delimited output
When printing filenames (and possibly other things), lsof replaces some characters with an escaped form. However, this format is ambiguous:
> touch $'a\x0b'
> touch 'a^K'
> # Open these files, e.g. with less in two other terminals: less $'a\x0b' + less 'a^K'
> lsof -Fn $'a\x0b' 'a^K'
p534276
f4
na^K
p534277
f4
na^K
Even when using the NUL-delimited output – which can safely include any possible character in a filename, since NUL bytes are not allowed there – the escaping still takes place:
> lsof -F0n $'a\x0b' 'a^K' | xxd
00000000: 7035 3334 3237 3600 0a66 3400 6e61 5e4b p534276..f4.na^K
00000010: 000a 7035 3334 3237 3700 0a66 3400 6e61 ..p534277..f4.na
00000020: 5e4b 000a ^K..
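To illustrate what a consumer of this output has to do, here is a minimal Python sketch that splits the exact bytes from the hex dump above into fields. The name `parse_f0` is hypothetical, and the handling of the newline that follows each field set is inferred from this one dump, not taken from lsof's documentation.

```python
# Bytes transcribed from the xxd dump above.
SAMPLE = b"p534276\x00\nf4\x00na^K\x00\np534277\x00\nf4\x00na^K\x00\n"


def parse_f0(data: bytes) -> list[tuple[str, bytes]]:
    """Split `lsof -F0` style output into (field id, value) pairs."""
    fields = []
    for token in data.split(b"\x00"):
        # In the dump, each field set ends with NUL followed by a newline,
        # so the token after a set boundary starts with that newline.
        if token.startswith(b"\n"):
            token = token[1:]
        if token:
            fields.append((chr(token[0]), token[1:]))
    return fields


names = [value for field_id, value in parse_f0(SAMPLE) if field_id == "n"]
print(names)
```

Both entries come back as the same four bytes `a^K`, so splitting on NUL is not enough: the consumer would still have to undo the escaping, and for these two files it cannot.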
There are three issues here, in my opinion:
- The format is ambiguous on ^ escapes.
- The format is undocumented.
- The format should not be used on NUL-delimited output for programs because it significantly and needlessly complicates the downstream parsing.
Thank you for reporting.
- The format should not be used on NUL-delimited output for programs because it significantly and needlessly complicates the downstream parsing.
I agree. If -F0 is given, lsof should print the filename as-is. I will work on this item.
- The format is undocumented.
I found the following description:
OUTPUT
...
Lsof only outputs printable (declared so by isprint(3)) 8 bit charac‐
ters. Non-printable characters are printed in one of three forms: the
C ``\[bfrnt]'' form; the control character `^' form (e.g., ``^@''); or
hexadecimal leading ``\x'' form (e.g., ``\xab''). Space is non-print‐
able in the COMMAND column (``\x20'') and printable elsewhere.
Is this not enough?
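As a rough illustration of the rules quoted above, the following Python sketch applies the three forms to a filename given as bytes. It is not lsof's implementation: the name `safe_print` is hypothetical, a C locale is assumed (so bytes 0x7f and above count as non-printable and get the `\x` form), and the doubling of backslashes is taken from behavior reported in this thread, not from the man page text.

```python
# C-style escapes named in the man page: \b \f \r \n \t
C_ESCAPES = {0x08: b"\\b", 0x0C: b"\\f", 0x0D: b"\\r", 0x0A: b"\\n", 0x09: b"\\t"}


def safe_print(name: bytes) -> bytes:
    """Approximate the documented 'safe printing' of a filename."""
    out = bytearray()
    for byte in name:
        if byte == 0x5C:                 # backslash (assumed to be doubled)
            out += b"\\\\"
        elif byte in C_ESCAPES:          # C ``\[bfrnt]'' form
            out += C_ESCAPES[byte]
        elif byte < 0x20:                # control character `^' form
            out += b"^" + bytes([byte + 0x40])
        elif byte < 0x7F:                # printable ASCII passes through
            out.append(byte)
        else:                            # hexadecimal ``\x'' form (assumption for >= 0x7f)
            out += b"\\x%02x" % byte
    return bytes(out)


print(safe_print(b"a\x0b"))  # vertical tab, a control character
print(safe_print(b"a^K"))    # literal caret followed by K
```

Under these assumptions both calls produce the same bytes, `a^K`, which matches the two identical `n` lines in the report.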
- The format is ambiguous on ^ escapes.
My understanding of your point is that the format cannot represent ^ itself.
Am I correct?
- Yes, I think that's effectively the issue. Escaping a literal ^ with a backslash should be sufficient. (^^ won't work as that's used for 0x1e.)
- Ah, thank you, I missed that as I was searching for \n etc. directly. Yes, that's fine. The only thing missing from it, as far as I can tell, is that backslashes are also escaped as \\.
- Excellent, thank you!
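The proposal in this thread (escape a literal ^ as \^, alongside the existing \\ for backslashes) can be sketched in Python to check that it makes the format reversible. This is only an illustration of the suggestion, not lsof's code or its current behavior: `escape` and `unescape` are hypothetical names, and a C locale is assumed, so bytes 0x7f and above use the `\x` form.

```python
C_ESCAPES = {0x08: b"b", 0x0C: b"f", 0x0D: b"r", 0x0A: b"n", 0x09: b"t"}
C_UNESCAPES = {letter[0]: byte for byte, letter in C_ESCAPES.items()}


def escape(name: bytes) -> bytes:
    """Escape a filename, additionally writing a literal ^ as \\^ (the proposal)."""
    out = bytearray()
    for byte in name:
        if byte in (0x5C, 0x5E):         # backslash or caret: prefix with a backslash
            out += b"\\" + bytes([byte])
        elif byte in C_ESCAPES:          # \b \f \r \n \t
            out += b"\\" + C_ESCAPES[byte]
        elif byte < 0x20:                # other control characters: ^@ .. ^_
            out += b"^" + bytes([byte + 0x40])
        elif byte < 0x7F:                # printable ASCII passes through
            out.append(byte)
        else:                            # everything else: \xNN
            out += b"\\x%02x" % byte
    return bytes(out)


def unescape(text: bytes) -> bytes:
    """Invert escape(); raises on truncated or malformed input."""
    out = bytearray()
    i = 0
    while i < len(text):
        byte = text[i]
        if byte == 0x5C:                 # backslash introduces \\, \^, \x.., or \[bfrnt]
            nxt = text[i + 1]
            if nxt == 0x78:              # 'x' followed by two hex digits
                out.append(int(text[i + 2:i + 4], 16))
                i += 4
            else:
                out.append(C_UNESCAPES.get(nxt, nxt))
                i += 2
        elif byte == 0x5E:               # an unescaped caret always starts a control form
            out.append(text[i + 1] - 0x40)
            i += 2
        else:
            out.append(byte)
            i += 1
    return bytes(out)


for name in (b"a\x0b", b"a^K"):
    print(name, escape(name), unescape(escape(name)) == name)
```

With the extra rule, the two filenames from the report escape to different strings (`a^K` and `a\^K`), and every byte value decodes back to itself.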