'Safe printing' format is ambiguous and undocumented, and it should not be used on NUL-delimited output

Open JustAnotherArchivist opened this issue 5 years ago • 2 comments

When printing filenames (and possibly other things), lsof replaces some characters with an escaped form. However, this format is ambiguous:

> touch $'a\x0b'
> touch 'a^K'
> # Open these files, e.g. with less in two other terminals:  less $'a\x0b'  +  less 'a^K'
> lsof -Fn $'a\x0b' 'a^K'
p534276
f4
na^K
p534277
f4
na^K
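
The collision can be reproduced with a small model of the caret escaping. This is only a sketch inferred from the output above, not lsof's actual code: the function name `caret_escape` is hypothetical, and it models only the `^X` form for control bytes, ignoring any other escape forms lsof may apply.

```python
def caret_escape(name: bytes) -> str:
    """Simplified, assumed model: control bytes become ^X, all else passes through."""
    out = []
    for b in name:
        if b < 0x20:
            # Control byte: '^' followed by the byte shifted into the '@'..'_' range,
            # e.g. 0x0b (vertical tab) -> "^K".
            out.append("^" + chr(b + 0x40))
        else:
            # Everything else, including a literal '^', is emitted unchanged.
            out.append(chr(b))
    return "".join(out)


# Two different filenames produce the same printed form.
print(caret_escape(b"a\x0b"))
print(caret_escape(b"a^K"))
```

Because a literal `^` passes through unchanged in this model, a reader of the output cannot tell which of the two names was meant.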

Even when using the NUL-delimited output – which can safely include any possible character in a filename, since NUL bytes are not allowed there – the escaping still takes place:

> lsof -F0n $'a\x0b' 'a^K' | xxd
00000000: 7035 3334 3237 3600 0a66 3400 6e61 5e4b  p534276..f4.na^K
00000010: 000a 7035 3334 3237 3700 0a66 3400 6e61  ..p534277..f4.na
00000020: 5e4b 000a                                ^K..
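
To illustrate what a downstream parser sees, here is a minimal sketch that splits the bytes from the hexdump above. The layout it assumes (each field ends with NUL, and a newline may follow the NUL) is inferred from that dump only; `parse_f0` is a hypothetical helper, not part of lsof.

```python
def parse_f0(data: bytes):
    """Split `lsof -F0`-style output into (field_id, value) pairs.

    Assumed layout, inferred from the hexdump: every field is terminated by
    NUL, and a newline may follow the NUL before the next field starts.
    """
    fields = []
    for token in data.split(b"\0"):
        # Drop the single newline that may follow the previous NUL terminator.
        if token.startswith(b"\n"):
            token = token[1:]
        if not token:
            continue
        # The first byte is the field identifier; the rest is the raw value.
        fields.append((chr(token[0]), token[1:]))
    return fields


# The exact bytes shown in the xxd output above.
sample = bytes.fromhex(
    "7035 3334 3237 3600 0a66 3400 6e61 5e4b"
    "000a 7035 3334 3237 3700 0a66 3400 6e61"
    "5e4b 000a"
)

for field_id, value in parse_f0(sample):
    print(field_id, value)
```

The splitting itself is unambiguous, yet both `n` fields come out as the same four bytes `a^K`, so the two files still cannot be told apart.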

There are three issues here, in my opinion:

  1. The format is ambiguous on ^ escapes.
  2. The format is undocumented.
  3. The format should not be used on NUL-delimited output for programs because it significantly and needlessly complicates the downstream parsing.

JustAnotherArchivist • Nov 09 '20 19:11

Thank you for reporting.

  3. The format should not be used on NUL-delimited output for programs because it significantly and needlessly complicates the downstream parsing.

I agree. If -F0 is given, lsof should print the filename as-is. I will work on this item.

  2. The format is undocumented.

I found the following description:

OUTPUT
...
       Lsof only outputs printable (declared so by isprint(3)) 8 bit
       characters.  Non-printable characters are printed in one of three
       forms: the C ``\[bfrnt]'' form; the control character `^' form
       (e.g., ``^@''); or hexadecimal leading ``\x'' form (e.g., ``\xab'').
       Space is non-printable in the COMMAND column (``\x20'') and
       printable elsewhere.

Is this not enough?

  1. The format is ambiguous on ^ escapes.

My understanding of your point is that the format cannot represent ^ itself. Am I correct?

masatake • Nov 09 '20 22:11

  1. Yes, I think that's effectively the issue. Escaping a literal ^ with a backslash should be sufficient. (^^ won't work as that's used for 0x1e.)
  2. Ah, thank you, I missed that as I was searching for \n etc. directly. Yes, that's fine. The only thing missing from it, as far as I can tell, is that backslashes are also escaped as \\.
  3. Excellent, thank you!
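
A sketch of how the backslash-escaped caret suggested in point 1 would make the format reversible. It is a model under stated assumptions, not lsof's implementation: `encode` and `decode` are hypothetical names, `\^` is the proposed addition, and treating every byte from 0x7f upward as non-printable is a simplification of the isprint(3) rule.

```python
# C-style escapes named in the man page excerpt: \b \f \r \n \t
C_ESCAPES = {0x08: "\\b", 0x0C: "\\f", 0x0D: "\\r", 0x0A: "\\n", 0x09: "\\t"}
C_UNESCAPES = {text[1]: byte for byte, text in C_ESCAPES.items()}


def encode(name: bytes) -> str:
    out = []
    for b in name:
        if b in C_ESCAPES:
            out.append(C_ESCAPES[b])
        elif b == 0x5C:
            out.append("\\\\")               # backslash -> \\
        elif b == 0x5E:
            out.append("\\^")                # proposed: literal ^ -> \^
        elif b < 0x20:
            out.append("^" + chr(b + 0x40))  # control byte -> ^X (0x1e -> ^^)
        elif b < 0x7F:
            out.append(chr(b))               # printable ASCII unchanged
        else:
            out.append("\\x%02x" % b)        # assumption: everything else -> \xNN
    return "".join(out)


def decode(text: str) -> bytes:
    out = bytearray()
    i = 0
    while i < len(text):
        c = text[i]
        if c == "\\":
            nxt = text[i + 1]
            if nxt == "x":                   # \xNN hexadecimal form
                out.append(int(text[i + 2:i + 4], 16))
                i += 4
                continue
            # C escape, or an escaped literal (\\ or \^)
            out.append(C_UNESCAPES.get(nxt, ord(nxt)))
            i += 2
        elif c == "^":                       # an unescaped ^ always starts ^X
            out.append(ord(text[i + 1]) - 0x40)
            i += 2
        else:
            out.append(ord(c))
            i += 1
    return bytes(out)


print(encode(b"a\x0b"))  # control-character form
print(encode(b"a^K"))    # literal caret, now escaped
```

With this rule an unescaped `^` can only introduce a control character, so the two filenames from the original report get distinct encodings and both decode back to their original bytes.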

JustAnotherArchivist • Nov 10 '20 00:11