qBittorrent Sanitize peer client names

Clients can send arbitrary UTF-8 strings as their client identificators. qBittorrent displays that strings as is, which could lead to unexpected results and affect GUI layout in case of junk or intentionally malicious strings.

This PR introduces basic sanitization for client strings.

Source string is simplified.
Rest non-printable characters are removed.

Fixes #20010

May 01 '24 14:05 HanabishiRecca

qBittorrent displays that strings as is, which could lead to unexpected results and affect GUI layout in case of junk or intentionally malicious strings.

If the problem is displaying of such a data then it should be sanitized at the layer where this (displaying) is done.

May 01 '24 16:05 glassez

If the problem is displaying of such a data then it should be sanitized at the layer where this (displaying) is done.

Are you sure? There is no way to reliably do that in JS for WebUI. For regualr GUI it probably will require extra overhead with string copies. Consider that the value is never used in code, only for displaying in UI, so keeping original value is pointless.

We already do sanitization in-place e.g. for peer id. Libtorrent also does similar things on its side.

May 01 '24 17:05 HanabishiRecca

Consider that the value is never used in code, only for displaying in UI, so keeping original value is pointless.

I disagree with you in general. But considering that in this particular case, this field is unlikely to be intended for anything other than a visual representation of the client name, I agree to leave it as it is. (If we really need something more unique, then we should use the raw peer ID.)

May 02 '24 07:05 glassez

It uses QChar::isPrint() to replace all non-printable characters with spaces. I'm not sure if it accounts for all possible dangerous Unicode sequences, feel free to advice improvements.

Fixes #20010

Can you confirm that it actually fixes #20010?

May 02 '24 07:05 glassez

qBittorrent displays that strings as is, which could lead to unexpected results and affect GUI layout in case of junk or intentionally malicious strings.

https://www.libtorrent.org/reference-Core.html#peer_info states:

client A human readable string describing the software at the other end of the connection. In some cases this information is not available, then it will contain a string that may give away something about which software is running in the other end. In the case of a web seed, the server type and version will be a part of this string. This is UTF-8 encoded.

IMO libtorrent do try to make it human readable and if that is the case then shouldn't this PR applied at libtorrent side?

May 02 '24 10:05 Chocobo1

this field is unlikely to be intended for anything other than a visual representation of the client name

Exactly. It's an arbitrary string sent by a remote client. We should never ever rely or make assumptions about it.

Can you confirm that it actually fixes https://github.com/qbittorrent/qBittorrent/issues/20010?

Yes, I tested it artifically. It sanitizes \n characters.

shouldn't this PR applied at libtorrent side?

Can't say. Some other clients using libtorrent may want to see the original string. Who knows.

May 02 '24 10:05 HanabishiRecca

https://github.com/qbittorrent/qBittorrent/blob/05416458db79e050f8500e1d543148ce4cf2ecdb/src/gui/properties/peerlistwidget.cpp#L483

The GUI side has escaping for HTML entities. If sanitizing are to be done at PeerInfo::client() then you should move it (toHtmlEscaped()) over. IIRC WebUI has escaping too so it would need to be removed to avoid double escaping.

May 02 '24 11:05 Chocobo1

If sanitizing are to be done at PeerInfo::client() then you should move it (toHtmlEscaped()) over.

It will be kinda inconsistient with the rest of the code. https://github.com/search?q=repo%3Aqbittorrent%2FqBittorrent%20toHtmlEscaped&type=code

We probably should do it for all fields then in a separate PR?

May 02 '24 11:05 HanabishiRecca

If sanitizing are to be done at PeerInfo::client() then you should move it (toHtmlEscaped()) over.

It will be kinda inconsistient with the rest of the code. https://github.com/search?q=repo%3Aqbittorrent%2FqBittorrent%20toHtmlEscaped&type=code

We are only talking about the client field in this PR. It comes back to this https://github.com/qbittorrent/qBittorrent/pull/20788#issuecomment-2088720667

We probably should do it for all fields then in a separate PR?

I would prefer changes to client field is finished in this PR. Can't say my opinion for other fields now.

May 02 '24 11:05 Chocobo1

Ok. Changes:

Source string is simplified().
Space-like characters (\t, \n, \v, \f, \r ...) are replaced with regular whitespaces.
Rest non-printable characters are omitted completely.
toHtmlEscaped() interned into the method.

May 02 '24 12:05 HanabishiRecca

The GUI side has escaping for HTML entities. If sanitizing are to be done at PeerInfo::client() then you should move it (toHtmlEscaped()) over.

But this certainly shouldn't be in the base layer. The reasoning that it is not (yet) used without toHtmlEscaped() is invalid in terms of the correct application architecture. The need for HTML escaping is an attribute of some display methods, not the model. Although in this case the purpose of this aspect of the model is to provide a readable value, HTML escaping is not part of this generally.

May 02 '24 15:05 glassez

Although in this case the purpose of this aspect of the model is to provide a readable value, HTML escaping is not part of this generally.

I tend to agree with that.

May 02 '24 16:05 HanabishiRecca

@HanabishiRecca Thank you!

May 11 '24 15:05 glassez

qBittorrent qBittorrent copied to clipboard

Sanitize peer client names

qBittorrent
qBittorrent copied to clipboard