qBittorrent
qBittorrent copied to clipboard
Sanitize peer client names
Clients can send arbitrary UTF-8 strings as their client identificators. qBittorrent displays that strings as is, which could lead to unexpected results and affect GUI layout in case of junk or intentionally malicious strings.
This PR introduces basic sanitization for client strings.
- Source string is simplified.
- Rest non-printable characters are removed.
Fixes #20010
qBittorrent displays that strings as is, which could lead to unexpected results and affect GUI layout in case of junk or intentionally malicious strings.
If the problem is displaying of such a data then it should be sanitized at the layer where this (displaying) is done.
If the problem is displaying of such a data then it should be sanitized at the layer where this (displaying) is done.
Are you sure? There is no way to reliably do that in JS for WebUI. For regualr GUI it probably will require extra overhead with string copies. Consider that the value is never used in code, only for displaying in UI, so keeping original value is pointless.
We already do sanitization in-place e.g. for peer id. Libtorrent also does similar things on its side.
Consider that the value is never used in code, only for displaying in UI, so keeping original value is pointless.
I disagree with you in general. But considering that in this particular case, this field is unlikely to be intended for anything other than a visual representation of the client name, I agree to leave it as it is. (If we really need something more unique, then we should use the raw peer ID.)
It uses
QChar::isPrint()to replace all non-printable characters with spaces. I'm not sure if it accounts for all possible dangerous Unicode sequences, feel free to advice improvements.Fixes #20010
Can you confirm that it actually fixes #20010?
qBittorrent displays that strings as is, which could lead to unexpected results and affect GUI layout in case of junk or intentionally malicious strings.
https://www.libtorrent.org/reference-Core.html#peer_info states:
client A human readable string describing the software at the other end of the connection. In some cases this information is not available, then it will contain a string that may give away something about which software is running in the other end. In the case of a web seed, the server type and version will be a part of this string. This is UTF-8 encoded.
IMO libtorrent do try to make it human readable and if that is the case then shouldn't this PR applied at libtorrent side?
this field is unlikely to be intended for anything other than a visual representation of the client name
Exactly. It's an arbitrary string sent by a remote client. We should never ever rely or make assumptions about it.
Can you confirm that it actually fixes https://github.com/qbittorrent/qBittorrent/issues/20010?
Yes, I tested it artifically. It sanitizes \n characters.
shouldn't this PR applied at libtorrent side?
Can't say. Some other clients using libtorrent may want to see the original string. Who knows.
https://github.com/qbittorrent/qBittorrent/blob/05416458db79e050f8500e1d543148ce4cf2ecdb/src/gui/properties/peerlistwidget.cpp#L483
The GUI side has escaping for HTML entities. If sanitizing are to be done at PeerInfo::client() then you should move it (toHtmlEscaped()) over. IIRC WebUI has escaping too so it would need to be removed to avoid double escaping.
If sanitizing are to be done at
PeerInfo::client()then you should move it (toHtmlEscaped()) over.
It will be kinda inconsistient with the rest of the code. https://github.com/search?q=repo%3Aqbittorrent%2FqBittorrent%20toHtmlEscaped&type=code
We probably should do it for all fields then in a separate PR?
If sanitizing are to be done at
PeerInfo::client()then you should move it (toHtmlEscaped()) over.It will be kinda inconsistient with the rest of the code. https://github.com/search?q=repo%3Aqbittorrent%2FqBittorrent%20toHtmlEscaped&type=code
We are only talking about the client field in this PR. It comes back to this https://github.com/qbittorrent/qBittorrent/pull/20788#issuecomment-2088720667
We probably should do it for all fields then in a separate PR?
I would prefer changes to client field is finished in this PR. Can't say my opinion for other fields now.
Ok. Changes:
- Source string is
simplified(). - Space-like characters (
\t,\n,\v,\f,\r...) are replaced with regular whitespaces. - Rest non-printable characters are omitted completely.
toHtmlEscaped()interned into the method.
The GUI side has escaping for HTML entities. If sanitizing are to be done at
PeerInfo::client()then you should move it (toHtmlEscaped()) over.
But this certainly shouldn't be in the base layer. The reasoning that it is not (yet) used without toHtmlEscaped() is invalid in terms of the correct application architecture. The need for HTML escaping is an attribute of some display methods, not the model. Although in this case the purpose of this aspect of the model is to provide a readable value, HTML escaping is not part of this generally.
Although in this case the purpose of this aspect of the model is to provide a readable value, HTML escaping is not part of this generally.
I tend to agree with that.
@HanabishiRecca Thank you!