Some comments

tutanator opened this issue 4 years ago • 10 comments

Need to post here because of a shadowban on /r/vpn. Original post on reddit:

Hi,

some comments:

Are you aware of TLS fingerprinting? Does your application protect against this? See https://www.net.in.tum.de/fileadmin/TUM/NET/NET-2020-04-1/NET-2020-04-1_04.pdf for a small overview, or take a look at https://github.com/p4gefau1t/trojan-go, which is a tunnel application that should not be (TLS) fingerprintable.

Relevant library is: https://github.com/refraction-networking/utls

From your website:

to hide the true destination of the traffic, but since domain fronting relied on an undocumented feature of major CDNs, it no longer works.

Domain fronting still does work for Azure and Cloudfront to some extent. The provider will know about it, though, and the question is if it will still work in the future. It should probably not be relied upon too much.

This is accomplished by responding as a proxy if and only if a valid key is provided and falling back to some default behaviour otherwise.

What is the default behavior? I guess, ideally, it should look like a commonly used server, e.g. nginx.

There is a option to disable TLS 1.3 as it could be blocked by some nation-state firewalls.

AFAIK (as the website states) only with ESNI. TLS 1.3 alone is not a problem.

The server will automatically provision a TLS certificate from LetsEncrypt and the client pins LetsEncrypt’s root by default.

A self-signed cert should also be an option.

The delta I measured between providing a 32 byte key and not providing a key is 29ns (on an AMD Ryzen 3700X). Since network requests have a latency in the milliseconds, I assume this attack is practically infeasible.

For a single measurement. But what about lots of measurements? Also, is providing a key itself visible from other patterns like traffic size?

A simpler attack is to look at the static files that the HTTPS server responds with. If the user does not replace the default files with their own, an easy distinguishing attack is possible.

That's difficult. The attacker in theory should not be able to enumerate all files/urls because they are encrypted. Even if he accesses the domain himself he just sees the main website but does not automatically know what other content is there. Providing your own files might make you more fingerprintable. Instead you could just keep the default ones of nginx.

The cover protocol uses the standard library implementation of HTTPS which should be widely used by many different applications in various contexts.

Still, the handshake might look different from that of a common app like Chrome.

To handle these attacks, the protocol could use some kind of random padding, limit the size and frequency of round trips, or replace the static decoy handler with a custom one that has different traffic characteristics.

I think, if you tunnel normal web browsing traffic through a tunnel going to some domain, it should not look different from the same traffic without a tunnel. As long as you don't do special stuff which a normal browser - website connection does not do. If you tunnel all of your traffic through it, it might be more suspicious. If some residential IP connects to some foreign IP and to nowhere else and uses a lot of traffic, it might stick out. If you tunnel, you will have some overhead. But as long as the adversary does not know the exact endpoints/URLs, he would not know if the client accessed some legitimate URL or tunneled its way to the next hop, which then fetches the real content.

Currently the client waits a random interval between 0 and 100ms before polling the server for data. This choice was made to minimise latency but it is not typical of an ordinary website.

Then why do this? You linked a paper on traffic analysis, but don't really say what the reason for that specific choice was.

Maybe you already know about it: https://github.com/net4people/bbs/issues

tutanator avatar Dec 20 '20 17:12 tutanator

Hey, thanks for the questions.

Are you aware of TLS fingerprinting? Does your application protect against this?

Yes, the implementation uses the standard library defaults for TLS configuration, so it will likely be fingerprinted as a Go HTTP server and client. You can verify this with nmap.

Relevant library is: https://github.com/refraction-networking/utls

I was not aware of this project, thanks.
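For reference, here is a minimal sketch of how utls is typically used to present a browser-like ClientHello instead of the Go standard library one. This is only an illustration of the library linked above, not something rosen currently does; the target host is a placeholder.

```go
package main

import (
	"net"

	utls "github.com/refraction-networking/utls"
)

func main() {
	// Plain TCP connection; utls only replaces the TLS layer on top of it.
	tcp, err := net.Dial("tcp", "example.com:443")
	if err != nil {
		panic(err)
	}
	defer tcp.Close()

	// Wrap the connection with a ClientHello that mimics a recent Chrome
	// build rather than the default Go crypto/tls fingerprint.
	conn := utls.UClient(tcp, &utls.Config{ServerName: "example.com"}, utls.HelloChrome_Auto)
	if err := conn.Handshake(); err != nil {
		panic(err)
	}
	// From here, conn can be used like a regular *tls.Conn.
}
```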

What is the default behavior? I guess, ideally, it should look like a commonly used server, e.g. nginx.

Currently it serves a local folder of static assets. Feasibly the request handler could be swapped out for anything else.
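To illustrate that point, here is a hedged sketch (not rosen's actual code; the header name and wiring are made up): the decoy is conceptually just a fallback http.Handler, so replacing it means swapping one handler for another.

```go
package main

import (
	"crypto/subtle"
	"net/http"
)

// proxyOrDecoy responds as a proxy if and only if a valid key is provided,
// and falls back to the decoy handler otherwise. Illustrative only.
func proxyOrDecoy(proxy http.Handler, key []byte) http.Handler {
	// Default behaviour: serve a local folder of static assets. This could be
	// swapped for a reverse proxy, a redirect, a custom 404 page, or anything
	// else that implements http.Handler.
	decoy := http.FileServer(http.Dir("./static"))

	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		presented := []byte(r.Header.Get("X-Proxy-Key")) // hypothetical header
		if subtle.ConstantTimeCompare(presented, key) == 1 {
			proxy.ServeHTTP(w, r) // valid key: act as a proxy
			return
		}
		decoy.ServeHTTP(w, r) // otherwise: behave like a boring static site
	})
}

func main() {
	// Hypothetical wiring; a real deployment would use TLS (ListenAndServeTLS).
	proxy := http.NotFoundHandler() // stand-in for the actual proxy handler
	http.ListenAndServe("localhost:8080", proxyOrDecoy(proxy, []byte("example-key")))
}
```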

A self-signed cert should also be an option.

When raw TLS support is added, self-signed certificates will be supported. I did not see the value in adding them for a server that is pretending to be a normal HTTPS website. These do not commonly use self-signed certificates.

For a single measurement. But what about lots of measurements? Also, is providing a key itself visible from other patterns like traffic size?

This value is an average of millions of measurements. Since the amount of data that the client sends is not predictable, it should not be possible to verify that a small, 32 byte value was sent alongside it. However, I do plan on adding random padding to the data that should make it impossible.

That's difficult. The attacker in theory should not be able to enumerate all files/urls because they are encrypted. Even if he accesses the domain himself he just sees the main website but does not automatically know what other content is there. Providing your own files might make you more fingerprintable. Instead you could just keep the default ones of nginx.

You're right about there being some kind of resistance to this attack based on unknown files or services that cannot be enumerated. Perhaps the server can just respond with a 404 page or redirect users elsewhere in this case too. If I was to replace the files with Apache or nginx, I would have to make the entire server look like an Apache or nginx server, including in the TLS fingerprint. I don't know if this is worth it over the current Go fingerprint.

Still, the handshake might look different from that of a common app like Chrome.

It will look like a Go client. The behaviour will be unique and this is what I'm most concerned about. WebSocket support may mitigate this.

As long as you don't do special stuff which a normal browser - website connection does not do.

This is the problem. Normal browsers don't make very fast and frequent requests and they usually have requests that are way smaller than responses. WebSockets again would be a better way of blending in, perhaps.

If some residential IP connects to some foreign IP and to nowhere else and uses a lot of traffic, it might stick out.

You are right, but there's no way around this without having a CDN or many unpredictable IPs.

Currently the client waits a random interval between 0 and 100ms before polling the server for data. This choice was made to minimise latency but it is not typical of an ordinary website.

Then why do this? You linked a paper on traffic analysis, but don't really say what the reason for that specific choice was.

I mention in the same sentence why this choice was made: to minimise latency. I am still thinking about an alternative that will trade off latency for blending in more, that can be turned on if needed.

awnumar avatar Dec 20 '20 17:12 awnumar

Quick response :)

I did not see the value in adding them for a server that is pretending to be a normal HTTPS website. These do not commonly use self-signed certificates.

I did try to research something like that a while ago. I could not really find any statistics for this specific part. While most of the traffic is encrypted nowadays, the question is how many servers using HTTPS are actually out there and what kind of certs they use. You could probably figure this out by some scan of the IPv4 range. I would *believe* someone providing a service on port 443 would use a cert signed by some official CA. However, just because I believe it, that doesn't automatically make it right.

This value is an average of millions of measurements. Since the amount of data that the client sends is not predictable, it should not be possible to verify that a small, 32 byte value was sent alongside it. However, I do plan on adding random padding to the data that should make it impossible.

Ah ok, I thought it was just for a single measurement. I don't know if padding itself could be fingerprinted. In theory TLS 1.3 supports padding, but I am not aware of anything that actually uses this or of TLS libraries that support it out of the box. I guess, unless it is made mandatory, few would use it because it just creates bandwidth overhead. If only a few use it and it is detectable itself, you might again stick out from the crowd.

If I was to replace the files with Apache or nginx, I would have to make the entire server look like an Apache or nginx server, including in the TLS fingerprint. I don't know if this is worth it over the current Go fingerprint.

I think the Caddy server uses Go, and while its usage is increasing, overall Caddy is barely used compared to nginx or Apache. I don't know how hard it is to fake the server fingerprint, or whether your fingerprint will look like the one from Caddy. Maybe utls can be used for this too. The GFW is definitely fingerprinting servers and kills the connection if it suspects someone is circumventing the firewall. You can bet other scanners on the internet are doing that too (maybe not for direct blocking yet, but at least for gathering intel). See here for the GFW detection: https://gfw.report/talks/imc20/en/. In the last link from my first post you will also find some more resources. The utls people also have a publication on a non-fingerprintable server, and trojan-go also tries to prevent fingerprinting.

It will look like a Go client. The behaviour will be unique and this is what I'm most concerned about. WebSocket support may mitigate this.

But what does a Go client look like? Is there only one standard handshake? Will the JA3/TLS fingerprint of a recent Chrome browser look like the one from your client? Regarding WebSockets: something I have been wondering about for a while, since the parrot is dead, is why not use a browser itself as a proxy. I don't have a clue if this would work, and obviously it would be quite big in size, resource usage, etc. But would it be possible for other applications to make use of a WebSocket created by a browser? That way you kind of tunnel traffic through the browser while having the same fingerprint as a browser.

This is the problem. Normal browsers don't make very fast and frequent requests and they usually have requests that are way smaller than responses.

But how is this relevant for normal web browsing through the tunnel? Traffic patterns/sizes/timings should look similar. You only get some overhead from the external TLS. No? If you do torrenting or other stuff it might look odd if this comes from a browser.

I mention in the same sentence why this choice was made: to minimise latency. I am still thinking about an alternative that will trade off latency for blending in more, that can be turned on if needed.

I think I have some kind of misunderstanding. Where does this latency come from? You delay traffic to the server, but normal browsers don't delay? Why use additional latency at all?

tutanator avatar Dec 20 '20 18:12 tutanator

I did try to research something like that a while ago. I could not really find any statistics for this specific part.

Well, since a self-signed certificate would be rejected by all modern browsers, I think it's a safe assumption.

In theory TLS 1.3 supports padding

We don't need protocol support, the plaintext payload itself could be padded and unpadded at the application layer.
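For what it's worth, here is a hedged sketch of what that could look like: a length prefix plus random filler. This is just one possible scheme, not what rosen actually implements.

```go
package padding

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"math/big"
)

// Pad prefixes the payload with its length and appends up to maxPadding
// random filler bytes, so the on-the-wire size no longer reveals the exact
// payload size. Illustrative sketch only.
func Pad(payload []byte, maxPadding int64) ([]byte, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(maxPadding+1))
	if err != nil {
		return nil, err
	}
	filler := make([]byte, n.Int64())
	if _, err := rand.Read(filler); err != nil {
		return nil, err
	}
	out := make([]byte, 4, 4+len(payload)+len(filler))
	binary.BigEndian.PutUint32(out, uint32(len(payload)))
	out = append(out, payload...)
	return append(out, filler...), nil
}

// Unpad recovers the original payload by reading the length prefix and
// discarding the filler.
func Unpad(blob []byte) ([]byte, error) {
	if len(blob) < 4 {
		return nil, errors.New("padding: blob too short")
	}
	n := binary.BigEndian.Uint32(blob[:4])
	if int(n) > len(blob)-4 {
		return nil, errors.New("padding: corrupt length prefix")
	}
	return blob[4 : 4+n], nil
}
```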

I think the Caddy server uses Go, and while its usage is increasing, overall Caddy is barely used compared to nginx or Apache.

True, but Apache and nginx aren't the only webservers in existence. It's not feasible for a censor to go around blocking everything except those.

But what does a Go client look like? Is there only one standard handshake?

Like a Go client. It probably changes as the standard library is updated but I expect it to be fingerprintable as a Go stdlib client.

why not use a browser itself as a proxy

That's a possibility. There are the downsides you mentioned but this would only affect the actual fingerprint of the client and not the timing and behaviour.

But how is this relevant for normal web browsing through the tunnel? Traffic patterns/sizes/timings should look similar. You only get some overhead from the external TLS. No? If you do torrenting or other stuff it might look odd if this comes from a browser.

Ideally we want all kinds of traffic to be "safe".

I think I have some kind of misunderstanding. Where does this latency come from? You delay traffic to the server, but normal browsers don't delay? Why use additional latency at all?

HTTP is a request-response protocol. The server cannot push data to the client without the client asking for it. Therefore the client has to make periodic pings to check if there is data, as well as continuously pushing data to the server. If the delay between pings is too high, there will be a higher average delay for a round trip to a remote server.
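To make that trade-off concrete, here is a hedged sketch of such a polling loop (the names and endpoint are made up; this is not rosen's actual code). Raising the maximum delay makes the traffic less chatty at the cost of a higher average round-trip latency.

```go
package poller

import (
	"io"
	"math/rand"
	"net/http"
	"time"
)

// PollLoop repeatedly asks the server for any waiting data, sleeping a random
// interval between 0 and maxDelay before each request, as described above.
// Illustrative sketch only.
func PollLoop(client *http.Client, url string, maxDelay time.Duration, handle func([]byte)) {
	for {
		time.Sleep(time.Duration(rand.Int63n(int64(maxDelay))))

		resp, err := client.Post(url, "application/octet-stream", nil)
		if err != nil {
			continue // transient network error; try again on the next poll
		}
		data, _ := io.ReadAll(resp.Body)
		resp.Body.Close()

		if len(data) > 0 {
			handle(data) // deliver whatever the server had queued
		}
	}
}
```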

Thanks for the links, they are helpful.

awnumar avatar Dec 20 '20 18:12 awnumar

Well, since a self-signed certificate would be rejected by all modern browsers, I think it's a safe assumption.

Depends. If the service is intended for the average internet user, yes. But servers provide support for multiple domains or just access by IP + HTTPS. In corporate environments, and others where you don't deal directly with the average user, self-signed certs and custom root CAs are not uncommon. Not everyone trusts PKI (for good reasons). If you have the option to disable official root CAs and your software/servers/whatever will still work (with your own CA), then you should do that. I'm not 100% sure on the background, but:

nslookup google.com gives 216.58.197.206 in my case. "openssl s_client -connect 216.58.197.206:443 -showcerts -servername google.com" gives a cert signed by GlobalSign and Google. However, "openssl s_client -connect 216.58.197.206:443 -showcerts" just gives a self-signed cert. I don't know for sure, but I am certain Google owns these servers and it is not someone else's cert. Apparently Google-owned stuff seems to make use of self-signed certs, and this is not the only IP/domain. Dunno why they do it, though, or maybe I am overlooking something. I also don't know how common this is in general; you'd need to scan the internet yourself or ask Cisco and friends, which probably already have this kind of data.
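Roughly the same check can be done from Go's crypto/tls. This is just an illustrative sketch; the IP is the one from the comment above and may serve something different by the time you run it.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

func printIssuer(serverName string) {
	conn, err := tls.Dial("tcp", "216.58.197.206:443", &tls.Config{
		ServerName:         serverName, // empty string: no SNI is sent for an IP address
		InsecureSkipVerify: true,       // we only want to look at the certificate
	})
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	leaf := conn.ConnectionState().PeerCertificates[0]
	fmt.Printf("SNI=%q -> subject=%q issuer=%q\n", serverName, leaf.Subject.CommonName, leaf.Issuer.CommonName)
}

func main() {
	printIssuer("google.com") // expect a CA-signed certificate
	printIssuer("")           // expect the self-signed fallback certificate
}
```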

True, but Apache and nginx aren't the only webservers in existence. It's not feasible for a censor to go around blocking everything except those.

That's true, but they have the biggest market share. According to some stats I found, Caddy seems to have <1% market share. By using it you would stand out from the crowd, and a censor like China could maybe live with censoring that 1%, or at least with taking a closer look at / tracking its users. I'm not sure how trojan-go does it. If they use Go for the server hello (and it looks like some innocent server) then maybe you could just use the same code. But they also use nginx as a kind of backend, which maybe does the handshake and would probably use OpenSSL as the library.

Ideally we want all kinds of traffic to be "safe".

That would be great, but I guess also hard to accomplish. You would need to mimic all kinds of protocols. Web browsing, and in particular stuff like Netflix, YouTube, etc., are the biggest chunks of internet traffic. You can blend in best by looking like those.

Therefore the client has to make periodic pings to check if there is data, as well as continuously pushing data to the server. If the delay between pings is too high, there will be a higher average delay for a round trip to a remote server.

I don't get that. Rosen is a tunnel for arbitrary traffic, correct? Similar to a proxy, VPN, Tor, etc. Most (I think) do not have that kind of mechanism. You don't need to check the server for data. The rosen server is not the real endpoint you want to connect to, just an intermediate hop. With a VPN, the client connects to the server, but the server just forwards the request to the endpoint and then forwards the fetched data back to the client. There is no reason to poll the server for data; it does not store it itself (usually). There might be some rekeying and maybe keep-alive packets, but nothing that specifically asks the rosen server for new data.

tutanator avatar Dec 20 '20 20:12 tutanator

Depends. If the service is intended for the average internet user, yes.

Sure. This is what the server is trying to pretend to be.

stuff like Netflix, YouTube, etc., are the biggest chunks of internet traffic. You can blend in best by looking like those.

That's a good idea, though bandwidth in these applications is dominated by downloads.

I don't get that. Rosen is a tunnel for arbitrary traffic, correct? Similar to a proxy, VPN, Tor, etc. Most (I think) do not have that kind of mechanism. You don't need to check the server for data.

I'm not sure I understand you. The client and the server have to exchange data. When we're talking about HTTP, the server must wait for a client request in order to respond with data.

awnumar avatar Dec 21 '20 17:12 awnumar

I'm not sure I understand you. The client and the server have to exchange data. When we're talking about HTTP, the server must wait for a client request in order to respond with data.

Correct me if I am wrong, but currently I see rosen working like this:

local client (like browser) --> local rosen client --> TLS connection --> remote rosen server --> internet (websites etc)

You said the client needs to ping the server regularly in order to check for new data. Client and server in this case refer to the rosen client and rosen server, correct? The rosen server just relays traffic; it does not store it itself. Like a normal VPN server or proxy, it forwards requests to the internet. When a request from the client arrives, it just forwards it to the intended destination. The website then sends its content back through the rosen server to the client. Of course you need to make a request/connection to the rosen server if you want some data. But why would the rosen client regularly poll/ping the rosen server for new data? What would that look like? You ping the rosen server from the rosen client and the server then sends back what?

tutanator avatar Dec 22 '20 12:12 tutanator

local client (like browser) --> local rosen client --> TLS connection --> remote rosen server --> internet (websites etc)

The TLS connection is really an HTTP connection over TLS: HTTPS.

The client cannot know if the server has received incoming data from the Internet. It must perform a request to allow the server to respond with any waiting data.

You ping the rosen server from the rosen client and the server then sends back what?

Waiting data.
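To illustrate what "waiting data" means here, a hedged sketch (not rosen's actual implementation): the server queues whatever has arrived from the remote destination and flushes it in the response to the next client poll.

```go
package relay

import "net/http"

// session holds data that has arrived from the remote destination but has
// not yet been delivered to the client. Illustrative sketch only.
type session struct {
	waiting chan []byte // filled by a goroutine reading from the remote connection
}

// pollHandler answers a client poll: it drains whatever is currently queued
// and returns immediately, possibly with an empty body if nothing is waiting.
func (s *session) pollHandler(w http.ResponseWriter, r *http.Request) {
	for {
		select {
		case chunk := <-s.waiting:
			w.Write(chunk)
		default:
			return // nothing (more) waiting; the client will poll again later
		}
	}
}
```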

awnumar avatar Dec 23 '20 22:12 awnumar

It will look like a Go client. The behaviour will be unique and this is what I'm most concerned about. WebSocket support may mitigate this.

gRPC or h2 duplex streams can provide the same natural behavior.

Also, I noticed that you are sending the payload as JSON. Why not application/octet-stream? I assume that way you can treat the payload as raw bytes and you don't have to encode/decode anything anymore.

itshaadi avatar Jan 19 '21 15:01 itshaadi

I noticed that you are sending the payload as JSON. Why not application/octet-stream? I assume that way you can treat the payload as raw bytes and you don't have to encode/decode anything anymore.

JSON provides a convenient way of serialising and deserialising objects. Without it we'd have to implement some custom encoder that may end up being faster, but JSON seems quite fast already from my testing.
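For context, a hedged sketch of what the JSON approach looks like in Go (the struct is made up, not rosen's actual wire format). encoding/json base64-encodes []byte fields, which is the extra encode/decode step the question refers to.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// packet is a hypothetical message shape, purely for illustration.
type packet struct {
	ID   string `json:"id"`
	Data []byte `json:"data"` // []byte is base64-encoded by encoding/json
}

func main() {
	out, _ := json.Marshal(packet{ID: "conn-1", Data: []byte{0xde, 0xad, 0xbe, 0xef}})
	fmt.Println(string(out)) // {"id":"conn-1","data":"3q2+7w=="}

	var in packet
	_ = json.Unmarshal(out, &in)
	fmt.Printf("%x\n", in.Data) // deadbeef
}
```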

awnumar avatar Jan 19 '21 21:01 awnumar

I noticed that you are sending the payload as JSON. Why not application/octet-stream? I assume that way you can treat the payload as raw bytes and you don't have to encode/decode anything anymore.

Rosen now supports transmitting raw bytes in the tcp protocol that was just merged. This relies on the newly added tunnel module that also encrypts and authenticates traffic, so bits on the wire appear to be random. The tcp cover protocol has much better bandwidth and latency characteristics than https.
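As an aside, the "bits on the wire appear to be random" property usually comes from an AEAD layer; a rough sketch of the general technique (explicitly not rosen's actual tunnel module) might look like this:

```go
package frame

import (
	"crypto/rand"
	"encoding/binary"
	"io"

	"golang.org/x/crypto/chacha20poly1305"
)

// Seal encrypts and authenticates a payload and writes it as a
// length-prefixed frame. After sealing, everything except the length prefix
// is indistinguishable from random bytes. Illustrative only.
func Seal(w io.Writer, key, payload []byte) error {
	aead, err := chacha20poly1305.NewX(key) // key must be 32 bytes
	if err != nil {
		return err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return err
	}
	frame := append(nonce, aead.Seal(nil, nonce, payload, nil)...)

	var length [4]byte
	binary.BigEndian.PutUint32(length[:], uint32(len(frame)))
	if _, err := w.Write(length[:]); err != nil {
		return err
	}
	_, err = w.Write(frame)
	return err
}
```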

The goal is for all cover protocols to use a tunnel eventually.

You can find the release here: https://github.com/awnumar/rosen/releases/tag/v0.1.0

awnumar avatar Mar 09 '22 20:03 awnumar