torrust-tracker icon indicating copy to clipboard operation
torrust-tracker copied to clipboard

Add tests for `protocol::utils` module

Open josecelano opened this issue 3 years ago • 21 comments

  • [ ] common
  • [ ] utils

josecelano avatar Aug 10 '22 13:08 josecelano

pub struct TorrentPeer {
    pub peer_id: PeerId,
    pub peer_addr: SocketAddr,
    #[serde(serialize_with = "ser_instant")]
    pub updated: std::time::Instant,
    #[serde(with = "NumberOfBytesDef")]
    pub uploaded: NumberOfBytes,
    #[serde(with = "NumberOfBytesDef")]
    pub downloaded: NumberOfBytes,
    #[serde(with = "NumberOfBytesDef")]
    pub left: NumberOfBytes,
    #[serde(with = "AnnounceEventDef")]
    pub event: AnnounceEvent,
}

@WarmBeer , would not it be nice to rename the updated field like this:

#[serde(serialize_with = "ser_instant")]
#[serde(rename(deserialize = "updated_milliseconds_ago", serialize = "updated_milliseconds_ago"))]
pub updated: std::time::Instant,

To get a json like this:

{
  "info_hash": "4beb7001cb833968582c67f55cc59dcc6c8d3fe5",
  "seeders": 1,
  "completed": 0,
  "leechers": 0,
  "peers": [
    {
      "peer_id": {
        "id": "2d7142343431302d7358376d33786d2877674179",
        "client": "qBittorrent"
      },
      "peer_addr": "192.168.1.88:17548",
      "updated_milliseconds_ago": 385,
      "uploaded": 0,
      "downloaded": 0,
      "left": 0,
      "event": "None"
    }
  ]
}

I generally like to add the units in the attribute name.

We can use any other name like: not_updated_duration_in_msecs, not_updated_duration_in_msecs, ...

josecelano avatar Aug 10 '22 14:08 josecelano

I think renaming it to "updated_milliseconds_ago" is fine. Just “updated” is indeed too vague.

mickvandijke avatar Aug 10 '22 14:08 mickvandijke

I think renaming it to "updated_milliseconds_ago" is fine. Just “updated” is indeed too vague.

I'm not going to do it in this PR. Especially since it's a breaking change. Unless I find a way to add an extra field in the json without removing the "update" one.

josecelano avatar Aug 10 '22 15:08 josecelano

hi @WarmBeer, I do not understand the requirements for the UDP tracker connection ID.

pub fn get_connection_id(remote_address: &SocketAddr) -> ConnectionId {
    match std::time::SystemTime::now().duration_since(std::time::UNIX_EPOCH) {
        Ok(duration) => ConnectionId(((duration.as_secs() / 3600) | ((remote_address.port() as u64) << 36)) as i64),
        Err(_) => ConnectionId(0x7FFFFFFFFFFFFFFF),
    }
}

I have created three temporarily tests, but they may be wrong.

  1. connection_id_in_udp_tracker_should_be_the_same_for_one_client_during_some_hours
  2. connection_id_in_udp_tracker_should_be_different_for_each_tracker_client
  3. connection_id_in_udp_tracker_should_be_different_for_the_same_tracker_client_at_different_times

Are those behaviours right?

1 is passing, but I do not know if that is true. If that is true, how many hours should the connection id last? 2 is failing. Generating two connection ids from different addresses gives the same connection id. 3 is failing. Generating two connection ids from the same address after a while gives the same connection id. Should not the connection id change after a while, even for the same client?

On the other hand, should not the connection id be "guessable by the client"? From https://www.bittorrent.org/beps/bep_0015.html. With that implementation, I guess the client knows it because it knows its address and the current time in Unix Epoch.

Finally, as you can see in the test implementations, "mocking time" in tests leads to fragile, slow, or not feasible tests. One solution is controlling the clock in tests. For example https://users.rust-lang.org/t/proper-way-to-mock-in-rust/60621/5?u=josecelano

For dynamic languages, I could override the SystemTime::now() function for tests. I've seen this answer to do something similar https://stackoverflow.com/a/59614532/3012842 in Rust, but it implies changing your production code. I suppose the best option is to create the trait and use a different implementation for testing. In the end, I have to change the prod code too :-) but injecting the dependency instead of writing the "switch" directly.

In general, I think it is worth it to be able to mock the clock; other tests may depend on the time inderectly. For example, the torrent endpoint returns the time elapsed since the client was updated.

josecelano avatar Aug 10 '22 16:08 josecelano

pub fn get_connection_id(remote_address: &SocketAddr) -> ConnectionId {
    match std::time::SystemTime::now().duration_since(std::time::UNIX_EPOCH) {
        Ok(duration) => ConnectionId(((duration.as_secs() / 3600) | ((remote_address.port() as u64) << 36)) as i64),
        Err(_) => ConnectionId(0x7FFFFFFFFFFFFFFF),
    }
}

This is a very poor implementation of generating a Connection ID. I see why it fails your tests :'). get_connection_id needs a complete refactor to fulfil the requirements mentioned in https://www.bittorrent.org/beps/bep_0015.html

mickvandijke avatar Aug 10 '22 16:08 mickvandijke

pub fn get_connection_id(remote_address: &SocketAddr) -> ConnectionId {
    match std::time::SystemTime::now().duration_since(std::time::UNIX_EPOCH) {
        Ok(duration) => ConnectionId(((duration.as_secs() / 3600) | ((remote_address.port() as u64) << 36)) as i64),
        Err(_) => ConnectionId(0x7FFFFFFFFFFFFFFF),
    }
}

This is a very poor implementation of generating a Connection ID. I see why it fails your tests :'). get_connection_id needs a complete refactor to fulfil the requirements mentioned in https://www.bittorrent.org/beps/bep_0015.html

OK, then I can:

  • Re-implement it. I can take a look at other implementations. I do not think changing this implementation will produce many changes in the code.
  • Change the current_time function to use an external clock. "System clock" for production.

josecelano avatar Aug 10 '22 17:08 josecelano

pub fn get_connection_id(remote_address: &SocketAddr) -> ConnectionId {
    match std::time::SystemTime::now().duration_since(std::time::UNIX_EPOCH) {
        Ok(duration) => ConnectionId(((duration.as_secs() / 3600) | ((remote_address.port() as u64) << 36)) as i64),
        Err(_) => ConnectionId(0x7FFFFFFFFFFFFFFF),
    }
}

This is a very poor implementation of generating a Connection ID. I see why it fails your tests :'). get_connection_id needs a complete refactor to fulfil the requirements mentioned in https://www.bittorrent.org/beps/bep_0015.html

I did not know there was another implementation of the UDP tracker, and that function was copied from there:

https://github.com/naim94a/udpt/blob/master/src/server.rs#L354-L359

Other implementations generate a random id, and they store it in the DB:

https://github.com/zaninime/btracker/blob/4c48d8a381befbfce7fea000249630eb6d97e936/cmd/tracker.go#L52

I think I'm going to keep this change out of the scope of this PR and create a new issue, so we can discuss what it's the best implementation for the connection ID.

josecelano avatar Aug 11 '22 07:08 josecelano

It seems the current implementation for the get_connection_id could be valid after all.

Long explanation: https://github.com/torrust/torrust-tracker/issues/62#issuecomment-1212077204

If that is the case I will continue adding tests to it and an explanation about why it works.

josecelano avatar Aug 11 '22 14:08 josecelano

@WarmBeer I think we could move the function get_connection_id from the src/protocol/utils.rs to the src/udp/handlers.rs module because the connection ID is only used by the UDP tracker, isn't it?

josecelano avatar Aug 11 '22 14:08 josecelano

hi @WarmBeer @da2ce7, I have fixed the tests and created new ones for the connection id generation:

PASS [   0.018s] torrust-tracker protocol::utils::tests::connection_id_in_udp_tracker_should_be_different_for_each_client_at_the_same_time_if_they_use_a_different_port
PASS [   0.012s] torrust-tracker protocol::utils::tests::connection_id_in_udp_tracker_should_be_the_same_for_one_client_during_one_hour
PASS [   0.016s] torrust-tracker protocol::utils::tests::connection_id_in_udp_tracker_should_change_for_the_same_client_and_port_after_one_hour
PASS [   0.016s] torrust-tracker protocol::utils::tests::connection_id_in_udp_tracker_should_expire_after_one_hour
PASS [   0.008s] torrust-tracker protocol::utils::tests::connection_id_is_generated_based_on_remote_client_port_an_hours_passed_since_unix_epoch

I assume the function is OK. We should find out why it sues one to expire instead of the two minutes mentioned in the protocol specification.

I've create a "clock" module, but for this test I was not necessary to mock the system clock. I've only mocked the current time (the "now").

Now you need to pass the "now":

pub fn get_connection_id(remote_address: &SocketAddr, current_timestamp: u64) -> ConnectionId {
    ConnectionId(((current_timestamp / 3600) | ((remote_address.port() as u64) << 36)) as i64)
}

And this is the way to can use it:

pub async fn handle_connect(remote_addr: SocketAddr, request: &ConnectRequest, tracker: Arc<TorrentTracker>) -> Result<Response, ServerError> {
    let connection_id = get_connection_id(&remote_addr, current_timestamp_from_system_clock());
...

I have slightly changed the behavior of the get_connection_id function. It was returning a hardcoded connection id when the system clock throws an error. Now I'm using "unwrap" and the method panics if it can't get the current time. @WarmBeer do you know why we can't not relay on the clock being there always. I assume that if we can't even get the time from the system something really bad has happened. :-). I can't revert that original behavior if you want.

josecelano avatar Aug 11 '22 18:08 josecelano

Rebased.

josecelano avatar Aug 12 '22 09:08 josecelano

hi @WarmBeer, I've been thinking again about your solution, and I do not know if adding a hash increases security. In the previous implementations, you have 10000 possible connections ids for a given two-minute window. In the new implementation for one IP, you still have the same amount of possible connections ids. If you are an attacker, you can call the tracker and generate those 10000 IDs. Becuase the SALT does not depend on the client. It's the same for all clients. This is like a challenge-response authentication where we use a finite number of responses for the challenge. Maybe that's why the protocol says the connection ID must last only 2 minutes. You can make 10000 requests in less than 2 minutes with all the possibilities, and one of them will be accepted as a legitimate request.

On the other hand, adding the IP to the hash does not improve security because it is supposed that one attacker will try to impersonate one IP. So it will try to generate the ID only for that IP.

Am I missing something?

cc @da2ce7

josecelano avatar Aug 12 '22 12:08 josecelano

hi @WarmBeer, I've been thinking again about your solution, and I do not know if adding a hash increases security. In the previous implementations, you have 10000 possible connections ids for a given two-minute window. In the new implementation for one IP, you still have the same amount of possible connections ids. If you are an attacker, you can call the tracker and generate those 10000 IDs. Becuase the SALT does not depend on the client. It's the same for all clients. This is like a challenge-response authentication where we use a finite number of responses for the challenge. Maybe that's why the protocol says the connection ID must last only 2 minutes. You can make 10000 requests in less than 2 minutes with all the possibilities, and one of them will be accepted as a legitimate request.

On the other hand, adding the IP to the hash does not improve security because it is supposed that one attacker will try to impersonate one IP. So it will try to generate the ID only for that IP.

Am I missing something?

cc @da2ce7

Hey @josecelano ,

In the old implementation you could just send a connect request with your own IP. Then the tracker would send a response with the Connection ID to your IP. This Connection ID would then be valid for ANY IP address. So you could then just send an announce request directly with a spoofed IP address and the Connection ID that the tracker sent to your actual IP address.

In the new implementation, the Connection ID is only valid for the IP address it was made for. Even if someone sends a connect request using a spoofed IP address, the tracker will only send the corresponding Connection ID to the actual owner of the IP address (if they happen to have a bittorrent client running on the same port).

To guess the Connection ID for a certain IP address + Port at a certain time would be impossible without knowing the Salt.

Does this make sense?

mickvandijke avatar Aug 12 '22 14:08 mickvandijke

@josecelano @WarmBeer

I think that your approach is good.

We need to give each client a unforgivable and nonce that:

  • The user can show to stop impersonation attacks.
  • Stateless, so the server doesn't need to remember it.
  • It needs to expire every two minutes.

The attacker will need to try connecting 64-bit of possibilities, Assuming the Authentication_String is unique to each user.

Static_Sever_Secret = Random (32-bytes), generated on sever start.
Time_Bound_Pepper = Hash(Static_Secret || Unix_Time_Minutes / 2) (32-bytes), cached, expires every two minutes.
Authentication_String = IP_Address || Port || User Token || Etc. (32-bytes), unique for each client.

ConnectionID = Hash(Time_Bound_Pepper || Authentication_String) (64-bit)

To Verify:

From ConnectionID:
1. Verify that ConnectionID == Hash(Time_Bound_Pepper || Authentication_String).

da2ce7 avatar Aug 12 '22 18:08 da2ce7

hi @WarmBeer, I've been thinking again about your solution, and I do not know if adding a hash increases security. In the previous implementations, you have 10000 possible connections ids for a given two-minute window. In the new implementation for one IP, you still have the same amount of possible connections ids. If you are an attacker, you can call the tracker and generate those 10000 IDs. Becuase the SALT does not depend on the client. It's the same for all clients. This is like a challenge-response authentication where we use a finite number of responses for the challenge. Maybe that's why the protocol says the connection ID must last only 2 minutes. You can make 10000 requests in less than 2 minutes with all the possibilities, and one of them will be accepted as a legitimate request. On the other hand, adding the IP to the hash does not improve security because it is supposed that one attacker will try to impersonate one IP. So it will try to generate the ID only for that IP. Am I missing something? cc @da2ce7

Hey @josecelano ,

In the old implementation you could just send a connect request with your own IP. Then the tracker would send a response with the Connection ID to your IP. This Connection ID would then be valid for ANY IP address. So you could then just send an announce request directly with a spoofed IP address and the Connection ID that the tracker sent to your actual IP address.

Yes, but if we add the IP to that implementation instead of using only the port we could solve that problem.

The connection ID would be valid only for a given IP. My mistake was I thought you could call the server 10K times with a spoofed IP to get the list of connections IDS for all ports for a given IP. But I always forget the response goes to the real IP, so you can not build such a "table of connections ID for an IP for all ports".

In the new implementation, the Connection ID is only valid for the IP address it was made for. Even if someone sends a connect request using a spoofed IP address, the tracker will only send the corresponding Connection ID to the actual owner of the IP address (if they happen to have a bittorrent client running on the same port).

I think the new implementation does two things:

  1. It binds the connection ID to the IP, which makes it harder to build the list of connection IDs. You have to own the IP to make the request.
  2. It intoruduces a SALT (or maybe a more correct term: SECRET SALT, right @da2ce7), which makes the connection ID different not only for each IP/PORT but also for each new starting of the application (it changes every time you start the server).

To guess the Connection ID for a certain IP address + Port at a certain time would be impossible without knowing the Salt.

Yes, that's true. But I was thinking of ways to build the connection ID simulating being another client and making requests to the server to build the connection ID.

Does this make sense?

Thank you for your clarification.

The final solution is much safer, but I wonder if other implementations do that. In the end, for the first implementation, if you add the IP, it seems to be reasonably safe. You need to know the port used by the client you are trying to impersonate.

The C++ implementation I found does that (it uses the IP):

https://github.com/torrust/torrust-tracker/issues/62#issuecomment-1211740519

Anyway, I would keep our implementation and add our discussion to the documentation. We can mention that there is a simpler solution without using hashes.

@WarmBeer we could rename variables to use names proposed by @da2ce7:

  • Static_Sever_Secret
  • Time_Bound_Pepper
  • Authentication_String
  • Etcetera

I think that would give a clue about what the code does to other developers with cryptographic knowledge.

josecelano avatar Aug 13 '22 08:08 josecelano

@josecelano

Yes, but if we add the IP to that implementation instead of using only the port we could solve that problem.

The formula would then be:

connection_id = ((time_in_seconds / 120) | ip | port) << 36

The problem with this is that all these variables (time_in_seconds, ip and port) are known to anyone. Clients could just pre-calculate the Connection ID themselves and skip the connection request to go right to the announce request.

To fix this we have to include a SECRET_STRING in there. Then we would have something like:

connection_id = ((time_in_seconds / 120) | ip | port | SECRET_STRING) << 36

I think this would be decent for generating the Connection ID. I'm only afraid that with this formula it would be too easy maybe for someone to crack the SECRET_STRING. To crack the SECRET_STRING, someone could compare the bits of server generated Connection ID's with locally generated Connection ID's using the same input (but without adding the bitwise OR for a SECRET_STRING).

This is why I instead chose for hashing a concatenation of the variables.

mickvandijke avatar Aug 13 '22 10:08 mickvandijke

@josecelano @WarmBeer

I think that your approach is good.

We need to give each client a unforgivable and nonce that:

  • The user can show to stop impersonation attacks.
  • Stateless, so the server doesn't need to remember it.
  • It needs to expire every two minutes.

The attacker will need to try connecting 64-bit of possibilities, Assuming the Authentication_String is unique to each user.

Static_Sever_Secret = Random (32-bytes), generated on sever start.
Time_Bound_Pepper = Hash(Static_Secret || Unix_Time_Minutes / 2) (32-bytes), cached, expires every two minutes.
Authentication_String = IP_Address || Port || User Token || Etc. (32-bytes), unique for each client.

ConnectionID = Hash(Time_Bound_Pepper || Authentication_String) (64-bit)

To Verify:

From ConnectionID:
1. Verify that ConnectionID == Hash(Time_Bound_Pepper || Authentication_String).

Hey @da2ce7 , thank you for your input.

I was wondering however, what do we gain from doing this:

Static_Sever_Secret = Random (32-bytes), generated on sever start.
Time_Bound_Pepper = Hash(Static_Secret || Unix_Time_Minutes / 2) (32-bytes), cached, expires every two minutes.
Authentication_String = IP_Address || Port || User Token || Etc. (32-bytes), unique for each client.

ConnectionID = Hash(Time_Bound_Pepper || Authentication_String) (64-bit)

Over this:

Static_Sever_Secret = Random (32-bytes), generated on sever start.
Authentication_String = IP_Address || Port || User Token || Etc. (32-bytes), unique for each client.

ConnectionID = Hash(Static_Secret || Unix_Time_Minutes / 2 || Authentication_String) (64-bit)

To me it seems like an unnecessary extra step to generate a Time_Bound_Pepper hash instead of adding the Time_Bound_Pepper inputs directly to the final hash. Am I missing something?

mickvandijke avatar Aug 13 '22 11:08 mickvandijke

@josecelano

Yes, but if we add the IP to that implementation instead of using only the port we could solve that problem.

The formula would then be:

connection_id = ((time_in_seconds / 120) | ip | port) << 36

The problem with this is that all these variables (time_in_seconds, ip and port) are known to anyone. Clients could just pre-calculate the Connection ID themselves and skip the connection request to go right to the announce request.

OK, I tend to forget that you know the other client's IPs and ports. If you did not know them, you could only authenticate but not impersonate. For me, impersonation implies you know who you want to impersonate. If you did not know the IP and port, you could only try to impersonate random IPs/ports. That's why I told it could be valid.

By the way. I think we should give an example of why those things are bad. I have not thought about it, but I suppose it only matters to avoid DDoS attacks, not because you can get sensitive/private data from one client. I mean, the only thing you can do is announce a torrent, isn't it?

For example, bad behaviour could be:

  • You connect to a swarm as a new peer.
  • You announce a torrent and get a list of other peers with their ports having that torrent (in the announce response).
  • You announce again using someone else's IP and port.

You could say you are "completed". Other clients could choose this peer because they are sure it has all parts. Does it make any sense? I can't imagine a worse scenario.

To fix this we have to include a SECRET_STRING in there. Then we would have something like:

connection_id = ((time_in_seconds / 120) | ip | port | SECRET_STRING) << 36

I think this would be decent for generating the Connection ID. I'm only afraid that with this formula it would be too easy maybe for someone to crack the SECRET_STRING.

This is why I instead chose for hashing a concatenation of the variables.

josecelano avatar Aug 13 '22 11:08 josecelano

@josecelano

Yes, but if we add the IP to that implementation instead of using only the port we could solve that problem.

The formula would then be:

connection_id = ((time_in_seconds / 120) | ip | port) << 36

The problem with this is that all these variables (time_in_seconds, ip and port) are known to anyone. Clients could just pre-calculate the Connection ID themselves and skip the connection request to go right to the announce request.

OK, I tend to forget that you know the other client's IPs and ports. If you did not know them, you could only authenticate but not impersonate. For me, impersonation implies you know who you want to impersonate. If you did not know the IP and port, you could only try to impersonate random IPs/ports. That's why I told it could be valid.

By the way. I think we should give an example of why those things are bad. I have not thought about it, but I suppose it only matters to avoid DDoS attacks, not because you can get sensitive/private data from one client. I mean, the only thing you can do is announce a torrent, isn't it?

For example, bad behaviour could be:

  • You connect to a swarm as a new peer.
  • You announce a torrent and get a list of other peers with their ports having that torrent (in the announce response).
  • You announce again using someone else's IP and port.

You could say you are "completed". Other clients could choose this peer because they are sure it has all parts. Does it make any sense? I can't imagine a worse scenario.

To fix this we have to include a SECRET_STRING in there. Then we would have something like:

connection_id = ((time_in_seconds / 120) | ip | port | SECRET_STRING) << 36

I think this would be decent for generating the Connection ID. I'm only afraid that with this formula it would be too easy maybe for someone to crack the SECRET_STRING. This is why I instead chose for hashing a concatenation of the variables.

With the Connection ID we check for two things:

  1. The announcing client owns the ip and port it is announcing with.
  2. The announcing client is an online BitTorrent peer.

If the Connection ID is guessable by clients, the worst thing that can happen is that a torrent's peer list will be filled with fake peers that can be set to any ip address and port.

mickvandijke avatar Aug 13 '22 11:08 mickvandijke

Ok, I understand now. It's more important to keep a list of real peers rather than avoiding having wrong info for a peer.

Thank you @WarmBeer, I think you should write a book about BitTorrent :-).

On Sat, 13 Aug 2022, 12:44 WarmBeer, @.***> wrote:

@josecelano https://github.com/josecelano

Yes, but if we add the IP to that implementation instead of using only the port we could solve that problem.

The formula would then be:

connection_id = ((time_in_seconds / 120) | ip | port) << 36

The problem with this is that all these variables (time_in_seconds, ip and port) are known to anyone. Clients could just pre-calculate the Connection ID themselves and skip the connection request to go right to the announce request.

OK, I tend to forget that you know the other client's IPs and ports. If you did not know them, you could only authenticate but not impersonate. For me, impersonation implies you know who you want to impersonate. If you did not know the IP and port, you could only try to impersonate random IPs/ports. That's why I told it could be valid.

By the way. I think we should give an example of why those things are bad. I have not thought about it, but I suppose it only matters to avoid DDoS attacks, not because you can get sensitive/private data from one client. I mean, the only thing you can do is announce a torrent, isn't it?

For example, bad behaviour could be:

  • You connect to a swarm as a new peer.
  • You announce a torrent and get a list of other peers with their ports having that torrent (in the announce response).
  • You announce again using someone else's IP and port.

You could say you are "completed". Other clients could choose this peer because they are sure it has all parts. Does it make any sense? I can't imagine a worse scenario.

To fix this we have to include a SECRET_STRING in there. Then we would have something like:

connection_id = ((time_in_seconds / 120) | ip | port | SECRET_STRING) << 36

I think this would be decent for generating the Connection ID. I'm only afraid that with this formula it would be too easy maybe for someone to crack the SECRET_STRING. This is why I instead chose for hashing a concatenation of the variables.

With the Connection ID we check for two things:

  1. The announcing client owns the ip and port it is announcing with.
  2. The announcing client is an online BitTorrent peer.

If the Connection ID is guessable by clients, the worst thing that can happen is that a torrent's peer list will be filled with fake peers that can be set to any ip address and port.

— Reply to this email directly, view it on GitHub https://github.com/torrust/torrust-tracker/pull/60#issuecomment-1214143706, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAOLQHRVE7EQ6ETXJ72GKLVY6DDBANCNFSM56ETM5EA . You are receiving this because you were mentioned.Message ID: @.***>

josecelano avatar Aug 13 '22 11:08 josecelano

@WarmBeer

Over this [having the extra step of producing the Time_Bound_Pepper]

Of course you are right. It is just the same. I separated the parts so I could use more descriptive names.

Hashes are very cheap there isn't much need to worry about having a couple more of them at this stage.

You may wish to do something like this: (rust pseudo code, I don't know the language yet)

on_startup:
static SERVER_SECRET: [u8; 32] =  random();

#[derive(Default)]
struct TimeBoundPepper {
   created: SystemTime,
   pepper: [u8; 32],
}

impl Default for TimeBoundPepper {
    fn default() -> Self { SystemTime::now(), pepper: hash(SERVER_SECRET || self.created ) };
}

impl TimeBoundPepper {
  pub fn new() ->  Default::default();

  pub fn get_pepper(&mut self, expires: std::time::Duration) -> [u8; 32] {
    if (created.elapsed().unwrap() >= expires) self = self.new();
    return self.pepper;
}

da2ce7 avatar Aug 13 '22 14:08 da2ce7

@josecelano is it ok to merge?

mickvandijke avatar Aug 16 '22 14:08 mickvandijke

@josecelano is it ok to merge?

No, not yet. I've moved the pending tasks from the comment to the PR description.

josecelano avatar Aug 16 '22 14:08 josecelano

@WarmBeer

Over this [having the extra step of producing the Time_Bound_Pepper]

Of course you are right. It is just the same. I separated the parts so I could use more descriptive names.

Hashes are very cheap there isn't much need to worry about having a couple more of them at this stage.

You may wish to do something like this: (rust pseudo code, I don't know the language yet)

on_startup:
static SERVER_SECRET: [u8; 32] =  random();

#[derive(Default)]
struct TimeBoundPepper {
   created: SystemTime,
   pepper: [u8; 32],
}

impl Default for TimeBoundPepper {
    fn default() -> Self { SystemTime::now(), pepper: hash(SERVER_SECRET || self.created ) };
}

impl TimeBoundPepper {
  pub fn new() ->  Default::default();

  pub fn get_pepper(&mut self, expires: std::time::Duration) -> [u8; 32] {
    if (created.elapsed().unwrap() >= expires) self = self.new();
    return self.pepper;
}

hi @da2ce7 , regarding this implementation, I wanted to be sure you want to use 32 bytes instead of 32 bits. It's strange you want Time_Bound_Pepper and Authentication_String to be 32 bytes, but the final OR operation result is 64 bits. I suppose we have to truncate the final hash as it is now.

@WarmBeer there is no OR operator for arrays like [u8;32]. I suppose I can implement it with a loop.

josecelano avatar Aug 16 '22 17:08 josecelano

hi @WarmBeer I have to generate the SERVER_SECRET when the UDP tracker starts. Should all jobs use the same secret? If yes, I suppose it would be something like this:

    pub async fn start(&self) {
        loop {
            let mut data = [0; MAX_PACKET_SIZE];
            let socket = self.socket.clone();
            let tracker = self.tracker.clone();

            // TODO: random stuff
            let server_secret = ByteArray32::new([0;32]); 

            tokio::select! {
                _ = tokio::signal::ctrl_c() => {
                    info!("Stopping UDP server: {}..", socket.local_addr().unwrap());
                    break;
                }
                Ok((valid_bytes, remote_addr)) = socket.recv_from(&mut data) => {
                    let payload = data[..valid_bytes].to_vec();

                    debug!("Received {} bytes from {}", payload.len(), remote_addr);
                    debug!("{:?}", payload);

                    let response = handle_packet(server_secret, remote_addr, payload, tracker).await;
                    UdpServer::send_response(socket, remote_addr, response).await;
                }
            }
        }
    }

...

pub async fn handle_packet(server_secret: &ByteArray32, remote_addr: SocketAddr, payload: Vec<u8>, tracker: Arc<TorrentTracker>) -> Response {
    match Request::from_bytes(&payload[..payload.len()], MAX_SCRAPE_TORRENTS).map_err(|_| ServerError::InternalServerError) {
        Ok(request) => {
            let transaction_id = match &request {
                Request::Connect(connect_request) => {
                    connect_request.transaction_id
                }
                Request::Announce(announce_request) => {
                    announce_request.transaction_id
                }
                Request::Scrape(scrape_request) => {
                    scrape_request.transaction_id
                }
            };

            match handle_request(request, server_secret, remote_addr, tracker).await {
                Ok(response) => response,
                Err(e) => handle_error(e, transaction_id)
            }
        }
        ...
    }
}

...

pub async fn handle_request(request: Request, server_secret: &ByteArray32, remote_addr: SocketAddr, tracker: Arc<TorrentTracker>) -> Result<Response, ServerError> {
    match request {
        Request::Connect(connect_request) => {
            handle_connect(server_secret, remote_addr, &connect_request, tracker).await
        }
        Request::Announce(announce_request) => {
            handle_announce(remote_addr, &announce_request, tracker).await
        }
        Request::Scrape(scrape_request) => {
            handle_scrape(remote_addr, &scrape_request, tracker).await
        }
    }
}

...

pub async fn handle_connect(server_secret: &ByteArray32, remote_addr: SocketAddr, request: &ConnectRequest, tracker: Arc<TorrentTracker>) -> Result<Response, ServerError> {
    let current_timestamp = current_timestamp();
    let connection_id = get_connection_id(&server_secret, &remote_addr, current_timestamp);

    debug!("connection_id: {:?}, current timestamp: {:?}", connection_id, current_timestamp);

    let response = Response::from(ConnectResponse {
        transaction_id: request.transaction_id,
        connection_id,
    });

    // send stats event
    match remote_addr {
        SocketAddr::V4(_) => { tracker.send_stats_event(TrackerStatisticsEvent::Udp4Connect).await; }
        SocketAddr::V6(_) => { tracker.send_stats_event(TrackerStatisticsEvent::Udp6Connect).await; }
    }

    Ok(response)
}

Basically passing down the secret.

@da2ce7 the final ID generation implementation is like this:

time_bound_pepper = Hash(Server_Secret || Unix_Time_Minutes / 2)       (32 bytes, 256 bits)
hash_input = Concat(time_bound_pepper, authentication_string)          (64 bytes, 512 bits)
connection_id = Truncate(Hash(hash_input))                             ( 8 bytes,  64-bits)

In your proposal:

ConnectionID = Hash(Time_Bound_Pepper || Authentication_String) (64-bit)

Time_Bound_Pepper and Authentication_String are 32 bytes. I contact them generating a 64 bytes input for the hash function. The result is truncated to 64 bits.

I do not know if your idea was to truncate before hashing:

ConnectionID = Hash(Time_Bound_Pepper_32_bits || Authentication_String_32_bits) (64-bit)

I suppose it does not matter as long as we truncate, removing the first bytes (bytes filled with zeros to generate the 32 bytes array).

josecelano avatar Aug 17 '22 16:08 josecelano

@josecelano and @WarmBeer

I have made an working test implementation of the TimeBoundPepper structure:

use arc_swap::ArcSwap;
use std::fmt::Write;
use std::sync::Arc;
use std::thread::sleep;
use std::time::Duration;
use std::time::SystemTime;

#[derive(Copy, Clone)]
struct TimeBoundPepper {
    created: SystemTime,
    pepper: [u8; 32],
}

impl TimeBoundPepper {
    pub fn is_expired(&self, expires: Duration) -> bool {
        let time_now = self.created.elapsed().unwrap();
        println!(
            "Comparing Time: {0}, {1}",
            time_now.as_millis(),
            expires.as_millis()
        );

        return self.created.elapsed().unwrap() >= expires;
    }

    fn pepper(&self) -> &[u8; 32] {
        &self.pepper
    }
}

impl Default for TimeBoundPepper {
    fn default() -> Self {
        TimeBoundPepper {
            created: SystemTime::now(),
            pepper: rand::random(),
        }
    }
}

#[macro_use]
extern crate lazy_static;

lazy_static! {
    static ref TIME_BOUND_PEPPER: ArcSwap<TimeBoundPepper> = ArcSwap::default();
}

fn get_pepper(max_age: Duration) -> [u8; 32] {
    if TIME_BOUND_PEPPER.load().is_expired(max_age) {
        println!("Time Has Expired!");
        TIME_BOUND_PEPPER.swap(Arc::new(TimeBoundPepper::default()));
    }

    return TIME_BOUND_PEPPER.load().pepper().clone();
}

fn u8_32_to_hex_string(array: [u8; 32]) -> String {
    let mut str = String::with_capacity(2 * 32);

    for byte in array {
        let result = write!(&mut str, "{:02X}", byte);
        
        match result {
            Err(e) => println!("error parsing bytes: {e:?}"),
            Ok(_) => continue,
        }
    }
    return str;
}


fn main() {
    let max_age: Duration = Duration::new(2, 0);

    let pepper: [u8; 32] = get_pepper(max_age);

    //sleep(max_age);

    let pepper2: [u8; 32] = get_pepper(max_age);

    sleep(max_age);

    let pepper3: [u8; 32] = get_pepper(max_age);

    println!(
        "The entry for `pepper ` is \"{}\".",
        u8_32_to_hex_string(pepper)
    );
    println!(
        "The entry for `pepper2` is \"{}\".",
        u8_32_to_hex_string(pepper2)
    );
    println!(
        "The entry for `pepper3` is \"{}\".",
        u8_32_to_hex_string(pepper3)
    );
}

da2ce7 avatar Aug 17 '22 20:08 da2ce7

@josecelano and @WarmBeer

The previous implementation had the error that sometimes peers will be given peppers with very short lifetimes. I have updated the implementation so that there is a current and last pepper. To verify you need to check both.

use arc_swap::ArcSwap;
use std::fmt::Write;
use std::sync::Arc;
use std::thread::sleep;
use std::time::Duration;
use std::time::SystemTime;

#[derive(Copy, Clone)]
struct Pepper {
    current: [u8; 32],
    last: Option<[u8; 32]>,
}

#[derive(Copy, Clone)]
struct TimeBoundPepper {
    created: SystemTime,
    pepper: Pepper,
}

impl Pepper {
    fn new(last: [u8; 32]) -> Self {
        Pepper {
            current: rand::random(),
            last: Some(last),
        }
    }

    fn current(&self) -> &[u8; 32] {
        return &self.current;
    }

    fn last(&self) -> &Option<[u8; 32]> {
        return &self.last;
    }
}

impl Default for Pepper {
    fn default() -> Self {
        Pepper {
            current: rand::random(),
            last: None,
        }
    }
}

impl TimeBoundPepper {
    fn new(last: [u8; 32]) -> Self {
        TimeBoundPepper {
            created: SystemTime::now(),
            pepper: Pepper::new(last),
        }
    }

    pub fn is_expired(&self, expires: Duration) -> bool {
        let time_now = self.created.elapsed().unwrap();
        println!(
            "Comparing Time: {0}, {1}",
            time_now.as_millis(),
            expires.as_millis()
        );

        return self.created.elapsed().unwrap() >= expires;
    }

    fn pepper(&self) -> &Pepper {
        &self.pepper
    }
}

impl Default for TimeBoundPepper {
    fn default() -> Self {
        TimeBoundPepper {
            created: SystemTime::now(),
            pepper: Default::default(),
        }
    }
}

#[macro_use]
extern crate lazy_static;

lazy_static! {
    static ref TIME_BOUND_PEPPER: ArcSwap<TimeBoundPepper> = ArcSwap::default();
}

fn get_pepper(max_age: Duration) -> Pepper {
    if TIME_BOUND_PEPPER.load().is_expired(max_age * 2) {
        println!("Both Peppers have Expired!");
        TIME_BOUND_PEPPER.swap(Arc::new(TimeBoundPepper::default()));
    }

    let current_pepper = TIME_BOUND_PEPPER.load().pepper().current().clone();

    if TIME_BOUND_PEPPER.load().is_expired(max_age) {
        println!("Rotate Peppers!");
        TIME_BOUND_PEPPER.swap(Arc::new(TimeBoundPepper::new(current_pepper)));
    }

    return TIME_BOUND_PEPPER.load().pepper().clone();
}

fn u8_32_to_hex_string(maybe_array: Option<[u8; 32]>) -> String {
    match maybe_array {
        None => return "None".to_string(),
        Some(array) => {
            let mut str = String::with_capacity(2 * 32);

            for byte in array {
                let result = write!(&mut str, "{:02X}", byte);

                match result {
                    Err(e) => println!("error parsing bytes: {e:?}"),
                    Ok(_) => continue,
                }
            }
            return str;
        }
    }
}

fn main() {
    let max_age: Duration = Duration::from_millis(20);

    let pepper = get_pepper(max_age);

    println!(
        "Pepper, last: \"{}\", current: \"{}\",",
        u8_32_to_hex_string(pepper.last().clone()),
        u8_32_to_hex_string(Some(pepper.current().clone()))
    );

    sleep(Duration::from_millis(18));

    let pepper2 = get_pepper(max_age);

    println!(
        "Pepper, last: \"{0}\", current: \"{1}\",",
        u8_32_to_hex_string(pepper2.last().clone()),
        u8_32_to_hex_string(Some(pepper2.current().clone()))
    );

    sleep(Duration::from_millis(18));

    let pepper3 = get_pepper(max_age);

    println!(
        "Pepper, last: \"{0}\", current: \"{1}\",",
        u8_32_to_hex_string(pepper3.last().clone()),
        u8_32_to_hex_string(Some(pepper3.current().clone()))
    );

    sleep(Duration::from_millis(22));

    let pepper4 = get_pepper(max_age);

    println!(
        "Pepper, last: \"{0}\", current: \"{1}\",",
        u8_32_to_hex_string(pepper4.last().clone()),
        u8_32_to_hex_string(Some(pepper4.current().clone()))
    );

    sleep(Duration::from_millis(40));

    let pepper5 = get_pepper(max_age);

    println!(
        "Pepper, last: \"{0}\", current: \"{1}\",",
        u8_32_to_hex_string(pepper5.last().clone()),
        u8_32_to_hex_string(Some(pepper5.current().clone()))
    );
}

da2ce7 avatar Aug 17 '22 23:08 da2ce7

hi @da2ce7 cool!

Questions:

  • In your initial proposal, you also use a hash. Is there any reason you are not using it?

Pros:

  • Precision is better. Since we do not use fixed slots of two minutes starting at even minutes.
  • We do not need a hash function.
  • The pepper is totally random instead of always using the same secret.

More thoughts:

In this implementation, the TimeBoundPepper is a singleton value that changes every two minutes. It's like a central obfuscated chronometer counting time in 2-minute slots. You always ask for the current time in your own EPOCH, which is the time the server starts.

In my current implementation, the TimeBoundPepper is like a Value Object, it's not the clock, it's a given "obfuscated" time. I started from your proposal but I ended up changing it a little bit, I think because of testing. It would be nice to explore how to test your proposal. I have the feeling that central singleton objects are always harder to test or deal with when you have concurrency. But it's still only a feeling. In fact, if I'm not wrong, Rust makes it hard to create those "static" values. You need to use that weird macro. I'm going to search for more info about this topic in the Rust book I read. I think they mentioned something about global variables.

josecelano avatar Aug 18 '22 09:08 josecelano

@da2ce7, @WarmBeer and I are considering doing a benchmarking for memory consumption. We have the feeling that @WarmBeer's first implementation (using hash) would consume much less memory, and that could allow more requests per second.

On the other hand, since the connection Id only lasts for 2 minutes, making it much more secure could not make sense if we consume more resources (CPU or memory).

I found another implementation:

https://github.com/OlafvdSpek/xbt/blob/a214b8e954ee1cee68b1363e2f0bfaacbb37df90/Tracker/transaction.cpp#L12-L21

long long Ctransaction::connection_id() const
{
	const int cb_s = 24;
	char s[cb_s];
	write_int(8, s, srv_secret());
	memcpy(s, m_a.sin6_addr.s6_addr, 16);
	char d[20];
	Csha1(data_ref(s, cb_s)).read(d);
	return read_int(8, d);
}

It's the implementation of @OlafvdSpek, who is the original author of the BEP 15.

It seems he also using a simple secret, hash (sha1) and the client IP. WIth a 20 bytes array.

josecelano avatar Aug 18 '22 11:08 josecelano

hi @WarmBeer I have to generate the SERVER_SECRET when the UDP tracker starts. Should all jobs use the same secret? If yes, I suppose it would be something like this:

    pub async fn start(&self) {
        loop {
            let mut data = [0; MAX_PACKET_SIZE];
            let socket = self.socket.clone();
            let tracker = self.tracker.clone();

            // TODO: random stuff
            let server_secret = ByteArray32::new([0;32]); 

            tokio::select! {
                _ = tokio::signal::ctrl_c() => {
                    info!("Stopping UDP server: {}..", socket.local_addr().unwrap());
                    break;
                }
                Ok((valid_bytes, remote_addr)) = socket.recv_from(&mut data) => {
                    let payload = data[..valid_bytes].to_vec();

                    debug!("Received {} bytes from {}", payload.len(), remote_addr);
                    debug!("{:?}", payload);

                    let response = handle_packet(server_secret, remote_addr, payload, tracker).await;
                    UdpServer::send_response(socket, remote_addr, response).await;
                }
            }
        }
    }

...

pub async fn handle_packet(server_secret: &ByteArray32, remote_addr: SocketAddr, payload: Vec<u8>, tracker: Arc<TorrentTracker>) -> Response {
    match Request::from_bytes(&payload[..payload.len()], MAX_SCRAPE_TORRENTS).map_err(|_| ServerError::InternalServerError) {
        Ok(request) => {
            let transaction_id = match &request {
                Request::Connect(connect_request) => {
                    connect_request.transaction_id
                }
                Request::Announce(announce_request) => {
                    announce_request.transaction_id
                }
                Request::Scrape(scrape_request) => {
                    scrape_request.transaction_id
                }
            };

            match handle_request(request, server_secret, remote_addr, tracker).await {
                Ok(response) => response,
                Err(e) => handle_error(e, transaction_id)
            }
        }
        ...
    }
}

...

pub async fn handle_request(request: Request, server_secret: &ByteArray32, remote_addr: SocketAddr, tracker: Arc<TorrentTracker>) -> Result<Response, ServerError> {
    match request {
        Request::Connect(connect_request) => {
            handle_connect(server_secret, remote_addr, &connect_request, tracker).await
        }
        Request::Announce(announce_request) => {
            handle_announce(remote_addr, &announce_request, tracker).await
        }
        Request::Scrape(scrape_request) => {
            handle_scrape(remote_addr, &scrape_request, tracker).await
        }
    }
}

...

pub async fn handle_connect(server_secret: &ByteArray32, remote_addr: SocketAddr, request: &ConnectRequest, tracker: Arc<TorrentTracker>) -> Result<Response, ServerError> {
    let current_timestamp = current_timestamp();
    let connection_id = get_connection_id(&server_secret, &remote_addr, current_timestamp);

    debug!("connection_id: {:?}, current timestamp: {:?}", connection_id, current_timestamp);

    let response = Response::from(ConnectResponse {
        transaction_id: request.transaction_id,
        connection_id,
    });

    // send stats event
    match remote_addr {
        SocketAddr::V4(_) => { tracker.send_stats_event(TrackerStatisticsEvent::Udp4Connect).await; }
        SocketAddr::V6(_) => { tracker.send_stats_event(TrackerStatisticsEvent::Udp6Connect).await; }
    }

    Ok(response)
}

Basically passing down the secret.

@da2ce7 the final ID generation implementation is like this:

time_bound_pepper = Hash(Server_Secret || Unix_Time_Minutes / 2)       (32 bytes, 256 bits)
hash_input = Concat(time_bound_pepper, authentication_string)          (64 bytes, 512 bits)
connection_id = Truncate(Hash(hash_input))                             ( 8 bytes,  64-bits)

In your proposal:

ConnectionID = Hash(Time_Bound_Pepper || Authentication_String) (64-bit)

Time_Bound_Pepper and Authentication_String are 32 bytes. I contact them generating a 64 bytes input for the hash function. The result is truncated to 64 bits.

I do not know if your idea was to truncate before hashing:

ConnectionID = Hash(Time_Bound_Pepper_32_bits || Authentication_String_32_bits) (64-bit)

I suppose it does not matter as long as we truncate, removing the first bytes (bytes filled with zeros to generate the 32 bytes array).

After discussing it with @WarmBeer we think for now we can just pass the secret the way I proposed. So we can close this PR, but we can create a new issue to refactor the UDP handlers. Right now, there are only some stateless functions that handle the requests. We could:

  • Create a new struct: UdpTracker
  • We can add a context or config attribute with the secret. It will contain the configuration needed to handle the request.
  • We can also add a UdpRequest wrapper on the current request. It will contain metadata about the request, like the source IP. Like HTTPRequest classes in web app frameworks.

This way calling the handlers would be cleaner and more stable:

Instead of:

pub async fn handle_connect(remote_addr: SocketAddr, request: &ConnectRequest, tracker: Arc<TorrentTracker>) -> Result<Response, ServerError> {

We would have:

pub async fn handle_connect(remote_addr: SocketAddr, request: &ConnectRequest, tracker: Arc<TorrentTracker>) -> Result<Response, ServerError> {

udpTracker.handle_connect(&self, udp_request: &UdpRequest) {
   ...
   &self.config.secret // The secret for having
   ... 
   udp_request.protocol_request // The current request: &ConnectRequest
}

josecelano avatar Aug 18 '22 13:08 josecelano