How to include the Trust Router functionality in v4.0

Open alejandro-perez opened this issue 8 years ago • 9 comments

Issue type

  • Questions about the server or its usage should be posted to the users mailing list.
  • Remote security exploits MUST be sent to [email protected].
  • [ ] Defect - Crash or memory corruption.
  • [ ] Defect - Non compliance with a standards document, or incorrect API usage.
  • [ ] Defect - Unexpected behaviour (obvious or verified by project member).
  • [X] Feature request.

See here for debugging instructions and how to obtain backtraces.

NOTE: PATCHES GO IN PULL REQUESTS. IF YOU SUBMIT A DIFF HERE, THE DEVELOPMENT TEAM WILL HUNT YOU DOWN AND BEAT YOU OVER THE HEAD WITH YOUR OWN KEYBOARD.

Feature description

This thread is intended to discuss how the existing Trust Router client functionality can be implemented in v4.0, where the rlm_realm module is gone.

Initial ideas/thoughts can be found in #2007:

Introduction (in a really small nutshell)

The trust router is a service that allows establishing shared secrets between AAA entities based on the shared trust they have in an entity called the "trust router server". In practice (i.e. Moonshot), it is used to dynamically negotiate TLS-PSKs between FreeRADIUS endpoints that had no knowledge of each other prior to that negotiation. In FR v3.0:

  1. On the client side, when the rlm_realm module detects a Request destined for an unknown realm, it uses the trust router libraries to query the "trust router server" and get the details for the new realm (that is, IP, port, keying material, expiration...). The realm is then added to the realm btree. Because these security associations expire, a rekeying functionality has recently been added to actively refresh them before they become unusable.
  2. On the server side, there is a separate daemon (tids) which stores the negotiated keying material in an SQLite DB. We then use FR's ability to set the TLS-PSK from a SQL DB to make this work in a very straightforward way.

Problems

In FR v4.0 rlm_realm has been removed, and the framework should allow for dynamic home servers. This raises several questions/challenges (there may be more).

  1. How can we make FR query the TR, and use the results, when a realm is unknown? If realm handling is done in the core server, do you envisage some sort of callback or anything similar, such as "resolver" modules? Or is this something that can be done in the "pre-proxy" section with a regular module?

  2. How do we handle rekeying? The process of establishing trust router security associations can take a while (several seconds), especially under certain circumstances. That's why it is a good idea to perform a rekeying process before the TLS keys expire. Otherwise, the first end user trying to use an expired association will have to wait until it is refreshed. This implies some sort of "proactivity" on the client, which typically works in a "reactive" way (that's why I added the dedicated thread to the rlm_realm module).

  3. On the server side, can everything be kept the same as it is now for v3, or does it need to change?

alejandro-perez avatar Jul 14 '17 08:07 alejandro-perez

How can we make FR query the TR, and use the results, when a realm is unknown?

Sounds like some kind of defined API is needed, where FreeRADIUS cycles through a list of "realm providers" asking them if they know a realm.
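To make the "realm providers" idea concrete, here is a minimal sketch in Python (not FreeRADIUS code, and not an existing API). Every name in it (`static_provider`, `trust_router_provider`, `resolve_realm`, the dictionary fields) is hypothetical, and the Trust Router query is replaced by canned data. It only shows the shape of "ask each provider in turn until one knows the realm".

```python
from typing import Callable, Dict, List, Optional

# Hypothetical record for a resolved realm; the field names are illustrative only.
RealmInfo = Dict[str, object]

# A "realm provider" is any callable that returns realm details, or None if unknown.
RealmProvider = Callable[[str], Optional[RealmInfo]]


def static_provider(realm: str) -> Optional[RealmInfo]:
    """Stands in for realms known from local configuration."""
    known = {"example.org": {"host": "192.0.2.10", "port": 2083}}
    return known.get(realm)


def trust_router_provider(realm: str) -> Optional[RealmInfo]:
    """Placeholder for a Trust Router query; it returns canned data here."""
    if realm.endswith(".example.net"):
        return {"host": "198.51.100.7", "port": 2083, "expires_in": 3600}
    return None


def resolve_realm(realm: str, providers: List[RealmProvider]) -> Optional[RealmInfo]:
    """Ask each provider in order and return the first answer, or None."""
    for provider in providers:
        info = provider(realm)
        if info is not None:
            return info
    return None


providers = [static_provider, trust_router_provider]
print(resolve_realm("idp.example.net", providers))
```

The order of the list is the policy: cheap local sources first, slow dynamic ones last.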

How do we handle rekeying?

I think DNS has a good model for this. Realms could be returned with Expire/Minimum/Refresh/Retry values. I seem to recall v4 has some kind of event queue? Could this be used to fire an asynchronous realm lookup?
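As a rough illustration of how DNS-style timers could drive re-lookups, here is a small Python sketch. The `RealmTimers` type and the `next_lookup` function are hypothetical, the "Minimum" value is left out for brevity, and the retry rule (a fixed interval after each failed attempt) is an assumption rather than anything defined for v4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RealmTimers:
    refresh: int  # seconds after which a background re-lookup is attempted
    retry: int    # seconds between re-lookup attempts after a failure
    expire: int   # seconds after which the realm must no longer be used


def next_lookup(fetched_at: int, timers: RealmTimers, failures: int) -> Optional[int]:
    """Return the time of the next lookup attempt, or None once the realm has expired."""
    when = fetched_at + timers.refresh + failures * timers.retry
    if when >= fetched_at + timers.expire:
        return None  # too late to refresh; the realm has to be dropped
    return when


timers = RealmTimers(refresh=3000, retry=120, expire=3600)
print(next_lookup(0, timers, failures=0))
```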

On the server side, can everything be kept the same as it is now for v3, or does it need to change?

I hope so!

A lot of the issues we have would also be problems for radsec dynamic discovery (and more interesting things - realm lookups via http/sql?).

adam-bishop avatar Jul 14 '17 11:07 adam-bishop

Sounds like some kind of defined API is needed, where FreeRADIUS cycles through a list of "realm providers" asking them if they know a realm.

That's a module. :) And configurable in unlang.

When a home server is added, it can have its own expire / refresh / etc. values. That should all be in the server core.

A lot of the issues we have would also be problems for radsec dynamic discovery (and more interesting things - realm lookups via http/sql?).

Yes. The goal is to allow dynamic realms / home-servers from any source. That will be much more flexible than v3.

alandekok avatar Jul 14 '17 11:07 alandekok

That's a module. :) And configurable in unlang.

I guess this is a new type of module? Because typical modules do not have an interface to query realms, do they?

When a home server is added, it can have its own expire / refresh / etc. values. That should all be in the server core.

Yes, but every "realm" source may have different ways of handling expiration/refresh events. Will their "callbacks" be called upon these events? That'd be superb.

Yes. The goal is to allow dynamic realms / home-servers from any source. That will be much more flexible than v3.

That's really great. A lot of flexibility will come from here. Indeed, I see two major options:

  1. Have a full-fledged module, which does everything within FR.
  2. Or have an external daemon (as we already do for the server side), which provides a REST/SQL/socket interface to the module. The module in FR would then be really thin, and would act like a proxy.

Benefits of 2) are decoupling and the ability to use different languages (or even machines). Disadvantages of 2) include worse performance, depending on how many remote calls are needed and how often, and having to run a second daemon, which I don't usually like if avoidable.

alejandro-perez avatar Jul 14 '17 11:07 alejandro-perez

I guess this is a new type of module? Because typical modules do not have an interface to query realms, do they?

The point is that in v3 and earlier, realms were magic. They were welded into the server core and rlm_realm. In v4, there's no magic. There is no "proxying" in the server core. Instead, there's a RADIUS client module which queries home servers, just like the SQL module queries SQL databases.

At that point, dynamic home servers just become dynamic updates to a module. Realms lose all magic, and just become module configuration.

Yes, but every "realm" source may have different ways of handling expiration/refresh events. Will their "callbacks" be called upon these events?

Probably not. The goal is to allow create / renew / expire with configurable timers. The RADIUS client module then just either expires the realm / home server, or gets told to keep it alive.

The point is to separate data sources from how that data is used. The more tightly they are tied together, the harder it is to change anything.

Perhaps you could explain what callbacks you need when a home server expires.

A lot of flexibility will come from here. Indeed, I see two major options:

Both of those are the same option:

  1. something somehow gets realm information, and sends it to the RADIUS client module. The communication between the two is attributes, just like v3 with dynamic clients.

You're free to write a custom module to query your own DB, or you can use unlang to query SQL / REST / whatever, and get attributes that way.

alandekok avatar Jul 14 '17 12:07 alandekok

The point is that in v3 and earlier, realms were magic. They were welded into the server core and rlm_realm. In v4, there's no magic. There is no "proxying" in the server core. Instead, there's a RADIUS client module which queries home servers, just like the SQL module queries SQL databases.

At that point, dynamic home servers just become dynamic updates to a module. Realms lose all magic, and just become module configuration.

Sounds like a good design and for sure a far less hackish approach for us.

The point is to separate data sources from how that data is used. The more tightly they are tied together, the harder it is to change anything.

Agreed.

Perhaps you could explain what callbacks you need when a home server expires.

What I need is to be able to renew the TLS keys associated with a particular realm before they expire. An "expire + get new one" procedure does not work well for us, since the establishment process is slow and hence some authentications might seem really slow to the user for no apparent reason.

So the callback would be something such as "renew_realm_info()", which would have a configurable timer < expiration_time. That is, a realm is not used beyond its expiration time, but say 60s beforehand I get notified so I can start the negotiation and establish new keys before expiration (this would indeed be very similar to the IKEv2/IPsec rekeying process). It is important that, while the refresh is in progress, users can still use the old realm info for authenticating, creating no disruption for them.

In option 2), that is, having an external daemon that does the refreshing, this would not be needed, as upon expiration FR will query for the new keys, which will already be there.

Both of those are the same option:

something somehow gets realm information, and sends it to the RADIUS client module. The communication between the two is attributes, just like v3 with dynamic clients. You're free to write a custom module to query your own DB, or you can use unlang to query SQL / REST / whatever, and get attributes that way.

I see what you mean. Again, the main "challenge" I see is how you are going to handle expiration/renew of realms. It can be:

  1. Passive. A request comes in, FR looks up realm information in the cache. It is expired. FR calls the unlang code to retrieve new information. In this case, proactive rekeying has to be done outside FR.
  2. Proactive. A timer is set so that when the realm expires, FR calls the unlang code to retrieve new information. In this case, this refreshing is the actual rekeying. The problem with this approach is that, if a user authentication happens in another thread while refreshing, it either fails because of the expired key, or has to wait until the new keys are established since the old ones are expired.
  3. Proactive with soft expiration. A timer is set before actual expiration. FR calls the unlang code to retrieve new information BUT the old realm can be used by other threads until it is updated, since it is not expired yet.

Obviously, my preferences are with 3). That is the "callback" I was referring to.
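To illustrate option 3), here is a minimal Python sketch of a realm cache with a soft ("renew") deadline ahead of the hard ("expire") deadline. It is only a model of the behaviour I am after, not FreeRADIUS code: `RealmEntry`, `RealmCache`, and the `renew_margin` parameter are hypothetical names, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class RealmEntry:
    psk: str          # keying material (illustrative placeholder)
    renew_at: float   # soft expiry: start rekeying, but keep using this entry
    expire_at: float  # hard expiry: the entry must not be used any more


class RealmCache:
    def __init__(self) -> None:
        self._entries: Dict[str, RealmEntry] = {}

    def store(self, realm: str, psk: str, now: float,
              lifetime: float, renew_margin: float) -> None:
        """Install (or replace) the keys for a realm."""
        expire_at = now + lifetime
        self._entries[realm] = RealmEntry(psk, expire_at - renew_margin, expire_at)

    def lookup(self, realm: str, now: float) -> Tuple[Optional[RealmEntry], bool]:
        """Return (usable entry or None, whether a renewal should be started)."""
        entry = self._entries.get(realm)
        if entry is None or now >= entry.expire_at:
            return None, True            # nothing usable: must (re)negotiate
        return entry, now >= entry.renew_at


cache = RealmCache()
cache.store("example.org", "old-key", now=0, lifetime=3600, renew_margin=60)
entry, renew = cache.lookup("example.org", now=3550)
print(entry.psk, renew)  # old keys are still usable while renewal is requested
```

Between `renew_at` and `expire_at`, lookups keep returning the old entry, so authentications in other threads are not delayed while the new keys are negotiated.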

alejandro-perez avatar Jul 15 '17 11:07 alejandro-perez

The short answer is that we need a "renewal" timer which is different from the "expiry" timer.

The new design has an event loop per thread. Modules can add events to that event loop. Which means that when the "dynamic home server" module hits the "renew" timer, it can take action to (somehow) renew / refresh the data.
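As a rough illustration of the two timers, here is a toy model in Python. It is not the v4 event API: `EventLoop`, `HomeServer`, `call_at`, and `run_until` are hypothetical names, time is virtual, and the renewal is assumed to complete instantly (a real negotiation would take seconds).

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class EventLoop:
    """Toy single-threaded timer queue with virtual time."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue: List[Tuple[float, int, Callable[[], None]]] = []
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def call_at(self, when: float, callback: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (when, next(self._seq), callback))

    def run_until(self, deadline: float) -> None:
        """Fire every timer due at or before the deadline, in time order."""
        while self._queue and self._queue[0][0] <= deadline:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()
        self.now = deadline


class HomeServer:
    """A dynamic home server with separate renew and expire timers."""

    def __init__(self, loop: EventLoop, lifetime: float, renew_margin: float) -> None:
        self.loop = loop
        self.lifetime = lifetime
        self.renew_margin = renew_margin
        self.generation = 0
        self.expire_at = 0.0
        self.log: List[Tuple[str, float, int]] = []
        self._install_keys()

    def _install_keys(self) -> None:
        """Install a new key generation and arm both timers for it."""
        self.generation += 1
        self.expire_at = self.loop.now + self.lifetime
        gen = self.generation
        self.loop.call_at(self.expire_at - self.renew_margin, lambda: self._on_renew(gen))
        self.loop.call_at(self.expire_at, lambda: self._on_expire(gen))

    def _on_renew(self, gen: int) -> None:
        if gen != self.generation:
            return  # stale timer for keys that were already replaced
        self.log.append(("renew", self.loop.now, gen))
        self._install_keys()  # stand-in for the (slow) renegotiation

    def _on_expire(self, gen: int) -> None:
        if gen != self.generation:
            return  # the keys were renewed before they expired
        self.log.append(("expire", self.loop.now, gen))


loop = EventLoop()
server = HomeServer(loop, lifetime=3600, renew_margin=60)
loop.run_until(4000)
print(server.log)
```

Because the renew timer fires first and replaces the keys, the expire timer for the old generation becomes a no-op. Expiry is only logged if renewal never completes.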

alandekok avatar Jul 15 '17 13:07 alandekok

The short answer is that we need a "renewal" timer which is different from the "expiry" timer.

Exactly

The new design has an event loop per thread. Modules can add events to that event loop. Which means that when the "dynamic home server" module hits the "renew" timer, it can take action to (somehow) renew / refresh the data.

It seems pretty straightforward. I think implementing the TR client functionality in v4.0 will actually be easier than it was for v3.0. Is this new design already implemented, so it can be tested?

alejandro-perez avatar Jul 16 '17 06:07 alejandro-perez

v4.0.x is running now. The APIs are pretty much stable at this point.

You may want to wait a bit for me to finish the RADIUS client module (outgoing packets / proxying). That will make the rest of the work clearer.

alandekok avatar Jul 16 '17 11:07 alandekok

Sure I will. Thanks!

alejandro-perez avatar Jul 16 '17 12:07 alejandro-perez