otp icon indicating copy to clipboard operation
otp copied to clipboard

[RFC] Add kernel TLS support in prim_inet, inet_drv, and ssl application

Open zzydxm opened this issue 2 years ago • 27 comments

Linux kernel supports TLS encryption/decryption offload from user space to kernel space (https://www.kernel.org/doc/html/latest/networking/tls.html). Add options in prim_inet and inet_drv to support setting this up using atoms, and get success/fail responses. Also add ssl application change to illustrate how it will be used.

These two commits is a follow up of discussion in https://erlangforums.com/t/introduce-kernel-tls-in-ssl-application/952 We already experimented replacing inet6_tls_dist with kernel TLS dist protocol in our cluster, and saw 10%~20% global CPU savings on our major services. We think kernel TLS is a good feature to be supported in OTP, and the change will contain 3 major parts.

  1. First commit (optional): add constants in inet_drv, better configuration in prim_inet, and be able to detect setopt failures.
  2. Second commit (add 'use_ktls' in ssl options): With this option enabled, ssl will do kTLS offload setup in ssl_gen_statem:prepare_connection, move the socket control back to the user pid, and close the tls_sender/receiver processes. ssl:send/recv API will also be modified accordingly. (Step 1 is optional because those TLS cipher constants must be copied to the ssl application.)
  3. add ktls_dist and ktls6_dist to use kernel TLS feature in dist (optional)

I would like to get some feedback on:

  1. Is there any concern on adding these TLS related code in inet_drv? or raw options are more suggested.
  2. If it is fine to add these prim_inet options. Does the current API of tcp_ulp/tls_tx/tls_rx looks good?

some other options:

  1. make {tcp_ulp, tls} being the only supported option, instead of {tcp_ulp, <<"tls">>} which supports other TCP_ULP
  2. use {ktls, init} instead of {tcp_ulp, <<"tls">>}
  3. use {tls_1_3_aes_gcm_256_tx, <<WriteIV/binary, WriteKey/binary, WriteSalt/binary, WriteSeq:64>>} instead of {tls_tx, <<4, 3, 52, 0, WriteIV/binary, WriteKey/binary, WriteSalt/binary, WriteSeq:64>>}, this will occupy 12 integers in inet_drv options, where there are only 50 available (out of 256) to be used now
  4. use {ktls, {tls_1_3_aes_gcm_256_tx, <<WriteIV/binary, WriteKey/binary, WriteSalt/binary, WriteSeq:64>>}} instead of {tls_tx, <<4, 3, 52, 0, WriteIV/binary, WriteKey/binary, WriteSalt/binary, WriteSeq:64>>}

Is there any suggestion on this? Thanks!

A note not related to this commit but related to the entire kernel TLS effort: I found that once tcp_ulp/tls_tx/tls_rx has been set, the option cannot be changed. So TLS 1.3 key update will result as a disconnect and we need to redo the handshake. I checked openssl 3.1 and it is also the behavior there.

cc: @IngelaAndin

zzydxm avatar Mar 30 '22 21:03 zzydxm

CT Test Results

       5 files     232 suites   2h 11m 35s :stopwatch: 3 286 tests 3 035 :heavy_check_mark:    251 :zzz: 0 :x: 6 484 runs  5 481 :heavy_check_mark: 1 003 :zzz: 0 :x:

Results for commit 076dd94b.

:recycle: This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

github-actions[bot] avatar Mar 30 '22 21:03 github-actions[bot]

Added second commit to illustrate the change in ssl application

zzydxm avatar Apr 01 '22 17:04 zzydxm

Pushed a new version that fixes the first application data issue by always using {active, false} slides: https://docs.google.com/presentation/d/1V0hmWCBzuSYJA4Yh-SC8WOnf1_k1ZqKdEqhkcANrAss/edit?usp=sharing discussion doc: https://docs.google.com/document/d/1psr-MEmYQHsZ6db8oM8ZPMqveHQBvZ6SeFZLZaTT26Q/edit

zzydxm avatar May 05 '22 17:05 zzydxm

As I understand your use case for this is the Erlang distribution and the performance boost is expected to be greater for this case than in the normal case?! Although the test provided are only for the normal case. Is your plan to contribute a distribution module later?

As you have written this should be an experimental option and it imposes many restrictions upon the TLS protocol, by enabling it a system would not behave backward compatible. I am thinking that this could be a candidate for using the feature mechanism, even though it is not a language change, I believe that we also want to handle such features with caution. I do not think it is obvious that we at all want to support this for the common case. It needs more investigating. And also it could be a good comparison for other optimizations that we are planning. As far as I understood I thought the first approach was to provide this experimental feature only for the distribution case, but that is not how the current PR is going. If we should go in this direction I really think the feature way is the way to go.

IngelaAndin avatar Jun 08 '22 14:06 IngelaAndin

I do not know how your distribution module handles this but in your test, you give the socket back to the same process that receives data. So you will be sending and receiving data from the same process. There is a reason that we have a special sender process, that is we want to avoid a possible deadlock that can occur when prim_inet:send function blocks.

IngelaAndin avatar Jun 09 '22 05:06 IngelaAndin

We discussed this a little further, internally, and we would prefer a solution where all ktls knowledge could be in a separate distribution module (that would not need to be part of OTP) but that would work out of the box with OTP. I think we could extend connection_information to be able to export the data you need to init the payload encryption. You could use a passive socket with the tls packet option (that we could document) for the inet driver and use internal_active_n set to 1 to avoid changing the existing code. We might also need to support some special option to ssl:controlling_process process to let some distribution process handle the payload and gracefully shut down the connection processes. That is a sketch of how we could get some support for this and avoid experimental features (however they are marked as experimental).

We do not feel very compelled to add new inet driver features as we have a long-term plan to replace it with the socket module and some other gen_ code. Anything to add @bmk and @RaimoNiskanen ?

IngelaAndin avatar Jun 09 '22 13:06 IngelaAndin

Thanks for all the comments!

We do have a single dist protocol module file achieves kTLS without OTP modification and is running on all nodes we have. I will clean it up and share it shortly with a new PR for it.

There is one problem and one question of the dist protocol module:

  1. Early application data issue is not resolved. What we currently do is to sleep 1 second when doing connection establishment. To make the connection immediate, we need enable inet_drv TLS mode for it in ssl and use {active,1}, I will look into it, thanks for pointing it out!
  2. With kTLS dist module, the Erlang distribution is running TCP mode, we directly give the inet port to the distribution, and do not use dist_ctrl_put_data/dist_ctrl_get_data, this gives us large performance boost. Will the TCP mode goes away from Erlang distribution? Will there be a socket based replacement?

zzydxm avatar Jun 13 '22 16:06 zzydxm

Small hijack: RabbitMQ would very much welcome a larger scope for this PR that extends beyond distribution. The use case is that for streams and the stream protocol, RabbitMQ uses sendfile, which currently does not work with TLS, and so we do normal reads+sends when TLS is used. Having kTLS would potentially allow us to use the proper sendfile with TLS and nice performance along with it.

The same applies to Cowboy and I'm sure other projects.

essen avatar Jun 13 '22 16:06 essen

Example dist module here, will do a more complete version in a different PR:

terminal 1:

erl -proto_dist ktls -sname a@localhost

terminal 2:

erl -proto_dist ktls -sname b@localhost
(b@localhost)1> net_kernel:connect_node('a@localhost').
true

code: ktls_dist.erl

-module(ktls_dist).

-export([
    childspecs/0
]).

-export([
    listen/2,
    accept/1,
    accept_connection/5,
    setup/5,
    close/1,
    select/1,
    address/0
]).

%% internal exports
-export([
    do_accept/7,
    do_setup/7
]).

-import(error_logger, [
    error_msg/2
]).

-include_lib("kernel/include/net_address.hrl").

-include_lib("kernel/include/dist_util.hrl").
-include_lib("kernel/include/logger.hrl").

-include_lib("ssl/src/ssl_api.hrl").
-include_lib("ssl/src/ssl_cipher.hrl").
-include_lib("ssl/src/ssl_connection.hrl").

childspecs() ->
    inet_tls_dist:childspecs().

select(Node) ->
    inet_tls_dist:select(Node).

address() ->
    inet_tls_dist:address().

listen(Name, Host) ->
    inet_tls_dist:listen(Name, Host).

close(Socket) ->
    inet_tls_dist:close(Socket).

%% ------------------------------------------------------------
%% Accepts new connection attempts from other Erlang nodes.
%% ------------------------------------------------------------

accept(Listen) ->
    gen_accept(inet_tcp, Listen).

gen_accept(Driver, Listen) ->
    Kernel = self(),
    spawn_opt(
        fun() ->
            process_flag(trap_exit, true),
            LOpts = application:get_env(kernel, inet_dist_listen_options, []),
            MaxPending =
                case lists:keyfind(backlog, 1, LOpts) of
                    {backlog, Backlog} -> Backlog;
                    false -> 128
                end,
            DLK = {Driver, Listen, Kernel},
            accept_loop(DLK, spawn_accept(DLK), MaxPending, #{})
        end,
        [link, {priority, max}]
    ).

%% Concurrent accept loop will spawn a new HandshakePid when
%%  there is no HandshakePid already running, and Pending map is
%%  smaller than MaxPending
accept_loop(DLK, undefined, MaxPending, Pending) when map_size(Pending) < MaxPending ->
    accept_loop(DLK, spawn_accept(DLK), MaxPending, Pending);
accept_loop(DLK, HandshakePid, MaxPending, Pending) ->
    receive
        {continue, HandshakePid} when is_pid(HandshakePid) ->
            accept_loop(DLK, undefined, MaxPending, Pending#{HandshakePid => true});
        {'EXIT', Pid, Reason} when is_map_key(Pid, Pending) ->
            Reason =/= normal andalso
                ?LOG_ERROR("TLS distribution handshake failed: ~p~n", [Reason]),
            accept_loop(DLK, HandshakePid, MaxPending, maps:remove(Pid, Pending));
        {'EXIT', HandshakePid, Reason} when is_pid(HandshakePid) ->
            %% HandshakePid crashed before turning into Pending, which means
            %%  error happened in accept. Need to restart the listener.
            exit(Reason);
        Unexpected ->
            ?LOG_WARNING("TLS distribution: unexpected message: ~p~n", [Unexpected]),
            accept_loop(DLK, HandshakePid, MaxPending, Pending)
    end.

spawn_accept({Driver, Listen, Kernel}) ->
    AcceptLoop = self(),
    spawn_link(
        fun() ->
            case Driver:accept(Listen) of
                {ok, Socket} ->
                    AcceptLoop ! {continue, self()},
                    case check_ip(Driver, Socket) of
                        true ->
                            accept_one(Driver, Kernel, Socket);
                        {false, IP} ->
                            ?LOG_ERROR(
                                "** Connection attempt from "
                                "disallowed IP ~w ** ~n",
                                [IP]
                            ),
                            {disallowed, IP}
                    end;
                Error ->
                    exit(Error)
            end
        end
    ).

accept_one(Driver, Kernel, Socket) ->
    % SslOpts = proplists:delete(ciphers, ets:lookup_element(ssl_dist_opts, server, 2)),
    SslOpts = application:get_env(kernel, ktls_server_opts, server_ssl_opts()),
    case
        ssl:handshake(
            Socket,
            [{erl_dist, true}, {active, false}, {packet, 4} | SslOpts],
            net_kernel:connecttime()
        )
    of
        % {ok, #sslsocket{pid = [Receiver | _]} = SslSocket} ->
        {ok, {sslsocket, _, [Receiver | _]}} ->
            % kTLS: set active to false to reduce the possibility that
            % dist_util handshake data from client side reaches Receiver process
            inet:setopts(Socket, [{active, false}]),

            Kernel ! {accept, self(), Socket, Driver:family(), tls},
            receive
                {Kernel, controller, Pid} ->
                    set_ktls(Socket, Receiver, server, Pid),
                    Pid ! {self(), controller};
                {Kernel, unsupported_protocol} ->
                    exit(unsupported_protocol)
            end;
        {error, {options, _}} = Error ->
            %% Bad options: that's probably our fault.
            %% Let's log that.
            ?LOG_ERROR(
                "Cannot accept TLS distribution connection: ~s~n",
                [ssl:format_error(Error)]
            ),
            gen_tcp:close(Socket),
            Error;
        Other ->
            gen_tcp:close(Socket),
            Other
    end.

%% ------------------------------------------------------------
%% Accepts a new connection attempt from another Erlang node.
%% Performs the handshake with the other side.
%% ------------------------------------------------------------

accept_connection(AcceptPid, Socket, MyNode, Allowed, SetupTime) ->
    gen_accept_connection(inet_tcp, AcceptPid, Socket, MyNode, Allowed, SetupTime).

gen_accept_connection(Driver, AcceptPid, Socket, MyNode, Allowed, SetupTime) ->
    spawn_opt(
        ?MODULE,
        do_accept,
        [Driver, self(), AcceptPid, Socket, MyNode, Allowed, SetupTime],
        [link, {priority, max}]
    ).

do_accept(Driver, Kernel, AcceptPid, Socket, MyNode, Allowed, SetupTime) ->
    MRef = erlang:monitor(process, AcceptPid),
    receive
        {AcceptPid, controller} ->
            erlang:demonitor(MRef, [flush]),
            Timer = dist_util:start_timer(SetupTime),
            case check_ip(Driver, Socket) of
                true ->
                    HSData = #hs_data{
                        kernel_pid = Kernel,
                        this_node = MyNode,
                        socket = Socket,
                        timer = Timer,
                        this_flags = 0,
                        allowed = Allowed,
                        f_send = fun Driver:send/2,
                        f_recv = fun Driver:recv/3,
                        f_setopts_pre_nodeup =
                            fun(S) ->
                                inet:setopts(
                                    S,
                                    [
                                        {active, false},
                                        {packet, 4},
                                        nodelay()
                                    ]
                                )
                            end,
                        f_setopts_post_nodeup =
                            fun(S) ->
                                inet:setopts(
                                    S,
                                    [
                                        {active, true},
                                        {deliver, port},
                                        {packet, 4},
                                        binary,
                                        nodelay()
                                    ]
                                )
                            end,
                        f_getll = fun(S) ->
                            inet:getll(S)
                        end,
                        f_address = fun(S, Node) -> get_remote_id(Driver, S, Node) end,
                        mf_tick = fun(S) -> inet_tcp_dist:tick(Driver, S) end,
                        mf_getstat = fun inet_tcp_dist:getstat/1,
                        mf_setopts = fun inet_tcp_dist:setopts/2,
                        mf_getopts = fun inet_tcp_dist:getopts/2
                    },
                    dist_util:handshake_other_started(HSData);
                {false, IP} ->
                    error_msg(
                        "** Connection attempt from "
                        "disallowed IP ~w ** ~n",
                        [IP]
                    ),
                    ?shutdown(no_node)
            end;
        {'DOWN', MRef, _, _, _Reason} ->
            ?shutdown(no_node)
    end.

%% we may not always want the nodelay behaviour
%% for performance reasons

nodelay() ->
    case application:get_env(kernel, dist_nodelay) of
        undefined ->
            {nodelay, true};
        {ok, true} ->
            {nodelay, true};
        {ok, false} ->
            {nodelay, false};
        _ ->
            {nodelay, true}
    end.

%% ------------------------------------------------------------
%% Get remote information about a Socket.
%% ------------------------------------------------------------
get_remote_id(Driver, Socket, Node) ->
    case inet:peername(Socket) of
        {ok, Address} ->
            case split_node(atom_to_list(Node), $@, []) of
                [_, Host] ->
                    #net_address{
                        address = Address,
                        host = Host,
                        protocol = tcp,
                        family = Driver:family()
                    };
                _ ->
                    %% No '@' or more than one '@' in node name.
                    ?shutdown(no_node)
            end;
        {error, _Reason} ->
            ?shutdown(no_node)
    end.

%% ------------------------------------------------------------
%% Setup a new connection to another Erlang node.
%% Performs the handshake with the other side.
%% ------------------------------------------------------------

setup(Node, Type, MyNode, LongOrShortNames, SetupTime) ->
    gen_setup(inet_tcp, Node, Type, MyNode, LongOrShortNames, SetupTime).

gen_setup(Driver, Node, Type, MyNode, LongOrShortNames, SetupTime) ->
    spawn_opt(
        ?MODULE,
        do_setup,
        [Driver, self(), Node, Type, MyNode, LongOrShortNames, SetupTime],
        [link, {priority, max}]
    ).

do_setup(Driver, Kernel, Node, Type, MyNode, LongOrShortNames, SetupTime) ->
    ?trace("~p~n", [{inet_tcp_dist, self(), setup, Node}]),
    [Name, Address] = splitnode(Driver, Node, LongOrShortNames),
    AddressFamily = Driver:family(),
    ErlEpmd = net_kernel:epmd_module(),
    Timer = dist_util:start_timer(SetupTime),
    case call_epmd_function(ErlEpmd, address_please, [Name, Address, AddressFamily]) of
        {ok, Ip, TcpPort, Version} ->
            ?trace(
                "address_please(~p) -> version ~p~n",
                [Node, Version]
            ),
            do_setup_connect(
                Driver,
                Kernel,
                Node,
                Address,
                AddressFamily,
                Ip,
                TcpPort,
                Version,
                Type,
                MyNode,
                Timer
            );
        {ok, Ip} ->
            case ErlEpmd:port_please(Name, Ip) of
                {port, TcpPort, Version} ->
                    ?trace(
                        "port_please(~p) -> version ~p~n",
                        [Node, Version]
                    ),
                    do_setup_connect(
                        Driver,
                        Kernel,
                        Node,
                        Address,
                        AddressFamily,
                        Ip,
                        TcpPort,
                        Version,
                        Type,
                        MyNode,
                        Timer
                    );
                _ ->
                    ?trace("port_please (~p) failed.~n", [Node]),
                    ?shutdown(Node)
            end;
        _Other ->
            ?trace(
                "inet_getaddr(~p) "
                "failed (~p).~n",
                [Node, _Other]
            ),
            ?shutdown(Node)
    end.

%%
%% Actual setup of connection
%%
do_setup_connect(
    Driver,
    Kernel,
    Node,
    Address,
    AddressFamily,
    Ip,
    TcpPort,
    Version,
    Type,
    MyNode,
    Timer
) ->
    dist_util:reset_timer(Timer),
    % SslOpts = proplists:delete(ciphers, ets:lookup_element(ssl_dist_opts, client, 2)),
    SslOpts = application:get_env(kernel, ktls_client_opts, client_ssl_opts()),
    case
        ssl:connect(
            Ip,
            TcpPort,
            connect_options([
                binary,
                {erl_dist, true},
                {active, false},
                {packet, 4},
                {server_name_indication, Address},
                Driver:family(),
                {nodelay, true}
                | SslOpts
            ]),
            net_kernel:connecttime()
        )
    of
        % {ok, #sslsocket{fd = {_, Socket, _, _}, pid = [Receiver | _]}} ->
        {ok, {sslsocket, {_, Socket, _, _}, [Receiver | _]}} ->
            set_ktls(Socket, Receiver, client, self()),
            HSData = #hs_data{
                kernel_pid = Kernel,
                other_node = Node,
                this_node = MyNode,
                socket = Socket,
                timer = Timer,
                this_flags = 0,
                other_version = Version,
                f_send = fun Driver:send/2,
                f_recv = fun Driver:recv/3,
                f_setopts_pre_nodeup =
                    fun(S) ->
                        inet:setopts(
                            S,
                            [
                                {active, false},
                                {packet, 4},
                                nodelay()
                            ]
                        )
                    end,
                f_setopts_post_nodeup =
                    fun(S) ->
                        inet:setopts(
                            S,
                            [
                                {active, true},
                                {deliver, port},
                                {packet, 4},
                                nodelay()
                            ]
                        )
                    end,

                f_getll = fun inet:getll/1,
                f_address =
                    fun(_, _) ->
                        #net_address{
                            address = {Ip, TcpPort},
                            host = Address,
                            protocol = tcp,
                            family = AddressFamily
                        }
                    end,
                mf_tick = fun(S) -> inet_tcp_dist:tick(Driver, S) end,
                mf_getstat = fun inet_tcp_dist:getstat/1,
                request_type = Type,
                mf_setopts = fun inet_tcp_dist:setopts/2,
                mf_getopts = fun inet_tcp_dist:getopts/2
            },
            dist_util:handshake_we_started(HSData);
        _ ->
            %% Other Node may have closed since
            %% discovery !
            ?trace(
                "other node (~p) "
                "closed since discovery (port_please).~n",
                [Node]
            ),
            ?shutdown(Node)
    end.

connect_options(Opts) ->
    case application:get_env(kernel, inet_dist_connect_options) of
        {ok, ConnectOpts} ->
            ConnectOpts ++ Opts;
        _ ->
            Opts
    end.

%% If Node is illegal terminate the connection setup!!
splitnode(Driver, Node, LongOrShortNames) ->
    case split_node(atom_to_list(Node), $@, []) of
        [Name | Tail] when Tail =/= [] ->
            Host = lists:append(Tail),
            case split_node(Host, $., []) of
                [_] when LongOrShortNames =:= longnames ->
                    case Driver:parse_address(Host) of
                        {ok, _} ->
                            [Name, Host];
                        _ ->
                            error_msg(
                                "** System running to use "
                                "fully qualified "
                                "hostnames **~n"
                                "** Hostname ~ts is illegal **~n",
                                [Host]
                            ),
                            ?shutdown(Node)
                    end;
                L when length(L) > 1, LongOrShortNames =:= shortnames ->
                    error_msg(
                        "** System NOT running to use fully qualified "
                        "hostnames **~n"
                        "** Hostname ~ts is illegal **~n",
                        [Host]
                    ),
                    ?shutdown(Node);
                _ ->
                    [Name, Host]
            end;
        [_] ->
            error_msg(
                "** Nodename ~p illegal, no '@' character **~n",
                [Node]
            ),
            ?shutdown(Node);
        _ ->
            error_msg("** Nodename ~p illegal **~n", [Node]),
            ?shutdown(Node)
    end.

split_node([Chr | T], Chr, Ack) -> [lists:reverse(Ack) | split_node(T, Chr, [])];
split_node([H | T], Chr, Ack) -> split_node(T, Chr, [H | Ack]);
split_node([], _, Ack) -> [lists:reverse(Ack)].

%% ------------------------------------------------------------
%% Determine if EPMD module supports the called functions.
%% If not call the builtin erl_epmd
%% ------------------------------------------------------------
call_epmd_function(Mod, Fun, Args) ->
    case erlang:function_exported(Mod, Fun, length(Args)) of
        true -> apply(Mod, Fun, Args);
        _ -> apply(erl_epmd, Fun, Args)
    end.

%% ------------------------------------------------------------
%% Do only accept new connection attempts from nodes at our
%% own LAN, if the check_ip environment parameter is true.
%% ------------------------------------------------------------
check_ip(Driver, Socket) ->
    case application:get_env(check_ip) of
        {ok, true} ->
            case get_ifs(Socket) of
                {ok, IFs, IP} ->
                    check_ip(Driver, IFs, IP);
                _ ->
                    ?shutdown(no_node)
            end;
        _ ->
            true
    end.

get_ifs(Socket) ->
    case inet:peername(Socket) of
        {ok, {IP, _}} ->
            case inet:getif(Socket) of
                {ok, IFs} -> {ok, IFs, IP};
                Error -> Error
            end;
        Error ->
            Error
    end.

check_ip(Driver, [{OwnIP, _, Netmask} | IFs], PeerIP) ->
    case {Driver:mask(Netmask, PeerIP), Driver:mask(Netmask, OwnIP)} of
        {M, M} -> true;
        _ -> check_ip(Driver, IFs, PeerIP)
    end;
check_ip(_Driver, [], PeerIP) ->
    {false, PeerIP}.

set_ktls(Socket, Receiver, Type, ControlPid) ->
    State = sys:replace_state(
        Receiver,
        fun({_, State0}) ->
            gen_tcp:controlling_process(Socket, ControlPid),
            {downgrade, State0}
        end
    ),
    inet:setopts(Socket, [list, {active, false}]),
    gen_statem:stop(Receiver, {shutdown, downgrade}, net_kernel:connecttime()),
    % {_, #state{connection_states = ConnectionStates}} = State,
    ConnectionStates = element(12, element(2, State)),
    CurrentWrite = maps:get(current_write, ConnectionStates),
    CurrentRead = maps:get(current_read, ConnectionStates),
    % #cipher_state{iv = <<WriteSalt:4/bytes, WriteIV:8/bytes>>, key = WriteKey} = maps:get(cipher_state, CurrentWrite),
    % #cipher_state{iv = <<ReadSalt:4/bytes, ReadIV:8/bytes>>, key = ReadKey} = maps:get(cipher_state, CurrentRead),
    {cipher_state, <<WriteSalt:4/bytes, WriteIV:8/bytes>>, WriteKey, _, _, _, _} = maps:get(cipher_state, CurrentWrite),
    {cipher_state, <<ReadSalt:4/bytes, ReadIV:8/bytes>>, ReadKey, _, _, _, _} = maps:get(cipher_state, CurrentRead),
    WriteSeq = maps:get(sequence_number, CurrentWrite),
    ReadSeq = maps:get(sequence_number, CurrentRead),
    % SOL_TCP = 6, TCP_ULP = 31
    inet:setopts(Socket, [{raw, 6, 31, <<"tls">>}]),
    % SOL_TLS = 282, TLS_TX = 1, TLS_RX = 2, TLS_1_3_VERSION = <<4, 3>>, TLS_CIPHER_AES_GCM_256 = <<52, 0>>
    inet:setopts(Socket, [
        {raw, 282, 1, <<4, 3, 52, 0, WriteIV/binary, WriteKey/binary, WriteSalt/binary, WriteSeq:64>>}
    ]),
    inet:setopts(Socket, [{raw, 282, 2, <<4, 3, 52, 0, ReadIV/binary, ReadKey/binary, ReadSalt/binary, ReadSeq:64>>}]),
    % to avoid client send first application data together with the handshake message
    Type =:= client andalso timer:sleep(1000).

server_ssl_opts() ->
    [
        {cacerts, [
            <<48,130,1,246,48,130,1,88,160,3,2,1,2,2,20,35,176,232,217,205,116,137,193,134,85,221,173,145,223,241,57,
            167,13,43,162,48,10,6,8,42,134,72,206,61,4,3,2,48,13,49,11,48,9,6,3,85,4,6,19,2,85,83,48,30,23,13,50,50,48,
            54,49,51,50,49,53,55,52,57,90,23,13,51,50,48,54,49,48,50,49,53,55,52,57,90,48,13,49,11,48,9,6,3,85,4,6,19,
            2,85,83,48,129,155,48,16,6,7,42,134,72,206,61,2,1,6,5,43,129,4,0,35,3,129,134,0,4,1,72,179,230,5,247,187,
            162,102,227,141,206,96,172,224,133,99,249,233,75,90,38,117,155,182,105,120,218,175,246,134,120,59,177,232,
            65,204,60,174,85,202,193,51,82,117,100,62,52,5,89,71,224,4,145,252,138,50,81,240,238,174,92,102,33,218,171,
            1,44,153,106,246,160,122,221,38,46,91,105,37,137,177,225,39,47,65,127,188,23,242,55,177,133,254,181,171,46,
            1,227,209,34,142,251,23,150,121,235,73,62,184,119,177,245,247,243,148,91,235,5,245,211,53,22,83,147,154,
            163,20,176,228,106,79,72,163,83,48,81,48,29,6,3,85,29,14,4,22,4,20,236,171,4,136,198,42,236,8,175,15,253,
            232,121,208,123,126,65,105,149,16,48,31,6,3,85,29,35,4,24,48,22,128,20,236,171,4,136,198,42,236,8,175,15,
            253,232,121,208,123,126,65,105,149,16,48,15,6,3,85,29,19,1,1,255,4,5,48,3,1,1,255,48,10,6,8,42,134,72,206,
            61,4,3,2,3,129,139,0,48,129,135,2,65,21,185,151,163,89,93,69,139,18,152,136,163,58,119,199,218,164,156,223,
            15,94,252,63,124,34,59,10,0,173,115,179,246,241,227,9,158,224,56,140,77,145,242,20,71,140,207,156,177,36,
            199,148,227,147,158,130,139,80,40,53,207,197,70,153,98,139,2,66,1,90,249,95,52,12,61,226,84,128,90,57,212,
            65,34,15,76,54,137,22,1,168,136,68,234,63,170,64,135,2,34,224,0,255,163,151,41,219,8,166,151,178,212,157,
            12,255,60,227,195,46,131,100,207,236,152,48,250,181,31,192,182,211,37,171,157,96>>
        ]},
        {key, {'ECPrivateKey',
            <<48,129,220,2,1,1,4,66,1,58,230,70,152,77,81,98,49,169,174,157,115,219,124,163,44,167,132,41,92,185,252,
            114,133,27,111,7,18,2,87,201,6,204,107,255,205,24,14,194,196,111,252,224,76,159,37,73,233,221,72,91,58,71,
            31,143,105,137,137,47,10,126,217,136,198,239,160,7,6,5,43,129,4,0,35,161,129,137,3,129,134,0,4,1,72,179,
            230,5,247,187,162,102,227,141,206,96,172,224,133,99,249,233,75,90,38,117,155,182,105,120,218,175,246,134,
            120,59,177,232,65,204,60,174,85,202,193,51,82,117,100,62,52,5,89,71,224,4,145,252,138,50,81,240,238,174,92,
            102,33,218,171,1,44,153,106,246,160,122,221,38,46,91,105,37,137,177,225,39,47,65,127,188,23,242,55,177,133,
            254,181,171,46,1,227,209,34,142,251,23,150,121,235,73,62,184,119,177,245,247,243,148,91,235,5,245,211,53,
            22,83,147,154,163,20,176,228,106,79,72>>
        }},
        {cert,
            <<48,130,1,143,48,129,242,2,1,42,48,10,6,8,42,134,72,206,61,4,3,2,48,13,49,11,48,9,6,3,85,4,6,19,2,85,83,
            48,30,23,13,50,50,48,54,49,51,50,50,48,49,49,57,90,23,13,51,50,48,54,49,48,50,50,48,49,49,57,90,48,20,49,
            18,48,16,6,3,85,4,3,12,9,108,111,99,97,108,104,111,115,116,48,129,155,48,16,6,7,42,134,72,206,61,2,1,6,5,
            43,129,4,0,35,3,129,134,0,4,1,72,179,230,5,247,187,162,102,227,141,206,96,172,224,133,99,249,233,75,90,38,
            117,155,182,105,120,218,175,246,134,120,59,177,232,65,204,60,174,85,202,193,51,82,117,100,62,52,5,89,71,
            224,4,145,252,138,50,81,240,238,174,92,102,33,218,171,1,44,153,106,246,160,122,221,38,46,91,105,37,137,177,
            225,39,47,65,127,188,23,242,55,177,133,254,181,171,46,1,227,209,34,142,251,23,150,121,235,73,62,184,119,
            177,245,247,243,148,91,235,5,245,211,53,22,83,147,154,163,20,176,228,106,79,72,48,10,6,8,42,134,72,206,61,
            4,3,2,3,129,139,0,48,129,135,2,66,1,64,128,20,76,22,22,45,195,145,84,166,166,4,237,226,141,91,116,185,201,
            141,107,198,95,102,132,243,81,78,201,223,196,240,167,238,209,35,125,253,164,96,240,8,180,160,194,242,195,
            134,112,254,171,168,114,109,65,189,65,66,206,68,97,88,192,214,2,65,80,226,31,141,212,3,100,124,119,213,66,
            215,11,220,245,78,179,102,225,222,155,75,91,9,121,141,188,105,199,151,165,124,207,217,31,227,153,6,210,184,
            169,214,234,109,158,20,174,38,220,169,91,55,46,143,10,45,208,30,149,185,47,40,178,168,50>>
        },
        {verify, verify_peer},
        {versions, ['tlsv1.3']},
        {ciphers, [#{cipher => aes_256_gcm, key_exchange => any, mac => aead, prf => sha384}]}
    ].

client_ssl_opts() ->
    [{customize_hostname_check, [{match_fun, fun(_, _) -> true end}]} | server_ssl_opts()].

zzydxm avatar Jun 14 '22 00:06 zzydxm

@essen we like to concentrate on the distribution case first and have that complete with minimal changes to the general ssl functionality. If the distribution over kTLS is successful we might later on add support for it in the general ssl API.

KennethL avatar Jun 14 '22 06:06 KennethL

Have not looked at the example module yet. But about early data I think it does not have to be a problem for the distribution case as then we have control over the clients and as long as they do not send any early data there will be none.

Also, a thought might be that the close down of the erlang processes, as an alternative to a special option to ssl:controlling_process could be a special variant of ssl:close. I am not sure which I would prefer just a thought for the moment, but it could be seen as a different kind of downgrade that has other requirements than the normal downgrade.

IngelaAndin avatar Jun 14 '22 07:06 IngelaAndin

But about early data I think it does not have to be a problem for the distribution case as then we have control over the clients and as long as they do not send any early data there will be none.

I tried but haven't find a way. The current protocol is that Client Hello -> Service response + set_ktls -> Client handshake data + set_ktls + dist_util:handshake_we_started, which handshake_we_started can be sent together with handshake data if no timer:sleep on client side. I considered server side sending some signal indicating ktls is set, but whoever sent the first message after handshake, the message will meet the early data issue.

Set {packet, ssl_tls} with {active, 1} does work, but need 3 changes in ssl application:

  1. handle ssl_tls data message
  2. be able to set {active, 1} (currently it is default 100 for dist)
  3. do not activate socket after ssl_gen_statem:prepare_connection and wait for downgrade Would it be ok?

We did have some other changes for ktls_dist to run at the same time with inet6_tls_dist, but that should be a separate discussion in the proto_dist PR

zzydxm avatar Jun 14 '22 16:06 zzydxm

I think there is an one-line change in ssl application that can make ktls_dist work without timer:sleep It is to set {active, false} in client tls_handshake_1_3:do_wait_finished Then, in server ktls_dist:set_ktls we send a message to client, and in client ktls_dist:set_ktls we receive the message.

zzydxm avatar Jun 15 '22 01:06 zzydxm

But about early data I think it does not have to be a problem for the distribution case as then we have control over the clients and as long as they do not send any early data there will be none.

I tried but haven't find a way. The current protocol is that Client Hello -> Service response + set_ktls -> Client handshake data + set_ktls + dist_util:handshake_we_started, which handshake_we_started can be sent together with handshake data if no timer:sleep on client side.

Well, we are not talking about the same thing I think. I was talking about what is called early data in TLS-1.3, that is 0-RTT data, that can be sent when using a session ticket. You are talking about the first real application data which is the erlang distribution handshake.

IngelaAndin avatar Jun 15 '22 06:06 IngelaAndin

Set {packet, ssl_tls} with {active, 1} does work, but need 3 changes in ssl application:

  1. handle ssl_tls data message

I think this should be an ok change, even if it would be a distribution clause for now. We probably should assert that, by matching the erl_dist option. It might be considered to use even in normal mode later. The packet was introduced for the ssl application but it was not used as it was not deemed efficient (although in hindsight that might actually be that it was inet:setopts(active once), that was the bottleneck), that is why we have an internal active n now. Probably not so crucial for the handshake but a problem for application data throughput. And also {active, 1} is different from calling setopts with active once on an implementation level.

  1. be able to set {active, 1} (currently it is default 100 for dist)

I think it should be handled by this call in tls_connection. Possibly with added parameter. ssl_config:get_internal_active_n(IsErlDist), We can think of it should be generally configurable for distribution module x.

  1. do not activate socket after ssl_gen_statem:prepare_connection and wait for downgrade Would it be ok?

I think that if the user of the ssl socket is in passive mode and kept in passive mode until the socket ownership is transferred, it could be possible to add a function clause tls_gen_connection: flow_ctrl function to manage this.

We did have some other changes for ktls_dist to run at the same time with inet6_tls_dist, but that should be a separate discussion in the proto_dist PR

Sounds good to have a separate discussion.

IngelaAndin avatar Jun 15 '22 07:06 IngelaAndin

I did a new commit which simplified a lot in ssl application: https://github.com/zzydxm/otp/commit/018285e0184be9a5337fa2f41d95de91c3c0f82c

to use it:

terminal 1:

erl -pa ./ -proto_dist ktls -sname a@localhost
(a@localhost)1> ktls_dist:r().
true

terminal 2:

erl -pa ./ -proto_dist ktls -sname b@localhost
(b@localhost)1> ktls_dist:r().
true
(b@localhost)2> net_kernel:connect_node('a@localhost').
true

r/0, s/1, c/1, client_ssl_opts/0, server_ssl_opts/0 functions will be removed in actual PR

I have one question to the change: the only difference between ktls_dist and inet_tls_dist are: a. option {erl_dist_module, ?MODULE} b. ssl:controlling_process => gen_tcp:controlling_process c. ktls_dist:set_ktls(State) function d. HSData Can I put all the logic in inet_tls_dist instead of a new module? That would save a lot of copy-pasting

zzydxm avatar Jun 15 '22 23:06 zzydxm

A different approach with patch in inet_tls_dist instead of a separate ktls_dist module: https://github.com/zzydxm/otp/commit/ab34555210a994083de8f7916e9c0605728f7554

zzydxm avatar Jun 16 '22 22:06 zzydxm

Hi all! I have started to read up on all this...

It seems I need to reinstall my Corporate laptop with Ubutu 20 to be able to see Kernel TLS in real life, and not just what I can find on the Internet.

I think that set_ktls/? should as a "final solution" be implemented in SSL as a dedicated operation that returns the encryption state, transfers ownership, and shuts down the server. Possibly named something like ssl:offload_ktls(Server).

Moving the logic in ktls_dist into inet_tls_dist sounds like a good idea.

I think with the current suggestion that the erlang code knows way to much about the byte/word ordering and field alignment in the TLS_TX and TLS_RX options. If these were to be implemented in socket I think (as far as I have understood it) the options should be: {tcp,ulp} with value <<"tls">>, and {tls,tx} and {tls,rx} with value

#{version := {1,3},
  cipher := aes_gcm_128|aes_gcm_256|aes_ccm_128|chacha20_poly1305|sm4_gcm|sm4_ccm,
  iv := binary(),
  key := binary(),
  salt := binary(),
  rec_seq := binary()}

The ssl:offload_ktls/1 function could return the encryption state on the same format. I do not know if there are cipher names in SSL that corresponds to these cipher names, or if we could align them to facilitate setting handshake cipher.

Regarding what alternative there is apart from using inet_drv; it is to use distribution processes instead of a distribution port, as inet_tls_dist works in contrast to inet_tcp_dist. Rescheduling should be the same if the distribution process uses socket.

With a distribution port, when a process bangs to a remote node, the VM encodes and enqueues the data then schedules a driver send operation. The driver send operation then writes data to the socket.

With distribution processes, the VM also encodes and enqueues a banged message then sends a dist_data message to the distribution controller which schedules that process. The distribution controller calls erlang:dist_ctrl_get_data/1 to get the data and then calls e.g the socket NIF that writes the data to the socket.

Therefore I do not know if using a distribution process has got noticeably larger overhead than using a distribution port, but distribution processes are intended to replace or work as well as distribution ports... See kernel/examples, gen_tcp_dist.erl. We would prefer to add a big new feature like this to socket instead of to inet_drv.

I can see that since SSL will for now use the inet_backend = socket compatibility mode we would need to add a function to extract the socket handle from the compatibility handle, perhaps in gen_tcp_socket, or in inet. Then the kTLS socket options can be set via socket:setopt/3 before entering the dist controller send and receive loops.

That said; I think the most important to change is the option formats, and that can be done in in inet_drv but probably fits better in socket. The SSL application should be extended to implement the offload operation. To figure out if we can use distribution processes with socket, unfortunately, some rewrites are needed, but it would be interesting to know if the overhead is noticable. I would suspect that the overhead would be buried in the encryption work in any case.

RaimoNiskanen avatar Jun 17 '22 14:06 RaimoNiskanen

I think the support of process dist control is relative straight forward. Almost all operations can be easily migrated to use socket nif + process dist control, just with a lot of copy-pasting on gen_tcp_dist.erl.

The only thing I don't know is that does socket nif support {packet, ssl_tls} mode with {active 1}? It is some essential feature we need for having a stable state to transfer ownership.

I'm a bit confused on what ssl:offload_ktls(Server) means in your comment. Do you mean ssl:offload_ktls(#sslsocket{})? It would be much easier if we set ktls immediately after handshake, so that we don't need to handle racing issues between sender and receiver.

I would still imagine that process based dist control to have some regression compare to port based dist control. I don't have a test on it yet. However, if port based dist control can communicate with process based dist control. I can try to test it in our cluster. It would be great if the port based version can appear in OTP first.

zzydxm avatar Jun 17 '22 17:06 zzydxm

@zzydxm wrote:

The only thing I don't know is that does socket nif support {packet, ssl_tls} mode with {active 1}? It is some essential feature we need for having a stable state to transfer ownership.

The compatibility module gen_tcp_socket should implement {active,N}.

I'm a bit confused on what ssl:offload_ktls(Server) means in your comment. Do you mean ssl:offload_ktls(#sslsocket{})? It would be much easier if we set ktls immediately after handshake, so that we don't need to handle racing issues between sender and receiver.

I meant that the functionality in ktls_basic_SUITE:set_ktls_and_send_recv_result(#ssl_socket{}) (except sending "Hello, world") should be implemented as one (undocumented) operation in the SSL application instead of messing with the internal #ssl_socket{} record and using sys:replace_state(Receiver, _) to set controlling process. That operation could perhaps return the option values to be used for offloading encryption to kTLS.

I would still imagine that process based dist control to have some regression compare to port based dist control. I don't have a test on it yet. However, if port based dist control can communicate with process based dist control. I can try to test it in our cluster. It would be great if the port based version can appear in OTP first.

We would like to avoid to (in OTP) implement setting these large options in a stringent way both for inet_drv.c and for socket, and would strongly prefer socket. So if socket, implying dist ctrl process, is fast "enough" that would be awesome.

I guess it would be possible to use raw options for both cases and that way compare dist ctrl port with dist ctrl process... :-)

RaimoNiskanen avatar Jun 20 '22 15:06 RaimoNiskanen

The patch of the inet_tls_dist module https://github.com/zzydxm/otp/commit/ab34555210a994083de8f7916e9c0605728f7554 has got a nice size and complexity. But any module in the SSL application should be prefixed with ssl_, and also, I do not think a dedicated module ktls is needed - the code should be in inet_tls_dist or rather the values should come from the yet non-existing operation in the SSL application so inet_tls_dist:set_ktls/1 should boil down to calling that operation, and setting the encryption offload parameters from that operation on the socket.

Does this make sense?

RaimoNiskanen avatar Jun 20 '22 16:06 RaimoNiskanen

The patch of the inet_tls_dist module zzydxm@ab34555 has got a nice size and complexity. But any module in the SSL application should be prefixed with ssl_, and also, I do not think a dedicated module ktls is needed - the code should be in inet_tls_dist or rather the values should come from the yet non-existing operation in the SSL application so inet_tls_dist:set_ktls/1 should boil down to calling that operation, and setting the encryption offload parameters from that operation on the socket.

Does this make sense?

So basically, after TLS handshake, ssl_gen_statem will go to a state 'wait_shutdown' which is waiting for a gen_statem call to fetch the cipher state and shutdown the process. (It need to be a new state, because we don't want the socket to be activated) Then inet_tls_dist:set_ktls will make this gen_statem call and change socket options based on the return. Is my understanding correct? I will work on it.

zzydxm avatar Jun 20 '22 16:06 zzydxm

updated https://github.com/zzydxm/otp/commit/eec6f9a7ea901daa344ff5d799b6cac847ab6ddf which added ssl_gen_statem:ktls_handover and moves set_ktls into inet_tls_dist

zzydxm avatar Jun 21 '22 03:06 zzydxm

I think that sounds like a good way. I do not know if @IngelaAndin has got other ideas. It is digging in internal SSL data structures I think best should be done from within the SSL application...

RaimoNiskanen avatar Jun 21 '22 07:06 RaimoNiskanen

The patch of the inet_tls_dist module zzydxm@ab34555 has got a nice size and complexity. But any module in the SSL application should be prefixed with ssl_,

Actually, inet_tls_dist must be prefixed inet according to requirements on distribution callbacks modules. Also the ssl application modules are already prefixed ssl_ tls_ or dtls_ ! So should is maybe saying too much ;)

IngelaAndin avatar Jun 22 '22 10:06 IngelaAndin

The patch of the inet_tls_dist module zzydxm@ab34555 has got a nice size and complexity. But any module in the SSL application should be prefixed with ssl_, and also, I do not think a dedicated module ktls is needed - the code should be in inet_tls_dist or rather the values should come from the yet non-existing operation in the SSL application so inet_tls_dist:set_ktls/1 should boil down to calling that operation, and setting the encryption offload parameters from that operation on the socket. Does this make sense?

So basically, after TLS handshake, ssl_gen_statem will go to a state 'wait_shutdown' which is waiting for a gen_statem call to fetch the cipher state and shutdown the process. (It need to be a new state, because we don't want the socket to be activated) Then inet_tls_dist:set_ktls will make this gen_statem call and change socket options based on the return. Is my understanding correct? I will work on it.

I think we can have a new state for it, similar to the downgrade state but for handling "ktls_offlad". I am not sure about what would be the best name. The gen_statem call for doing this can, for now, be internal to the ssl application as long as we think that this is a distribution feature. I think however that should be made a function in ssl_gen_statem that implements the call!

IngelaAndin avatar Jun 22 '22 10:06 IngelaAndin

I think we can have a new state for it, similar to the downgrade state but for handling "ktls_offlad". I am not sure about what would be the best name. The gen_statem call for doing this can, for now, be internal to the ssl application as long as we think that this is a distribution feature. I think however that should be made a function in ssl_gen_statem that implements the call!

I opened a new PR https://github.com/erlang/otp/pull/6104 with most logic in inet_tls_dist, and wait for offload request in "connection" state. Would you like to take a look? Thanks!

zzydxm avatar Jun 22 '22 16:06 zzydxm

Closed in favour of your PR #6104

RaimoNiskanen avatar Sep 29 '22 15:09 RaimoNiskanen