nebula Performance issue on LAN

Hello,

I am building an hybrid network, some host are on the same local network ( 192.168.0.0/24 )and some other host are on the internet.

I have tried tweaking tx_queue to 5000 and read_buffer & write_buffer to 20000000 , which greatly improve performance.

However, between 2 hosts on the same lan iperf test on nebula ip reach +-200 Mbits/sec while same test using local ip max at 938 Mbits/sec ( which makes sens for a gigabyte link ).

Any idea of what could be done to improve performance between same network hosts ?

Regards,

Sep 25 '21 11:09 adi90x

Can you also share the hardware on the nodes and the iperf commands you used? Also ip addr output and nebula config might help as well.

Sep 27 '21 11:09 ieugen

Hello,

Host1 is small server and Host2 is a virtual machine on a proxmox host. Issue is even worst if trying from 2 VM ( i.e : 15Gb using local ip vs 600Mb using Nebula IP ) As discussed on Slack, I have try setting routines at 2,5,10,100 with no luck, I have also try preferred_ranges to make sure Nebula was using the local ip, with no succes for the moment.

Any other idea ?

Host 1 :

Reporting only the part that make sense

2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 84:39:be:9c:03:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.25/24 brd 192.168.0.255 scope global dynamic enp1s0
       valid_lft 75593sec preferred_lft 75593sec
    inet6 fe80::8639:beff:fe9c:3c0/64 scope link 
       valid_lft forever preferred_lft forever
(.....)
158: neb0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 8000 qdisc fq_codel state UNKNOWN group default qlen 50000
    link/none 
    inet 10.6.0.3/16 scope global neb0
       valid_lft forever preferred_lft forever

Nebula config of host 1 :

# This is the nebula example configuration file. You must edit, at a minimum, the static_host_map, lighthouse, and firewall sections
# Some options in this file are HUPable, including the pki section. (A HUP will reload credentials from disk without affecting existing tunnels)

# PKI defines the location of credentials for this node. Each of these can also be inlined by using the yaml ": |" syntax.
pki:
  # The CAs that are accepted by this node. Must contain one or more certificates created by 'nebula-cert ca'
  ca: /opt/nebula/ca.crt
  cert: /opt/nebula/host1.fr.crt
  key: /opt/nebula/host1.fr.key
  #blocklist is a list of certificate fingerprints that we will refuse to talk to
  #blocklist:
  #  - c99d4e650533b92061b09918e838a5a0a6aaee21eed1d12fd937682865936c72

# The static host map defines a set of hosts with fixed IP addresses on the internet (or any network).
# A host can have multiple fixed IP addresses defined here, and nebula will try each when establishing a tunnel.
# The syntax is:
#   "{nebula ip}": ["{routable ip/dns name}:{routable port}"]
# Example, if your lighthouse has the nebula IP of 192.168.100.1 and has the real ip address of 100.64.22.11 and runs on port 4242:
static_host_map:
  "10.6.0.1": ["xxxxxxx:4242"]
  "10.6.0.2": ["xxxxxx:4242"]
  "10.6.0.3": ["host1:4242"]


lighthouse:
  # am_lighthouse is used to enable lighthouse functionality for a node. This should ONLY be true on nodes
  # you have configured to be lighthouses in your network
  am_lighthouse: true
  # serve_dns optionally starts a dns listener that responds to various queries and can even be
  # delegated to for resolution
  #serve_dns: false
  #dns:
    # The DNS host defines the IP to bind the dns listener to. This also allows binding to the nebula node IP.
    #host: 0.0.0.0
    #port: 53
  # interval is the number of seconds between updates from this node to a lighthouse.
  # during updates, a node sends information about its current IP addresses to each node.
  interval: 60
  # hosts is a list of lighthouse hosts this node should report to and query from
  # IMPORTANT: THIS SHOULD BE EMPTY ON LIGHTHOUSE NODES
  # IMPORTANT2: THIS SHOULD BE LIGHTHOUSES' NEBULA IPs, NOT LIGHTHOUSES' REAL ROUTABLE IPs
  #hosts:
  #  - "192.168.100.1"

  # remote_allow_list allows you to control ip ranges that this node will
  # consider when handshaking to another node. By default, any remote IPs are
  # allowed. You can provide CIDRs here with `true` to allow and `false` to
  # deny. The most specific CIDR rule applies to each remote. If all rules are
  # "allow", the default will be "deny", and vice-versa. If both "allow" and
  # "deny" rules are present, then you MUST set a rule for "0.0.0.0/0" as the
  # default.
  #remote_allow_list:
    # Example to block IPs from this subnet from being used for remote IPs.
    #"172.16.0.0/12": false

    # A more complicated example, allow public IPs but only private IPs from a specific subnet
    #"0.0.0.0/0": true
    #"10.0.0.0/8": false
    #"10.42.42.0/24": true

  # local_allow_list allows you to filter which local IP addresses we advertise
  # to the lighthouses. This uses the same logic as `remote_allow_list`, but
  # additionally, you can specify an `interfaces` map of regular expressions
  # to match against interface names. The regexp must match the entire name.
  # All interface rules must be either true or false (and the default will be
  # the inverse). CIDR rules are matched after interface name rules.
  # Default is all local IP addresses.
  #local_allow_list:
    # Example to block tun0 and all docker interfaces.
    #interfaces:
      #tun0: false
      #'docker.*': false
    # Example to only advertise this subnet to the lighthouse.
    #"10.0.0.0/8": true

# Port Nebula will be listening on. The default here is 4242. For a lighthouse node, the port should be defined,
# however using port 0 will dynamically assign a port and is recommended for roaming nodes.
listen:
  # To listen on both any ipv4 and ipv6 use "[::]"
  host: 0.0.0.0
  port: 4242
  # Sets the max number of packets to pull from the kernel for each syscall (under systems that support recvmmsg)
  # default is 64, does not support reload
  #batch: 64
  # Configure socket buffers for the udp side (outside), leave unset to use the system defaults. Values will be doubled by the kernel
  # Default is net.core.rmem_default and net.core.wmem_default (/proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_default)
  # Maximum is limited by memory in the system, SO_RCVBUFFORCE and SO_SNDBUFFORCE is used to avoid having to raise the system wide
  # max, net.core.rmem_max and net.core.wmem_max
  read_buffer: 200000000
  write_buffer: 200000000

preferred_ranges:
    192.168.0.0/16

# EXPERIMENTAL: This option is currently only supported on linux and may
# change in future minor releases.
#
# Routines is the number of thread pairs to run that consume from the tun and UDP queues.
# Currently, this defaults to 1 which means we have 1 tun queue reader and 1
# UDP queue reader. Setting this above one will set IFF_MULTI_QUEUE on the tun
# device and SO_REUSEPORT on the UDP socket to allow multiple queues.
routines: 2

punchy:
  # Continues to punch inbound/outbound at a regular interval to avoid expiration of firewall nat mappings
  punch: true

  # respond means that a node you are trying to reach will connect back out to you if your hole punching fails
  # this is extremely useful if one node is behind a difficult nat, such as a symmetric NAT
  # Default is false
  respond: true

  # delays a punch response for misbehaving NATs, default is 1 second, respond must be true to take effect
  delay: 1s

# Cipher allows you to choose between the available ciphers for your network. Options are chachapoly or aes
# IMPORTANT: this value must be identical on ALL NODES/LIGHTHOUSES. We do not/will not support use of different ciphers simultaneously!
#cipher: chachapoly

# Local range is used to define a hint about the local network range, which speeds up discovering the fastest
# path to a network adjacent nebula node.
local_range: "192.168.0.0/24"

# sshd can expose informational and administrative functions via ssh this is a
sshd:
  # Toggles the feature
  enabled: true
  # Host and port to listen on, port 22 is not allowed for your safety
  listen: 0.0.0.0:2222
  # A file containing the ssh host private key to use
  # A decent way to generate one: ssh-keygen -t ed25519 -f ssh_host_ed25519_key -N "" < /dev/null
  host_key: /opt/nebula/ssh_host_ed25519_key
  # A file containing a list of authorized public keys
  authorized_users:
    - user: adrienm
      # keys can be an array of strings or single string
      keys:
        - "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDVA9xmt3mSVv1r/bNm/W4L1ql2VvL+eRFianyPPFEFUO98aOaH/tCr8ScAfKm9eDJ2lDfsJFsyfV+4e4DnsyoG3VOERgwv4QpCQRkMy0irjIO/XWG3s+aSPDM5y1iRAv0E9w6dHE8UXZLRdlF4W75MifctFtke2+uJeBlDHv7Ur+DezHpZPw2oJBbAj9ScYlsx93Q4C0PljiPgbx02fD3QwuqbILBZMsCUL4VUSZne9q0nKTSuJAPs7lDDvXJhXB5EMLSxHoO7L5rQ1tO2QuCXXwrBZzv507Fgn5jp9M62cXJDgmNuBJ+sXNIUbABRd1OAnaHTWXW0tFvYqRmDzp4FWVFjDnr1mJ53OxcMySTfPzxsr0OgiwJH6Bqx3jRtR2ENWfgYmfu8WSIN5pQgvcdU5Pcoi7M0Cf+7kMJjUTi2ag9yKahU6H1lJ72QZyylaSTd/RNK/CKFy/qsfJ/VAWlbaQZJuhTsp6FWXMq6YkgAvhP23mi682jJXMAUIfEzW6k= adrienm@swift"

# Configure the private interface. Note: addr is baked into the nebula certificate
tun:
  # When tun is disabled, a lighthouse can be started without a local tun interface (and therefore without root)
  disabled: false
  # Name of the device
  dev: neb0
  # Toggles forwarding of local broadcast packets, the address of which depends on the ip/mask encoded in pki.cert
  drop_local_broadcast: false
  # Toggles forwarding of multicast packets
  drop_multicast: false
  # Sets the transmit queue length, if you notice lots of transmit drops on the tun it may help to raise this number. Default is 500
  tx_queue: 5000
  # Default MTU for every packet, safe setting is (and the default) 1300 for internet based traffic
  mtu: 8600
  # Route based MTU overrides, you have known vpn ip paths that can support larger MTUs you can increase/decrease them here
  routes:
    #- mtu: 8800
    #  route: 10.0.0.0/16
  # Unsafe routes allows you to route traffic over nebula to non-nebula nodes
  # Unsafe routes should be avoided unless you have hosts/services that cannot run nebula
  # NOTE: The nebula certificate of the "via" node *MUST* have the "route" defined as a subnet in its certificate
  unsafe_routes:
    #- route: 172.16.1.0/24
    #  via: 192.168.100.99
    #  mtu: 1300 #mtu will default to tun mtu if this option is not sepcified


# TODO
# Configure logging level
logging:
  # panic, fatal, error, warning, info, or debug. Default is info
  level: info
  # json or text formats currently available. Default is text
  format: text
  # Disable timestamp logging. useful when output is redirected to logging system that already adds timestamps. Default is false
  #disable_timestamp: true
  # timestamp format is specified in Go time format, see:
  #     https://golang.org/pkg/time/#pkg-constants
  # default when `format: json`: "2006-01-02T15:04:05Z07:00" (RFC3339)
  # default when `format: text`:
  #     when TTY attached: seconds since beginning of execution
  #     otherwise: "2006-01-02T15:04:05Z07:00" (RFC3339)
  # As an example, to log as RFC3339 with millisecond precision, set to:
  #timestamp_format: "2006-01-02T15:04:05.000Z07:00"

#stats:
  #type: graphite
  #prefix: nebula
  #protocol: tcp
  #host: 127.0.0.1:9999
  #interval: 10s

  #type: prometheus
  #listen: 127.0.0.1:8080
  #path: /metrics
  #namespace: prometheusns
  #subsystem: nebula
  #interval: 10s

  # enables counter metrics for meta packets
  #   e.g.: `messages.tx.handshake`
  # NOTE: `message.{tx,rx}.recv_error` is always emitted
  #message_metrics: false

  # enables detailed counter metrics for lighthouse packets
  #   e.g.: `lighthouse.rx.HostQuery`
  #lighthouse_metrics: false

# Handshake Manager Settings
#handshakes:
  # Handshakes are sent to all known addresses at each interval with a linear backoff,
  # Wait try_interval after the 1st attempt, 2 * try_interval after the 2nd, etc, until the handshake is older than timeout
  # A 100ms interval with the default 10 retries will give a handshake 5.5 seconds to resolve before timing out
  #try_interval: 100ms
  #retries: 20
  # trigger_buffer is the size of the buffer channel for quickly sending handshakes
  # after receiving the response for lighthouse queries
  #trigger_buffer: 64


# Nebula security group configuration
firewall:
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  # The firewall is default deny. There is no way to write a deny rule.
  # Rules are comprised of a protocol, port, and one or more of host, group, or CIDR
  # Logical evaluation is roughly: port AND proto AND (ca_sha OR ca_name) AND (host OR group OR groups OR cidr)
  # - port: Takes `0` or `any` as any, a single number `80`, a range `200-901`, or `fragment` to match second and further fragments of fragmented packets (since there is no port available).
  #   code: same as port but makes more sense when talking about ICMP, TODO: this is not currently implemented in a way that works, use `any`
  #   proto: `any`, `tcp`, `udp`, or `icmp`
  #   host: `any` or a literal hostname, ie `test-host`
  #   group: `any` or a literal group name, ie `default-group`
  #   groups: Same as group but accepts a list of values. Multiple values are AND'd together and a certificate would have to contain all groups to pass
  #   cidr: a CIDR, `0.0.0.0/0` is any.
  #   ca_name: An issuing CA name
  #   ca_sha: An issuing CA shasum

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: any
      host: any

For host2 :

ip addr of host2 :

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether a2:cd:f6:57:be:86 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.26/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a0cd:f6ff:fe57:be86/64 scope link 
       valid_lft forever preferred_lft forever

(.....)

216: neb0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 8600 qdisc fq_codel state UNKNOWN group default qlen 5000
    link/none 
    inet 10.6.0.4/16 scope global neb0
       valid_lft forever preferred_lft forever

and nebula config :

# This is the nebula example configuration file. You must edit, at a minimum, the static_host_map, lighthouse, and firewall sections
# Some options in this file are HUPable, including the pki section. (A HUP will reload credentials from disk without affecting existing tunnels)

# PKI defines the location of credentials for this node. Each of these can also be inlined by using the yaml ": |" syntax.
pki:
  # The CAs that are accepted by this node. Must contain one or more certificates created by 'nebula-cert ca'
  ca: /opt/nebula/ca.crt
  cert: /opt/nebula/host2.crt
  key: /opt/nebula/host2.key
  #blocklist is a list of certificate fingerprints that we will refuse to talk to
  #blocklist:
  #  - c99d4e650533b92061b09918e838a5a0a6aaee21eed1d12fd937682865936c72

# The static host map defines a set of hosts with fixed IP addresses on the internet (or any network).
# A host can have multiple fixed IP addresses defined here, and nebula will try each when establishing a tunnel.
# The syntax is:
#   "{nebula ip}": ["{routable ip/dns name}:{routable port}"]
# Example, if your lighthouse has the nebula IP of 192.168.100.1 and has the real ip address of 100.64.22.11 and runs on port 4242:
static_host_map:
  "10.6.0.1": ["xxxxxx:4242"]
  "10.6.0.2": ["xxxxxx:4242"]
  "10.6.0.3": ["host1:4242"]

preferred_ranges:
    192.168.0.0/16

lighthouse:
  # am_lighthouse is used to enable lighthouse functionality for a node. This should ONLY be true on nodes
  # you have configured to be lighthouses in your network
  am_lighthouse: false
  # serve_dns optionally starts a dns listener that responds to various queries and can even be
  # delegated to for resolution
  #serve_dns: false
  #dns:
    # The DNS host defines the IP to bind the dns listener to. This also allows binding to the nebula node IP.
    #host: 0.0.0.0
    #port: 53
  # interval is the number of seconds between updates from this node to a lighthouse.
  # during updates, a node sends information about its current IP addresses to each node.
  interval: 60
  # hosts is a list of lighthouse hosts this node should report to and query from
  # IMPORTANT: THIS SHOULD BE EMPTY ON LIGHTHOUSE NODES
  # IMPORTANT2: THIS SHOULD BE LIGHTHOUSES' NEBULA IPs, NOT LIGHTHOUSES' REAL ROUTABLE IPs
  hosts:
        - "10.6.0.1"
        - "10.6.0.2"
        - "10.6.0.3"
    
  # remote_allow_list allows you to control ip ranges that this node will
  # consider when handshaking to another node. By default, any remote IPs are
  # allowed. You can provide CIDRs here with `true` to allow and `false` to
  # deny. The most specific CIDR rule applies to each remote. If all rules are
  # "allow", the default will be "deny", and vice-versa. If both "allow" and
  # "deny" rules are present, then you MUST set a rule for "0.0.0.0/0" as the
  # default.
  #remote_allow_list:
    # Example to block IPs from this subnet from being used for remote IPs.
    #"172.16.0.0/12": false

    # A more complicated example, allow public IPs but only private IPs from a specific subnet
    #"0.0.0.0/0": true
    #"10.0.0.0/8": false
    #"10.42.42.0/24": true

  # local_allow_list allows you to filter which local IP addresses we advertise
  # to the lighthouses. This uses the same logic as `remote_allow_list`, but
  # additionally, you can specify an `interfaces` map of regular expressions
  # to match against interface names. The regexp must match the entire name.
  # All interface rules must be either true or false (and the default will be
  # the inverse). CIDR rules are matched after interface name rules.
  # Default is all local IP addresses.
  #local_allow_list:
    # Example to block tun0 and all docker interfaces.
    #interfaces:
      #tun0: false
      #'docker.*': false
    # Example to only advertise this subnet to the lighthouse.
    #"10.0.0.0/8": true

# Port Nebula will be listening on. The default here is 4242. For a lighthouse node, the port should be defined,
# however using port 0 will dynamically assign a port and is recommended for roaming nodes.
listen:
  # To listen on both any ipv4 and ipv6 use "[::]"
  host: 0.0.0.0
  port: 4242
  # Sets the max number of packets to pull from the kernel for each syscall (under systems that support recvmmsg)
  # default is 64, does not support reload
  #batch: 64
  # Configure socket buffers for the udp side (outside), leave unset to use the system defaults. Values will be doubled by the kernel
  # Default is net.core.rmem_default and net.core.wmem_default (/proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_default)
  # Maximum is limited by memory in the system, SO_RCVBUFFORCE and SO_SNDBUFFORCE is used to avoid having to raise the system wide
  # max, net.core.rmem_max and net.core.wmem_max
  read_buffer: 20000000
  write_buffer: 20000000

# EXPERIMENTAL: This option is currently only supported on linux and may
# change in future minor releases.
#
# Routines is the number of thread pairs to run that consume from the tun and UDP queues.
# Currently, this defaults to 1 which means we have 1 tun queue reader and 1
# UDP queue reader. Setting this above one will set IFF_MULTI_QUEUE on the tun
# device and SO_REUSEPORT on the UDP socket to allow multiple queues.
routines: 2

punchy:
  # Continues to punch inbound/outbound at a regular interval to avoid expiration of firewall nat mappings
  punch: true

  # respond means that a node you are trying to reach will connect back out to you if your hole punching fails
  # this is extremely useful if one node is behind a difficult nat, such as a symmetric NAT
  # Default is false
  respond: true

  # delays a punch response for misbehaving NATs, default is 1 second, respond must be true to take effect
  delay: 1s

# Cipher allows you to choose between the available ciphers for your network. Options are chachapoly or aes
# IMPORTANT: this value must be identical on ALL NODES/LIGHTHOUSES. We do not/will not support use of different ciphers simultaneously!
#cipher: chachapoly

# Local range is used to define a hint about the local network range, which speeds up discovering the fastest
# path to a network adjacent nebula node.
#local_range: "172.16.0.0/24"

# sshd can expose informational and administrative functions via ssh this is a
#sshd:
  # Toggles the feature
  #enabled: true
  # Host and port to listen on, port 22 is not allowed for your safety
  #listen: 127.0.0.1:2222
  # A file containing the ssh host private key to use
  # A decent way to generate one: ssh-keygen -t ed25519 -f ssh_host_ed25519_key -N "" < /dev/null
  #host_key: ./ssh_host_ed25519_key
  # A file containing a list of authorized public keys
  #authorized_users:
    #- user: steeeeve
      # keys can be an array of strings or single string
      #keys:
        #- "ssh public key string"

# Configure the private interface. Note: addr is baked into the nebula certificate
tun:
  # When tun is disabled, a lighthouse can be started without a local tun interface (and therefore without root)
  disabled: false
  # Name of the device
  dev: neb0
  # Toggles forwarding of local broadcast packets, the address of which depends on the ip/mask encoded in pki.cert
  drop_local_broadcast: false
  # Toggles forwarding of multicast packets
  drop_multicast: false
  # Sets the transmit queue length, if you notice lots of transmit drops on the tun it may help to raise this number. Default is 500
  tx_queue: 5000
  # Default MTU for every packet, safe setting is (and the default) 1300 for internet based traffic
  mtu: 8600
  # Route based MTU overrides, you have known vpn ip paths that can support larger MTUs you can increase/decrease them here
  routes:
    #- mtu: 8800
    #  route: 10.0.0.0/16
  # Unsafe routes allows you to route traffic over nebula to non-nebula nodes
  # Unsafe routes should be avoided unless you have hosts/services that cannot run nebula
  # NOTE: The nebula certificate of the "via" node *MUST* have the "route" defined as a subnet in its certificate
  unsafe_routes:
    #- route: 172.16.1.0/24
    #  via: 192.168.100.99
    #  mtu: 1300 #mtu will default to tun mtu if this option is not sepcified


# TODO
# Configure logging level
logging:
  # panic, fatal, error, warning, info, or debug. Default is info
  level: info
  # json or text formats currently available. Default is text
  format: text
  # Disable timestamp logging. useful when output is redirected to logging system that already adds timestamps. Default is false
  #disable_timestamp: true
  # timestamp format is specified in Go time format, see:
  #     https://golang.org/pkg/time/#pkg-constants
  # default when `format: json`: "2006-01-02T15:04:05Z07:00" (RFC3339)
  # default when `format: text`:
  #     when TTY attached: seconds since beginning of execution
  #     otherwise: "2006-01-02T15:04:05Z07:00" (RFC3339)
  # As an example, to log as RFC3339 with millisecond precision, set to:
  #timestamp_format: "2006-01-02T15:04:05.000Z07:00"

#stats:
  #type: graphite
  #prefix: nebula
  #protocol: tcp
  #host: 127.0.0.1:9999
  #interval: 10s

  #type: prometheus
  #listen: 127.0.0.1:8080
  #path: /metrics
  #namespace: prometheusns
  #subsystem: nebula
  #interval: 10s

  # enables counter metrics for meta packets
  #   e.g.: `messages.tx.handshake`
  # NOTE: `message.{tx,rx}.recv_error` is always emitted
  #message_metrics: false

  # enables detailed counter metrics for lighthouse packets
  #   e.g.: `lighthouse.rx.HostQuery`
  #lighthouse_metrics: false

# Handshake Manager Settings
#handshakes:
  # Handshakes are sent to all known addresses at each interval with a linear backoff,
  # Wait try_interval after the 1st attempt, 2 * try_interval after the 2nd, etc, until the handshake is older than timeout
  # A 100ms interval with the default 10 retries will give a handshake 5.5 seconds to resolve before timing out
  #try_interval: 100ms
  #retries: 20
  # trigger_buffer is the size of the buffer channel for quickly sending handshakes
  # after receiving the response for lighthouse queries
  #trigger_buffer: 64


# Nebula security group configuration
firewall:
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  # The firewall is default deny. There is no way to write a deny rule.
  # Rules are comprised of a protocol, port, and one or more of host, group, or CIDR
  # Logical evaluation is roughly: port AND proto AND (ca_sha OR ca_name) AND (host OR group OR groups OR cidr)
  # - port: Takes `0` or `any` as any, a single number `80`, a range `200-901`, or `fragment` to match second and further fragments of fragmented packets (since there is no port available).
  #   code: same as port but makes more sense when talking about ICMP, TODO: this is not currently implemented in a way that works, use `any`
  #   proto: `any`, `tcp`, `udp`, or `icmp`
  #   host: `any` or a literal hostname, ie `test-host`
  #   group: `any` or a literal group name, ie `default-group`
  #   groups: Same as group but accepts a list of values. Multiple values are AND'd together and a certificate would have to contain all groups to pass
  #   cidr: a CIDR, `0.0.0.0/0` is any.
  #   ca_name: An issuing CA name
  #   ca_sha: An issuing CA shasum

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: any
      host: any

Sep 27 '21 12:09 adi90x

The MTU=8600 looks strange, have you tried using a regular one?

Sep 28 '21 05:09 cg31

Hello,

Yes, I have tried 1300,1500,8000 and 8800 too. 8000+ makes a huge improvement vs 1300/1500 mtu, but still not enough compare to wireguard vpn between those 2 host or direct link.

Regards,

Le mar. 28 sept. 2021 à 07:24, cg31 @.***> a écrit :

The MTU=8600 looks strange, have you tried using a regular one?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/slackhq/nebula/issues/539#issuecomment-928842259, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACVGU5AS5HSFL7JHTZXJFQDUEFGPRANCNFSM5EXQ2UUQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

Sep 28 '21 05:09 adi90x

Maybe try Wireshark or tcpdump get some capture packets?

Sep 28 '21 09:09 tmsi-io

I just did a test, but I don't have a gigabyte link. It seems there is no speed issue with Nebula on a 100M link.

the physical interface

iperf -c 10.169.36.230 -p 7575
------------------------------------------------------------
Client connecting to 10.169.36.230, TCP port 7575
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  1] local 10.169.36.239 port 50782 connected with 10.169.36.230 port 7575
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.17 sec   113 MBytes  93.4 Mbits/sec

Tailscale (WireGuard)

[  1] local 10.144.166.98 port 47786 connected with 10.144.148.69 port 7575
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.15 sec  64.5 MBytes  53.3 Mbits/sec

Nebula

[  1] local 10.20.0.7 port 38236 connected with 10.20.0.8 port 7575
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-10.16 sec   107 MBytes  88.3 Mbits/sec

it seems even bettter than TS. My nebula is built from trunk with latest go compiler, and I am using default config.

Sep 29 '21 00:09 cg31

I would be curious as to what the

tc -s qdisc show

output is in your failing scenario (drops/marks/reschedules)

Oct 13 '21 15:10 dtaht

Sorry for the delay. @dtaht

This is on server :

qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 591270866 bytes 5360006 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 210 drop_overlimit 0 new_flow_count 1657 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev neb0 root 
 Sent 2470840 bytes 47516 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev neb0 parent :5 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :4 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :3 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 2470780 bytes 47515 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :2 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :1 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

This is on client :

qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 260239109396 bytes 178370125 pkt (dropped 0, overlimits 0 requeues 3) 
 backlog 0b 0p requeues 3
  maxpacket 68130 drop_overlimit 0 new_flow_count 2146 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc mq 0: dev neb0 root 
 Sent 840765508 bytes 94603 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev neb0 parent :5 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :4 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :3 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 840765396 bytes 94601 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :2 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :1 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 112 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

This is with routines: 5 , tx_queue: 5000 , mtu: 8900, read_buffer: 20000000 and write_buffer: 20000000 I have also upgrade to nebula 1.5.0

Does it gives any clue ?

Nov 23 '21 20:11 adi90x

Hello,

Bumping this thread , as I have now upgrade to 1.5.2 but still got the same issue , on LAN using local ip I acheive 900Mb/s ~ 1Gb/s ; While with Nebula ip , I max out at 200Mb/s~210Mb/s . Is there anything that could be log/check/tested to try to improve that ?

Regards

Apr 28 '22 16:04 adi90x

Sorry for forgetting about this. The fq_codel stuff reveals no drops, so that's not the source of your issue, and that's my speciality. You are bottlenecking elsewhere.

Apr 30 '22 03:04 dtaht

routines pins i/o goroutines to dedicated o/s threads, and I don't think they'll provide any performance value set greater than the number of CPU's on the host. I didn't test the different scenarios, though, so I could be wrong!

It might be helpful to see how much traffic can flow between the hosts without the TCP stack. If you run iperf3 <snip> -u -b 1g, to send UDP traffic at a rate of 1gps, and compare the nebula vs local routes, what do you get?

May 05 '22 20:05 brad-defined

@adi90x Were you able to run the test mentioned above?

Jul 25 '22 20:07 johnmaguire

I'm closing this issue out as stale. Please feel free to ping me if you'd like it reopened.

Dec 07 '22 18:12 johnmaguire

nebula nebula copied to clipboard

Performance issue on LAN

nebula
nebula copied to clipboard