Performance issue on LAN
Hello,
I am building a hybrid network: some hosts are on the same local network (192.168.0.0/24) and other hosts are on the internet.
I have tried tweaking tx_queue to 5000 and read_buffer & write_buffer to 20000000, which greatly improves performance.
However, between two hosts on the same LAN, an iperf test on the Nebula IP reaches around 200 Mbits/sec, while the same test using the local IP maxes out at 938 Mbits/sec (which makes sense for a gigabit link).
Any ideas on what could be done to improve performance between hosts on the same network?
Regards,
Can you also share the hardware on the nodes and the iperf commands you used? The ip addr output and Nebula config might help as well.
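For example, something along these lines (a sketch; substitute your own addresses):
# on one host
iperf3 -s
# on the other, once over the LAN IP and once over the Nebula IP
iperf3 -c <lan ip>
iperf3 -c <nebula ip>
ip addr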
Hello,
Host1 is a small server and Host2 is a virtual machine on a Proxmox host.
The issue is even worse when testing between two VMs (i.e. 15 Gb/s using the local IP vs 600 Mb/s using the Nebula IP).
As discussed on Slack, I have tried setting routines to 2, 5, 10, and 100 with no luck. I have also tried preferred_ranges to make sure Nebula was using the local IP, with no success so far.
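For reference, the preferred_ranges syntax I tried follows the example config (a list of CIDRs), and one way to check which remote Nebula actually picked is the sshd debug console (a sketch; the console commands may vary by version):
preferred_ranges: ["192.168.0.0/16"]
# with sshd enabled on port 2222, from the host itself:
ssh -p 2222 adrienm@127.0.0.1
# then, at the console, print the tunnel to the peer's Nebula IP:
print-tunnel 10.6.0.4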
Any other ideas?
Host 1:
Reporting only the relevant parts:
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 84:39:be:9c:03:c0 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.25/24 brd 192.168.0.255 scope global dynamic enp1s0
valid_lft 75593sec preferred_lft 75593sec
inet6 fe80::8639:beff:fe9c:3c0/64 scope link
valid_lft forever preferred_lft forever
(.....)
158: neb0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 8000 qdisc fq_codel state UNKNOWN group default qlen 50000
link/none
inet 10.6.0.3/16 scope global neb0
valid_lft forever preferred_lft forever
Nebula config of host 1:
# This is the nebula example configuration file. You must edit, at a minimum, the static_host_map, lighthouse, and firewall sections
# Some options in this file are HUPable, including the pki section. (A HUP will reload credentials from disk without affecting existing tunnels)
# PKI defines the location of credentials for this node. Each of these can also be inlined by using the yaml ": |" syntax.
pki:
# The CAs that are accepted by this node. Must contain one or more certificates created by 'nebula-cert ca'
ca: /opt/nebula/ca.crt
cert: /opt/nebula/host1.fr.crt
key: /opt/nebula/host1.fr.key
#blocklist is a list of certificate fingerprints that we will refuse to talk to
#blocklist:
# - c99d4e650533b92061b09918e838a5a0a6aaee21eed1d12fd937682865936c72
# The static host map defines a set of hosts with fixed IP addresses on the internet (or any network).
# A host can have multiple fixed IP addresses defined here, and nebula will try each when establishing a tunnel.
# The syntax is:
# "{nebula ip}": ["{routable ip/dns name}:{routable port}"]
# Example, if your lighthouse has the nebula IP of 192.168.100.1 and has the real ip address of 100.64.22.11 and runs on port 4242:
static_host_map:
"10.6.0.1": ["xxxxxxx:4242"]
"10.6.0.2": ["xxxxxx:4242"]
"10.6.0.3": ["host1:4242"]
lighthouse:
# am_lighthouse is used to enable lighthouse functionality for a node. This should ONLY be true on nodes
# you have configured to be lighthouses in your network
am_lighthouse: true
# serve_dns optionally starts a dns listener that responds to various queries and can even be
# delegated to for resolution
#serve_dns: false
#dns:
# The DNS host defines the IP to bind the dns listener to. This also allows binding to the nebula node IP.
#host: 0.0.0.0
#port: 53
# interval is the number of seconds between updates from this node to a lighthouse.
# during updates, a node sends information about its current IP addresses to each node.
interval: 60
# hosts is a list of lighthouse hosts this node should report to and query from
# IMPORTANT: THIS SHOULD BE EMPTY ON LIGHTHOUSE NODES
# IMPORTANT2: THIS SHOULD BE LIGHTHOUSES' NEBULA IPs, NOT LIGHTHOUSES' REAL ROUTABLE IPs
#hosts:
# - "192.168.100.1"
# remote_allow_list allows you to control ip ranges that this node will
# consider when handshaking to another node. By default, any remote IPs are
# allowed. You can provide CIDRs here with `true` to allow and `false` to
# deny. The most specific CIDR rule applies to each remote. If all rules are
# "allow", the default will be "deny", and vice-versa. If both "allow" and
# "deny" rules are present, then you MUST set a rule for "0.0.0.0/0" as the
# default.
#remote_allow_list:
# Example to block IPs from this subnet from being used for remote IPs.
#"172.16.0.0/12": false
# A more complicated example, allow public IPs but only private IPs from a specific subnet
#"0.0.0.0/0": true
#"10.0.0.0/8": false
#"10.42.42.0/24": true
# local_allow_list allows you to filter which local IP addresses we advertise
# to the lighthouses. This uses the same logic as `remote_allow_list`, but
# additionally, you can specify an `interfaces` map of regular expressions
# to match against interface names. The regexp must match the entire name.
# All interface rules must be either true or false (and the default will be
# the inverse). CIDR rules are matched after interface name rules.
# Default is all local IP addresses.
#local_allow_list:
# Example to block tun0 and all docker interfaces.
#interfaces:
#tun0: false
#'docker.*': false
# Example to only advertise this subnet to the lighthouse.
#"10.0.0.0/8": true
# Port Nebula will be listening on. The default here is 4242. For a lighthouse node, the port should be defined,
# however using port 0 will dynamically assign a port and is recommended for roaming nodes.
listen:
# To listen on both any ipv4 and ipv6 use "[::]"
host: 0.0.0.0
port: 4242
# Sets the max number of packets to pull from the kernel for each syscall (under systems that support recvmmsg)
# default is 64, does not support reload
#batch: 64
# Configure socket buffers for the udp side (outside), leave unset to use the system defaults. Values will be doubled by the kernel
# Default is net.core.rmem_default and net.core.wmem_default (/proc/sys/net/core/rmem_default and /proc/sys/net/core/wmem_default)
# Maximum is limited by memory in the system, SO_RCVBUFFORCE and SO_SNDBUFFORCE is used to avoid having to raise the system wide
# max, net.core.rmem_max and net.core.wmem_max
read_buffer: 200000000
write_buffer: 200000000
preferred_ranges:
  - "192.168.0.0/16"
# EXPERIMENTAL: This option is currently only supported on linux and may
# change in future minor releases.
#
# Routines is the number of thread pairs to run that consume from the tun and UDP queues.
# Currently, this defaults to 1 which means we have 1 tun queue reader and 1
# UDP queue reader. Setting this above one will set IFF_MULTI_QUEUE on the tun
# device and SO_REUSEPORT on the UDP socket to allow multiple queues.
routines: 2
punchy:
# Continues to punch inbound/outbound at a regular interval to avoid expiration of firewall nat mappings
punch: true
# respond means that a node you are trying to reach will connect back out to you if your hole punching fails
# this is extremely useful if one node is behind a difficult nat, such as a symmetric NAT
# Default is false
respond: true
# delays a punch response for misbehaving NATs, default is 1 second, respond must be true to take effect
delay: 1s
# Cipher allows you to choose between the available ciphers for your network. Options are chachapoly or aes
# IMPORTANT: this value must be identical on ALL NODES/LIGHTHOUSES. We do not/will not support use of different ciphers simultaneously!
#cipher: chachapoly
# Local range is used to define a hint about the local network range, which speeds up discovering the fastest
# path to a network adjacent nebula node.
local_range: "192.168.0.0/24"
# sshd can expose informational and administrative functions via ssh.
sshd:
# Toggles the feature
enabled: true
# Host and port to listen on, port 22 is not allowed for your safety
listen: 0.0.0.0:2222
# A file containing the ssh host private key to use
# A decent way to generate one: ssh-keygen -t ed25519 -f ssh_host_ed25519_key -N "" < /dev/null
host_key: /opt/nebula/ssh_host_ed25519_key
# A file containing a list of authorized public keys
authorized_users:
- user: adrienm
# keys can be an array of strings or single string
keys:
- "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDVA9xmt3mSVv1r/bNm/W4L1ql2VvL+eRFianyPPFEFUO98aOaH/tCr8ScAfKm9eDJ2lDfsJFsyfV+4e4DnsyoG3VOERgwv4QpCQRkMy0irjIO/XWG3s+aSPDM5y1iRAv0E9w6dHE8UXZLRdlF4W75MifctFtke2+uJeBlDHv7Ur+DezHpZPw2oJBbAj9ScYlsx93Q4C0PljiPgbx02fD3QwuqbILBZMsCUL4VUSZne9q0nKTSuJAPs7lDDvXJhXB5EMLSxHoO7L5rQ1tO2QuCXXwrBZzv507Fgn5jp9M62cXJDgmNuBJ+sXNIUbABRd1OAnaHTWXW0tFvYqRmDzp4FWVFjDnr1mJ53OxcMySTfPzxsr0OgiwJH6Bqx3jRtR2ENWfgYmfu8WSIN5pQgvcdU5Pcoi7M0Cf+7kMJjUTi2ag9yKahU6H1lJ72QZyylaSTd/RNK/CKFy/qsfJ/VAWlbaQZJuhTsp6FWXMq6YkgAvhP23mi682jJXMAUIfEzW6k= adrienm@swift"
# Configure the private interface. Note: addr is baked into the nebula certificate
tun:
# When tun is disabled, a lighthouse can be started without a local tun interface (and therefore without root)
disabled: false
# Name of the device
dev: neb0
# Toggles forwarding of local broadcast packets, the address of which depends on the ip/mask encoded in pki.cert
drop_local_broadcast: false
# Toggles forwarding of multicast packets
drop_multicast: false
# Sets the transmit queue length, if you notice lots of transmit drops on the tun it may help to raise this number. Default is 500
tx_queue: 5000
# Default MTU for every packet, safe setting is (and the default) 1300 for internet based traffic
mtu: 8600
# Route based MTU overrides, if you have known vpn ip paths that can support larger MTUs you can increase/decrease them here
routes:
#- mtu: 8800
# route: 10.0.0.0/16
# Unsafe routes allows you to route traffic over nebula to non-nebula nodes
# Unsafe routes should be avoided unless you have hosts/services that cannot run nebula
# NOTE: The nebula certificate of the "via" node *MUST* have the "route" defined as a subnet in its certificate
unsafe_routes:
#- route: 172.16.1.0/24
# via: 192.168.100.99
# mtu: 1300 #mtu will default to tun mtu if this option is not specified
# TODO
# Configure logging level
logging:
# panic, fatal, error, warning, info, or debug. Default is info
level: info
# json or text formats currently available. Default is text
format: text
# Disable timestamp logging. useful when output is redirected to logging system that already adds timestamps. Default is false
#disable_timestamp: true
# timestamp format is specified in Go time format, see:
# https://golang.org/pkg/time/#pkg-constants
# default when `format: json`: "2006-01-02T15:04:05Z07:00" (RFC3339)
# default when `format: text`:
# when TTY attached: seconds since beginning of execution
# otherwise: "2006-01-02T15:04:05Z07:00" (RFC3339)
# As an example, to log as RFC3339 with millisecond precision, set to:
#timestamp_format: "2006-01-02T15:04:05.000Z07:00"
#stats:
#type: graphite
#prefix: nebula
#protocol: tcp
#host: 127.0.0.1:9999
#interval: 10s
#type: prometheus
#listen: 127.0.0.1:8080
#path: /metrics
#namespace: prometheusns
#subsystem: nebula
#interval: 10s
# enables counter metrics for meta packets
# e.g.: `messages.tx.handshake`
# NOTE: `message.{tx,rx}.recv_error` is always emitted
#message_metrics: false
# enables detailed counter metrics for lighthouse packets
# e.g.: `lighthouse.rx.HostQuery`
#lighthouse_metrics: false
# Handshake Manager Settings
#handshakes:
# Handshakes are sent to all known addresses at each interval with a linear backoff,
# Wait try_interval after the 1st attempt, 2 * try_interval after the 2nd, etc, until the handshake is older than timeout
# A 100ms interval with the default 10 retries will give a handshake 5.5 seconds to resolve before timing out
#try_interval: 100ms
#retries: 20
# trigger_buffer is the size of the buffer channel for quickly sending handshakes
# after receiving the response for lighthouse queries
#trigger_buffer: 64
# Nebula security group configuration
firewall:
conntrack:
tcp_timeout: 12m
udp_timeout: 3m
default_timeout: 10m
max_connections: 100000
# The firewall is default deny. There is no way to write a deny rule.
# Rules are comprised of a protocol, port, and one or more of host, group, or CIDR
# Logical evaluation is roughly: port AND proto AND (ca_sha OR ca_name) AND (host OR group OR groups OR cidr)
# - port: Takes `0` or `any` as any, a single number `80`, a range `200-901`, or `fragment` to match second and further fragments of fragmented packets (since there is no port available).
# code: same as port but makes more sense when talking about ICMP, TODO: this is not currently implemented in a way that works, use `any`
# proto: `any`, `tcp`, `udp`, or `icmp`
# host: `any` or a literal hostname, ie `test-host`
# group: `any` or a literal group name, ie `default-group`
# groups: Same as group but accepts a list of values. Multiple values are AND'd together and a certificate would have to contain all groups to pass
# cidr: a CIDR, `0.0.0.0/0` is any.
# ca_name: An issuing CA name
# ca_sha: An issuing CA shasum
outbound:
- port: any
proto: any
host: any
inbound:
- port: any
proto: any
host: any
For host 2:
ip addr of host 2:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether a2:cd:f6:57:be:86 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.26/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a0cd:f6ff:fe57:be86/64 scope link
valid_lft forever preferred_lft forever
(.....)
216: neb0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 8600 qdisc fq_codel state UNKNOWN group default qlen 5000
link/none
inet 10.6.0.4/16 scope global neb0
valid_lft forever preferred_lft forever
and Nebula config:
# This is the nebula example configuration file. You must edit, at a minimum, the static_host_map, lighthouse, and firewall sections
# Some options in this file are HUPable, including the pki section. (A HUP will reload credentials from disk without affecting existing tunnels)
# PKI defines the location of credentials for this node. Each of these can also be inlined by using the yaml ": |" syntax.
pki:
# The CAs that are accepted by this node. Must contain one or more certificates created by 'nebula-cert ca'
ca: /opt/nebula/ca.crt
cert: /opt/nebula/host2.crt
key: /opt/nebula/host2.key
#blocklist is a list of certificate fingerprints that we will refuse to talk to
#blocklist:
# - c99d4e650533b92061b09918e838a5a0a6aaee21eed1d12fd937682865936c72
# The static host map defines a set of hosts with fixed IP addresses on the internet (or any network).
# A host can have multiple fixed IP addresses defined here, and nebula will try each when establishing a tunnel.
# The syntax is:
# "{nebula ip}": ["{routable ip/dns name}:{routable port}"]
# Example, if your lighthouse has the nebula IP of 192.168.100.1 and has the real ip address of 100.64.22.11 and runs on port 4242:
static_host_map:
"10.6.0.1": ["xxxxxx:4242"]
"10.6.0.2": ["xxxxxx:4242"]
"10.6.0.3": ["host1:4242"]
preferred_ranges:
  - "192.168.0.0/16"
lighthouse:
# am_lighthouse is used to enable lighthouse functionality for a node. This should ONLY be true on nodes
# you have configured to be lighthouses in your network
am_lighthouse: false
# serve_dns optionally starts a dns listener that responds to various queries and can even be
# delegated to for resolution
#serve_dns: false
#dns:
# The DNS host defines the IP to bind the dns listener to. This also allows binding to the nebula node IP.
#host: 0.0.0.0
#port: 53
# interval is the number of seconds between updates from this node to a lighthouse.
# during updates, a node sends information about its current IP addresses to each node.
interval: 60
# hosts is a list of lighthouse hosts this node should report to and query from
# IMPORTANT: THIS SHOULD BE EMPTY ON LIGHTHOUSE NODES
# IMPORTANT2: THIS SHOULD BE LIGHTHOUSES' NEBULA IPs, NOT LIGHTHOUSES' REAL ROUTABLE IPs
hosts:
- "10.6.0.1"
- "10.6.0.2"
- "10.6.0.3"
# remote_allow_list allows you to control ip ranges that this node will
# consider when handshaking to another node. By default, any remote IPs are
# allowed. You can provide CIDRs here with `true` to allow and `false` to
# deny. The most specific CIDR rule applies to each remote. If all rules are
# "allow", the default will be "deny", and vice-versa. If both "allow" and
# "deny" rules are present, then you MUST set a rule for "0.0.0.0/0" as the
# default.
#remote_allow_list:
# Example to block IPs from this subnet from being used for remote IPs.
#"172.16.0.0/12": false
# A more complicated example, allow public IPs but only private IPs from a specific subnet
#"0.0.0.0/0": true
#"10.0.0.0/8": false
#"10.42.42.0/24": true
# local_allow_list allows you to filter which local IP addresses we advertise
# to the lighthouses. This uses the same logic as `remote_allow_list`, but
# additionally, you can specify an `interfaces` map of regular expressions
# to match against interface names. The regexp must match the entire name.
# All interface rules must be either true or false (and the default will be
# the inverse). CIDR rules are matched after interface name rules.
# Default is all local IP addresses.
#local_allow_list:
# Example to block tun0 and all docker interfaces.
#interfaces:
#tun0: false
#'docker.*': false
# Example to only advertise this subnet to the lighthouse.
#"10.0.0.0/8": true
# Port Nebula will be listening on. The default here is 4242. For a lighthouse node, the port should be defined,
# however using port 0 will dynamically assign a port and is recommended for roaming nodes.
listen:
# To listen on both any ipv4 and ipv6 use "[::]"
host: 0.0.0.0
port: 4242
# Sets the max number of packets to pull from the kernel for each syscall (under systems that support recvmmsg)
# default is 64, does not support reload
#batch: 64
# Configure socket buffers for the udp side (outside), leave unset to use the system defaults. Values will be doubled by the kernel
# Default is net.core.rmem_default and net.core.wmem_default (/proc/sys/net/core/rmem_default and /proc/sys/net/core/wmem_default)
# Maximum is limited by memory in the system, SO_RCVBUFFORCE and SO_SNDBUFFORCE is used to avoid having to raise the system wide
# max, net.core.rmem_max and net.core.wmem_max
read_buffer: 20000000
write_buffer: 20000000
# EXPERIMENTAL: This option is currently only supported on linux and may
# change in future minor releases.
#
# Routines is the number of thread pairs to run that consume from the tun and UDP queues.
# Currently, this defaults to 1 which means we have 1 tun queue reader and 1
# UDP queue reader. Setting this above one will set IFF_MULTI_QUEUE on the tun
# device and SO_REUSEPORT on the UDP socket to allow multiple queues.
routines: 2
punchy:
# Continues to punch inbound/outbound at a regular interval to avoid expiration of firewall nat mappings
punch: true
# respond means that a node you are trying to reach will connect back out to you if your hole punching fails
# this is extremely useful if one node is behind a difficult nat, such as a symmetric NAT
# Default is false
respond: true
# delays a punch response for misbehaving NATs, default is 1 second, respond must be true to take effect
delay: 1s
# Cipher allows you to choose between the available ciphers for your network. Options are chachapoly or aes
# IMPORTANT: this value must be identical on ALL NODES/LIGHTHOUSES. We do not/will not support use of different ciphers simultaneously!
#cipher: chachapoly
# Local range is used to define a hint about the local network range, which speeds up discovering the fastest
# path to a network adjacent nebula node.
#local_range: "172.16.0.0/24"
# sshd can expose informational and administrative functions via ssh.
#sshd:
# Toggles the feature
#enabled: true
# Host and port to listen on, port 22 is not allowed for your safety
#listen: 127.0.0.1:2222
# A file containing the ssh host private key to use
# A decent way to generate one: ssh-keygen -t ed25519 -f ssh_host_ed25519_key -N "" < /dev/null
#host_key: ./ssh_host_ed25519_key
# A file containing a list of authorized public keys
#authorized_users:
#- user: steeeeve
# keys can be an array of strings or single string
#keys:
#- "ssh public key string"
# Configure the private interface. Note: addr is baked into the nebula certificate
tun:
# When tun is disabled, a lighthouse can be started without a local tun interface (and therefore without root)
disabled: false
# Name of the device
dev: neb0
# Toggles forwarding of local broadcast packets, the address of which depends on the ip/mask encoded in pki.cert
drop_local_broadcast: false
# Toggles forwarding of multicast packets
drop_multicast: false
# Sets the transmit queue length, if you notice lots of transmit drops on the tun it may help to raise this number. Default is 500
tx_queue: 5000
# Default MTU for every packet, safe setting is (and the default) 1300 for internet based traffic
mtu: 8600
# Route based MTU overrides, if you have known vpn ip paths that can support larger MTUs you can increase/decrease them here
routes:
#- mtu: 8800
# route: 10.0.0.0/16
# Unsafe routes allows you to route traffic over nebula to non-nebula nodes
# Unsafe routes should be avoided unless you have hosts/services that cannot run nebula
# NOTE: The nebula certificate of the "via" node *MUST* have the "route" defined as a subnet in its certificate
unsafe_routes:
#- route: 172.16.1.0/24
# via: 192.168.100.99
# mtu: 1300 #mtu will default to tun mtu if this option is not specified
# TODO
# Configure logging level
logging:
# panic, fatal, error, warning, info, or debug. Default is info
level: info
# json or text formats currently available. Default is text
format: text
# Disable timestamp logging. useful when output is redirected to logging system that already adds timestamps. Default is false
#disable_timestamp: true
# timestamp format is specified in Go time format, see:
# https://golang.org/pkg/time/#pkg-constants
# default when `format: json`: "2006-01-02T15:04:05Z07:00" (RFC3339)
# default when `format: text`:
# when TTY attached: seconds since beginning of execution
# otherwise: "2006-01-02T15:04:05Z07:00" (RFC3339)
# As an example, to log as RFC3339 with millisecond precision, set to:
#timestamp_format: "2006-01-02T15:04:05.000Z07:00"
#stats:
#type: graphite
#prefix: nebula
#protocol: tcp
#host: 127.0.0.1:9999
#interval: 10s
#type: prometheus
#listen: 127.0.0.1:8080
#path: /metrics
#namespace: prometheusns
#subsystem: nebula
#interval: 10s
# enables counter metrics for meta packets
# e.g.: `messages.tx.handshake`
# NOTE: `message.{tx,rx}.recv_error` is always emitted
#message_metrics: false
# enables detailed counter metrics for lighthouse packets
# e.g.: `lighthouse.rx.HostQuery`
#lighthouse_metrics: false
# Handshake Manager Settings
#handshakes:
# Handshakes are sent to all known addresses at each interval with a linear backoff,
# Wait try_interval after the 1st attempt, 2 * try_interval after the 2nd, etc, until the handshake is older than timeout
# A 100ms interval with the default 10 retries will give a handshake 5.5 seconds to resolve before timing out
#try_interval: 100ms
#retries: 20
# trigger_buffer is the size of the buffer channel for quickly sending handshakes
# after receiving the response for lighthouse queries
#trigger_buffer: 64
# Nebula security group configuration
firewall:
conntrack:
tcp_timeout: 12m
udp_timeout: 3m
default_timeout: 10m
max_connections: 100000
# The firewall is default deny. There is no way to write a deny rule.
# Rules are comprised of a protocol, port, and one or more of host, group, or CIDR
# Logical evaluation is roughly: port AND proto AND (ca_sha OR ca_name) AND (host OR group OR groups OR cidr)
# - port: Takes `0` or `any` as any, a single number `80`, a range `200-901`, or `fragment` to match second and further fragments of fragmented packets (since there is no port available).
# code: same as port but makes more sense when talking about ICMP, TODO: this is not currently implemented in a way that works, use `any`
# proto: `any`, `tcp`, `udp`, or `icmp`
# host: `any` or a literal hostname, ie `test-host`
# group: `any` or a literal group name, ie `default-group`
# groups: Same as group but accepts a list of values. Multiple values are AND'd together and a certificate would have to contain all groups to pass
# cidr: a CIDR, `0.0.0.0/0` is any.
# ca_name: An issuing CA name
# ca_sha: An issuing CA shasum
outbound:
- port: any
proto: any
host: any
inbound:
- port: any
proto: any
host: any
The MTU of 8600 looks strange; have you tried using a regular one?
Hello,
Yes, I have tried 1300, 1500, 8000, and 8800 too. 8000+ makes a huge improvement vs the 1300/1500 MTU, but it is still not enough compared to a WireGuard VPN between those two hosts, or to a direct link.
Regards,
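PS: one way to sanity-check an MTU end to end is a don't-fragment ping at the size limit (a sketch; sizes assume 28 bytes of IP + ICMP headers):
ping -c 3 -M do -s 8572 10.6.0.4      # 8572 + 28 = 8600, the Nebula tun MTU
ping -c 3 -M do -s 1472 192.168.0.26  # 1472 + 28 = 1500, the underlay MTU
If the Nebula-side ping fails at that size, the tun MTU is larger than the path can actually carry.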
Maybe try Wireshark or tcpdump to capture some packets?
I just did a test, but I don't have a gigabit link. It seems there is no speed issue with Nebula on a 100M link.
The physical interface:
iperf -c 10.169.36.230 -p 7575
------------------------------------------------------------
Client connecting to 10.169.36.230, TCP port 7575
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 1] local 10.169.36.239 port 50782 connected with 10.169.36.230 port 7575
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.17 sec 113 MBytes 93.4 Mbits/sec
Tailscale (WireGuard)
[ 1] local 10.144.166.98 port 47786 connected with 10.144.148.69 port 7575
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.15 sec 64.5 MBytes 53.3 Mbits/sec
Nebula
[ 1] local 10.20.0.7 port 38236 connected with 10.20.0.8 port 7575
[ ID] Interval Transfer Bandwidth
[ 1] 0.00-10.16 sec 107 MBytes 88.3 Mbits/sec
It seems even better than TS. My Nebula is built from trunk with the latest Go compiler, and I am using the default config.
I would be curious as to what the tc -s qdisc show output is in your failing scenario (drops/marks/reschedules).
Sorry for the delay, @dtaht.
This is on the server:
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 591270866 bytes 5360006 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 210 drop_overlimit 0 new_flow_count 1657 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc mq 0: dev neb0 root
Sent 2470840 bytes 47516 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev neb0 parent :5 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :4 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :3 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 2470780 bytes 47515 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :2 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 60 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :1 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
This is on the client:
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 260239109396 bytes 178370125 pkt (dropped 0, overlimits 0 requeues 3)
backlog 0b 0p requeues 3
maxpacket 68130 drop_overlimit 0 new_flow_count 2146 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc mq 0: dev neb0 root
Sent 840765508 bytes 94603 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev neb0 parent :5 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :4 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :3 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 840765396 bytes 94601 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :2 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev neb0 parent :1 limit 10240p flows 1024 quantum 8900 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 112 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
This is with routines: 5, tx_queue: 5000, mtu: 8900, read_buffer: 20000000, and write_buffer: 20000000. I have also upgraded to Nebula 1.5.0.
Does it give any clues?
Hello,
Bumping this thread, as I have now upgraded to 1.5.2 but still have the same issue: on the LAN using the local IP I achieve 900 Mb/s ~ 1 Gb/s, while with the Nebula IP I max out at 200~210 Mb/s. Is there anything that could be logged, checked, or tested to try to improve that?
Regards
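PS: in the meantime I am watching per-thread CPU and the kernel UDP counters during the test, in case that narrows it down (a sketch):
top -H -p $(pidof nebula)   # per-thread CPU usage of the nebula process
grep Udp: /proc/net/snmp    # rising InErrors / RcvbufErrors would point at the UDP socket buffers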
Sorry for forgetting about this. The fq_codel stats show no drops, so that's not the source of your issue, and that's my speciality. You are bottlenecking elsewhere.
routines pins I/O goroutines to dedicated OS threads, and I don't think it will provide any performance value set greater than the number of CPUs on the host. I didn't test the different scenarios, though, so I could be wrong!
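As a quick sanity check (a sketch), compare the setting against the core count:
nproc          # number of CPUs on the host
# and in the nebula config, keep it at or below that number:
routines: 2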
It might be helpful to see how much traffic can flow between the hosts without the TCP stack. If you run iperf3 <snip> -u -b 1g to send UDP traffic at a rate of 1 Gbps, and compare the Nebula vs local routes, what do you get?
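Concretely, something like this (a sketch; substitute your addresses):
# server side
iperf3 -s
# client side, UDP at 1 Gbps, over each path
iperf3 -c 192.168.0.25 -u -b 1G
iperf3 -c 10.6.0.3 -u -b 1G
The server-side report also shows loss and jitter, which would be interesting here.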
@adi90x Were you able to run the test mentioned above?
I'm closing this issue out as stale. Please feel free to ping me if you'd like it reopened.