I have 3 servers, network-wise configured as follows:
- A is a DELL R710 running Linux 5.13.19-1-pve (Proxmox VE 7.1) and has 4 NICs teamed in a balance-rr mode bond.
- B is a DELL R610 running Linux 5.13.19-1-pve (Proxmox VE 7.1) and has 4 NICs teamed in a balance-rr mode bond.
- C is a DELL R710 running FreeBSD 12.2-RELEASE-p1 (a TrueNAS distro) with a lagg over 8 NICs in roundrobin mode.
All NICs are 1 Gbps.
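For context, the bond/lagg setup is roughly the following (a sketch only; the interface names, addresses, and the vmbr0 bridge are placeholders, not copied from the actual machines). On the Proxmox nodes, in /etc/network/interfaces:

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2 eno3 eno4
    bond-mode balance-rr
    bond-miimon 100
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.11/24
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    mtu 9000

The TrueNAS box is configured through its UI, but the equivalent hand-written rc.conf would look roughly like:

ifconfig_igb0="up mtu 9000"
# ...and the same for igb1 through igb7
cloned_interfaces="lagg0"
ifconfig_lagg0="up laggproto roundrobin laggport igb0 laggport igb1 laggport igb2 laggport igb3 laggport igb4 laggport igb5 laggport igb6 laggport igb7 inet 192.168.1.13/24 mtu 9000"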
When I run iperf3 between the Linux blades, I max out at about 3 Gbps, and the window grows to an average of ~300 KiB. However, between the TrueNAS (FreeBSD) blade and the Linux blades, a single TCP stream maxes out at 1.20 Gbps and the window caps at ~60 KiB on average. If I run parallel streams (e.g., iperf3 ... -P 8, sketched below the questions) I can saturate the bond. On the other hand, as expected, the retransmit count is pretty high in both cases. So, my questions are:
- Why is FreeBSD not reaching the same throughput if, supposedly, both systems are approaching the problem in the same way? (Maybe that is where I am wrong.)
- Is there a tuning option, or combination of options, to make the TCP stack more tolerant of out-of-order delivery without triggering immediate retransmits? (I am vaguely familiar with the 3-duplicate-ACK fast retransmit, the basics of TCP congestion control, and so on.)
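For reference, the numbers above come from runs along these lines (a sketch; the IP address and run length are placeholders):

# server side on the receiving blade
iperf3 -s

# single stream from the sending blade (~3 Gbps Linux<->Linux, ~1.2 Gbps FreeBSD<->Linux)
iperf3 -c 192.168.1.11 -t 30

# eight parallel streams, which is what saturates the bond
iperf3 -c 192.168.1.11 -t 30 -P 8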
I will include here some tunables and options I have used during my testing (how they are applied and verified is sketched after the lists).
- All ifaces are set to use jumbo frames (MTU 9000).
- The Linux boxes are tuned as follows:
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.ipv4.tcp_mem = 1638400 1638400 1638400
net.ipv4.tcp_rmem = 10240 87380 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_reordering = 127
net.ipv4.tcp_max_reordering = 1000
net.core.netdev_max_backlog = 10000
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_congestion_control = reno
- The FreeBSD (TrueNAS Core ~= FreeNAS) box is tuned as follows:
kern.ipc.maxsockbuf=614400000
kern.ipc.somaxconn=1024
net.route.netisr_maxqlen=8192
net.inet.ip.intr_queue_maxlen=8192
net.inet.tcp.mssdflt=8948
net.inet.tcp.reass.maxqueuelen=1000
net.inet.tcp.recvbuf_inc=65536
net.inet.tcp.sendbuf_inc=65536
net.inet.tcp.sendbuf_max=307200000
net.inet.tcp.recvbuf_max=307200000
net.inet.tcp.recvspace=65228
net.inet.tcp.sendspace=65228
net.inet.tcp.minmss=536
net.inet.tcp.abc_l_var=52
net.inet.tcp.initcwnd_segments=52 # start fast
net.inet.udp.recvspace=1048576
net.inet.udp.sendspace=1048576
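In case it is relevant, this is roughly how the settings are applied and checked (the file name, IP address, and the exact verification commands are just how I happen to do it, nothing mandated):

# Linux: persist the sysctls above and reload them
sysctl -p /etc/sysctl.d/90-net-tuning.conf

# spot-check the values that matter most here
sysctl net.ipv4.tcp_reordering net.ipv4.tcp_max_reordering net.ipv4.tcp_congestion_control

# watch cwnd/retransmits on the live connection while iperf3 runs
ss -tin dst 192.168.1.13

# confirm jumbo frames survive the whole path (8972 = 9000 - 28 bytes of IP/ICMP headers)
ping -M do -s 8972 -c 3 192.168.1.13

# FreeBSD/TrueNAS: set at runtime (persisted via System -> Tunables in the web UI)
sysctl net.inet.tcp.mssdflt=8948
sysctl net.inet.tcp.reass.maxqueuelen=1000
sysctl -a | grep -E 'tcp.(sendbuf|recvbuf)'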