Summary
I'm seeing dramatically fluctuating I/O performance on a ZFS SSD mirror in Proxmox VE 7 (Bullseye). I'm simply too much of a novice to be able to track it down on my own.
Details
The poor performance is VERY noticeable in real-world tasks, so it's not just an artifact of synthetic benchmarks. To help diagnose it, I'm running:
sysbench fileio --file-test-mode=rndrw run
It's running "bare-metal" from the Proxmox terminal without any VMs active. The results vary wildly. Here are two examples:
File operations:
reads/s: 2316.07
writes/s: 1544.08
fsyncs/s: 4949.70
Throughput:
read, MiB/s: 36.19
written, MiB/s: 24.13
General statistics:
total time: 10.0062s
total number of events: 88040
Latency (ms):
min: 0.00
avg: 0.11
max: 35.66
95th percentile: 0.65
sum: 9947.54
Threads fairness:
events (avg/stddev): 88040.0000/0.00
execution time (avg/stddev): 9.9475/0.00
and
File operations:
reads/s: 22.60
writes/s: 15.07
fsyncs/s: 56.98
Throughput:
read, MiB/s: 0.35
written, MiB/s: 0.24
General statistics:
total time: 10.6162s
total number of events: 877
Latency (ms):
min: 0.00
avg: 11.43
max: 340.62
95th percentile: 77.19
sum: 10020.19
Threads fairness:
events (avg/stddev): 877.0000/0.00
execution time (avg/stddev): 10.0202/0.00
As you can see, there's a roughly 100-fold swing in the total number of events (88,040 vs. 877) and a similar increase in average latency (0.11 ms vs. 11.43 ms). These swings are not one-offs. Performance is constantly fluctuating between these kinds of extremes.
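For reference, the size of the swing can be checked with a quick calculation. This is a minimal sketch using only the figures from the two runs above; `ratio` is a helper name made up for this illustration.

```shell
#!/bin/sh
# ratio A B: print A divided by B, rounded to one decimal place.
# (Hypothetical helper, not part of sysbench.)
ratio() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 88040 877    # total events, fast run vs. slow run      -> 100.4
ratio 11.43 0.11   # average latency (ms), slow vs. fast      -> 103.9
ratio 77.19 0.65   # 95th percentile latency (ms), slow/fast  -> 118.8
```

So both throughput and latency differ by about two orders of magnitude between the two runs.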
I've done my best to rule out simple hardware issues. Both SSDs are brand new with all 100s in smartctl. I've swapped out the SATA cables. I've run the pool with the mirror degraded to try to isolate a single-drive problem. I've moved the drives to a separate SATA controller. Nothing gives me a different result.
I've got a second server configured in a similar fashion, though with older (and unmatched) SSDs in the mirror, and it doesn't show this issue. The server hardware differs, though. The poor results are from the system described below. The "normal"-seeming results are from an old converted PC with an E3-1275v2.
What I'm hoping for are tips to help diagnose this issue. It seems that the problem is with latency. What can cause this? What next steps should I take?
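In case it helps anyone reproduce or correlate the fluctuation, the runs could be repeated and logged with timestamps. This is only a sketch: it assumes the test files were already created in the current directory with `sysbench fileio prepare`, and the function names `extract_metrics` and `bench_loop` and the log file name are made up for illustration.

```shell
#!/bin/sh
# extract_metrics: read one sysbench fileio report on stdin and print
# "<total events> <avg latency ms> <95th percentile ms>" on one line.
# (Hypothetical helper; it only parses the report text.)
extract_metrics() {
  awk -F: '
    /total number of events/ { gsub(/[ \t]/, "", $2); ev  = $2 }
    /avg:/                   { gsub(/[ \t]/, "", $2); avg = $2 }
    /95th percentile/        { gsub(/[ \t]/, "", $2); p95 = $2 }
    END { print ev, avg, p95 }
  '
}

# bench_loop N LOGFILE: run the same benchmark N times, appending one
# line per run to LOGFILE: start timestamp, events, avg and p95 latency.
bench_loop() {
  n=$1
  log=$2
  i=0
  while [ "$i" -lt "$n" ]; do
    printf '%s ' "$(date '+%F %T')" >> "$log"
    sysbench fileio --file-test-mode=rndrw run | extract_metrics >> "$log"
    i=$((i + 1))
  done
}
```

Something like `bench_loop 30 sysbench-runs.log` would then give 30 timestamped lines that can be lined up against other activity on the host.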
Thanks in advance!
System (if it helps)
- MB: Supermicro X9DRi-F
- CPU: Dual Xeon E5-2650 v2
- RAM: 128 GB (8 x 16GB)
- SATA Controllers: Onboard SATA 3 (separate SATA 2 also tested)
- SSD: 2x 1GB TeamGroup SATA (yeah, cheap, but should be fine)
- PCIe Cards:
  - Mellanox MCX312B
  - LSI SAS9207-8i (HBA connected to 8 unmounted disks...passed through to VM)
  - Nvidia GTX 750 (passed through to VM)