longhorn-engine
Ublk
Which issue(s) this PR fixes:
Issues https://github.com/longhorn/longhorn/issues/5159 , https://github.com/longhorn/longhorn/issues/6590
What this PR does / why we need it:
This PR uses @derekbit's POC ubdsrv to integrate ublk with longhorn-engine. It also implements multiple TCP connections from the controller to each replica.
Using ublk as a frontend option significantly boosts the IOPS in the frontend part of the engine. In our tests, we measure up to ~500k IOPS at the frontend (before the controller communicates with the replicas).
Using multiple streams between the controller and the replicas also removes part of the communication bottleneck and boosts IOPS from ~50k to ~170k (measured before each replica's R/W operations).
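To make the multi-stream idea concrete, here is a minimal sketch of how requests could be spread over several connections to one replica. The description above does not say how the PR distributes requests, so round-robin selection is an assumption, and `streamPicker`, `newStreamPicker`, and `pick` are hypothetical names that do not come from longhorn-engine.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// streamPicker hands out stream indexes in round-robin order.
// Hypothetical type for illustration only; not taken from longhorn-engine.
type streamPicker struct {
	n    uint64 // number of streams (connections) to one replica
	next uint64 // request counter, incremented atomically
}

// newStreamPicker returns a picker over n streams (at least one).
func newStreamPicker(n int) *streamPicker {
	if n < 1 {
		n = 1
	}
	return &streamPicker{n: uint64(n)}
}

// pick returns the index of the stream the next request should use.
// It is safe to call from multiple goroutines.
func (p *streamPicker) pick() int {
	seq := atomic.AddUint64(&p.next, 1) - 1
	return int(seq % p.n)
}

func main() {
	// Six streams, matching --replica-streams 6 in the example command.
	p := newStreamPicker(6)
	for i := 0; i < 8; i++ {
		fmt.Printf("request %d -> stream %d\n", i, p.pick())
	}
}
```

In a real controller each index would map to an open TCP connection to the replica. An atomic counter keeps the selection cheap under concurrent I/O, but other policies (for example, picking the least-loaded stream) are just as plausible.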
With both features enabled, we measure 110k IOPS on reads (instead of 50k with the default version). Write operations yield approximately the same IOPS, which probably means that the backend upgrade is necessary anyway (as mentioned in https://github.com/longhorn/longhorn/issues/6600) to utilize the full potential of ublk and multistreams.
A full table of measurements is shown below. I used fio over one 1GB replica in a localhost environment. Hardware: CPU: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz; RAM: 16GB; Disk: Western Digital WD10EZEX 1TB.
Special notes for your reviewer:
In order to replicate the results you will need my version of ubdsrv installed: https://github.com/Kampadais/ubdsrv
To run the controller with ublk frontend, 6 frontend queues and 6 replica-streams:
./longhorn controller test --frontend ublk --size 1g --current-size 1g --frontend-queues 6 --replica-streams 6 --replica tcp://localhost:9502
The fio command used for testing:
sudo fio --name=read_iops --filename=/dev/ublkb0 --size=5G --numjobs=12 --time_based --runtime=30s --ramp_time=2s --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=64 --rw=randwrite --group_reporting=1
We are still in need of a go-ublk-helper, because the whole startup procedure lives in the frontend.go file and status monitoring is limited. There is also a limitation regarding the name of the block device provided by ublk.
Additional documentation or context
This pull request is now in conflict. Could you fix it @Kampadais? 🙏
cc @PhanLe1010
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.
Removing stale label. I am going to work on this ticket next
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This PR was closed because it has been stalled for 10 days with no activity.