longhorn-engine

Ublk

Open · Kampadais opened this pull request 1 year ago • 3 comments

Which issue(s) this PR fixes:

Issues: https://github.com/longhorn/longhorn/issues/5159, https://github.com/longhorn/longhorn/issues/6590

What this PR does / why we need it:

This PR builds on @derekbit's ubdsrv proof of concept (POC) to integrate ublk with longhorn-engine. It also adds support for multiple TCP connections from the controller to each replica.

Using ublk as the frontend significantly boosts IOPS in the frontend part of the engine. In our tests, we measured up to ~500k IOPS at the frontend (measured before the controller communicates with the replicas).

Using multiple streams between the controller and the replicas also removes part of the communication bottleneck, boosting IOPS from ~50k to ~170k (measured before each replica performs its read/write operations).

With both features enabled, we measure 110k read IOPS (compared with 50k for the default version). Write operations yield approximately the same IOPS as with the default version, which probably means that the backend upgrade is necessary anyway (as mentioned in https://github.com/longhorn/longhorn/issues/6600) to utilize the full potential of ublk and multiple streams.

A full table of measurements is shown below. I used fio against a single 1 GB replica in a localhost environment:

- CPU: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
- RAM: 16 GB
- Disk: Western Digital WD10EZEX 1 TB

[Image: full table of fio measurements]

Special notes for your reviewer:

To reproduce the results, you need my fork of ubdsrv installed: https://github.com/Kampadais/ubdsrv

To run the controller with the ublk frontend, 6 frontend queues, and 6 replica streams:

```shell
./longhorn controller test --frontend ublk --size 1g --current-size 1g --frontend-queues 6 --replica-streams 6 --replica tcp://localhost:9502
```
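The `--frontend-queues` setting controls how many ublk queues the frontend serves in parallel. A ublk server typically serves each hardware queue from its own thread, so more queues let more I/O be handled concurrently. The Go sketch below only illustrates that per-queue worker pattern: `ioRequest` and `runQueues` are hypothetical names, the code does not talk to the ublk driver, and it is not how this PR wires queues into `frontend.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// ioRequest is a hypothetical stand-in for one block I/O fetched from a
// ublk queue; only the queue index matters for this sketch.
type ioRequest struct {
	queue  int    // frontend queue the request arrived on
	offset uint64 // byte offset on the block device (unused here)
}

// runQueues starts one worker goroutine per frontend queue, routes each
// request to the worker that owns its queue, and returns how many requests
// each worker handled.
func runQueues(numQueues int, reqs []ioRequest) []int {
	chans := make([]chan ioRequest, numQueues)
	handled := make([]int, numQueues)
	var wg sync.WaitGroup
	for q := 0; q < numQueues; q++ {
		chans[q] = make(chan ioRequest, len(reqs))
		wg.Add(1)
		go func(q int) {
			defer wg.Done()
			for range chans[q] {
				// A real worker would serve the I/O here, for example by
				// passing it to the controller, which forwards it to the replicas.
				handled[q]++ // only worker q writes handled[q], so no lock is needed
			}
		}(q)
	}
	for _, r := range reqs {
		chans[r.queue%numQueues] <- r
	}
	for _, c := range chans {
		close(c)
	}
	wg.Wait()
	return handled
}

func main() {
	const queues = 6 // matches --frontend-queues 6 in the example command
	reqs := make([]ioRequest, 24)
	for i := range reqs {
		reqs[i] = ioRequest{queue: i % queues, offset: uint64(i) * 4096}
	}
	fmt.Println(runQueues(queues, reqs)) // prints [4 4 4 4 4 4]
}
```

Giving each queue its own worker means no lock is shared across queues on the submission path. Any contention that remains moves further down, to the controller-to-replica connections that `--replica-streams` addresses.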

The fio command used for testing:

```shell
sudo fio --name=read_iops --filename=/dev/ublkb0 --size=5G --numjobs=12 --time_based --runtime=30s --ramp_time=2s --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=64 --rw=randwrite --group_reporting=1
```
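When comparing runs such as frontend-only vs. full path, or default vs. ublk with multiple streams, it can help to collect the IOPS figures programmatically. Appending `--output-format=json --output=result.json` to the fio command above makes fio write a JSON report, whose `jobs` entries carry `read.iops` and `write.iops`. The small Go program below is a sketch for extracting those values; the program name `fioiops` is hypothetical, and only the fio JSON fields it reads are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// fioOutput models only the parts of fio's JSON output used here.
type fioOutput struct {
	Jobs []struct {
		JobName string `json:"jobname"`
		Read    struct {
			IOPS float64 `json:"iops"`
		} `json:"read"`
		Write struct {
			IOPS float64 `json:"iops"`
		} `json:"write"`
	} `json:"jobs"`
}

// summarize sums read and write IOPS over all jobs in the report. With
// --group_reporting=1, fio should report the 12 jobs as a single entry.
func summarize(data []byte) (readIOPS, writeIOPS float64, err error) {
	var out fioOutput
	if err = json.Unmarshal(data, &out); err != nil {
		return 0, 0, err
	}
	for _, j := range out.Jobs {
		readIOPS += j.Read.IOPS
		writeIOPS += j.Write.IOPS
	}
	return readIOPS, writeIOPS, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: fioiops <fio-json-output-file>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	r, w, err := summarize(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parsing fio JSON:", err)
		os.Exit(1)
	}
	fmt.Printf("read IOPS: %.0f, write IOPS: %.0f\n", r, w)
}
```

Usage: run the fio command with the two extra options, then `go run fioiops.go result.json`. Summing over `jobs` keeps the helper correct if `--group_reporting` is turned off and fio reports each job separately.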

We still need a go-ublk-helper, because the whole startup procedure currently lives in the frontend.go file and status monitoring is limited. There is also a limitation regarding the name of the block device provided by ublk.

Additional documentation or context

Kampadais · Mar 27 '24

This pull request is now in conflict. Could you fix it @Kampadais? 🙏

mergify[bot] · Mar 27 '24

This pull request is now in conflict. Could you fix it @Kampadais? 🙏

mergify[bot] · Apr 24 '24

cc @PhanLe1010

innobead · May 31 '24

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] · Jul 16 '24

Removing the stale label. I am going to work on this ticket next.

PhanLe1010 · Jul 16 '24

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] · Aug 30 '24

This PR was closed because it has been stalled for 10 days with no activity.

github-actions[bot] · Sep 09 '24