
Allow users to force striped cache (L2ARC) devices, instead of only allowing JBOD

Open recklessnl opened this issue 2 years ago • 8 comments

Describe the feature you would like to see added to OpenZFS

Currently, when you add cache (L2ARC) devices to a pool, ZFS does not stripe across them; they behave like a JBOD. For performance it would be far better if the L2ARC drives could be striped together (RAID 0) for higher read throughput. Since a cache device only holds copies of data, the risk of one of the drives failing is insignificant. Therefore I feel striping the cache vdev should at the very least be an option for users looking for maximum read performance from their filesystem.

How will this feature improve OpenZFS?

Pools with an L2ARC cache vdev made of striped SSDs would perform much better than pools without one, and considerably better than the current JBOD-style behavior. It's a great way to get very fast read response times while slower media (hard drives) handle the actual storage and redundancy.

recklessnl avatar Jan 24 '22 20:01 recklessnl

Let me give you some insight into the OpenZFS internals in terms of how L2ARC caching works:

Writing: We cycle through the list of L2ARC devices and, once we find one that is eligible for writing, we look up which pool it belongs to and cache eligible data from that pool. It's not really JBOD (i.e. first fill up device 1, then move on to device 2, etc.); we balance the caching across all devices, although this is single-threaded if I recall correctly.

Reading: We read from whichever L2ARC device has valid data.

gamanakis avatar Jan 25 '22 09:01 gamanakis
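
For readers following along, here is a minimal, self-contained sketch of the round-robin feed pass described above. The names and structures are invented for illustration; this is not the actual OpenZFS l2arc feed code.

```c
/*
 * Illustrative sketch only -- NOT the actual OpenZFS l2arc feed code.
 * Models the behaviour described above: each feed pass rotates through
 * the cache devices, picks the next eligible one, and writes one batch
 * of eligible buffers to it.
 */
#include <stdio.h>
#include <stdbool.h>

#define NDEVS 4

struct cache_dev {
	const char *name;
	bool online;        /* device present and healthy */
	size_t free_bytes;  /* space left before wrapping around */
};

/*
 * Round-robin selection: start after the device used in the last pass
 * and return the first one that can accept a write.
 */
static struct cache_dev *
pick_next_dev(struct cache_dev devs[], int *last, size_t batch)
{
	for (int i = 0; i < NDEVS; i++) {
		int idx = (*last + 1 + i) % NDEVS;
		if (devs[idx].online && devs[idx].free_bytes >= batch) {
			*last = idx;
			return (&devs[idx]);
		}
	}
	return (NULL);  /* nothing eligible this pass */
}

int
main(void)
{
	struct cache_dev devs[NDEVS] = {
		{ "cache0", true, 1 << 30 }, { "cache1", true, 1 << 30 },
		{ "cache2", true, 1 << 30 }, { "cache3", true, 1 << 30 },
	};
	int last = -1;
	size_t batch = 8 << 20;  /* one feed pass writes one batch */

	/* Single "feed thread": one device per pass, devices take turns. */
	for (int pass = 0; pass < 8; pass++) {
		struct cache_dev *d = pick_next_dev(devs, &last, batch);
		if (d == NULL)
			break;
		d->free_bytes -= batch;
		printf("pass %d: wrote %zu bytes to %s\n", pass, batch, d->name);
	}
	return (0);
}
```

The point is that a single pass touches one device and successive passes rotate through the rest, which is why the devices fill in a balanced way without being striped in the RAID-0 sense.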

So does this mean the L2ARC pool is striped for maximum performance? I'm not sure I fully understand.

recklessnl avatar Jan 25 '22 20:01 recklessnl

Yes, the L2ARC devices for the same pool are striped, but we write to them one by one since the operation is single-threaded.

Also, the locking mechanism is quite complex, and I am not sure it would be a trivial task to revise it to allow multithreaded, lock-safe, simultaneous caching to multiple L2ARC devices.

However, we can read from multiple L2ARC devices on the same pool simultaneously.

gamanakis avatar Jan 25 '22 21:01 gamanakis

That's surprising, since I recently removed one cache device from a 4-drive cache vdev and it kept operating normally. If it were truly striped, I would expect the cache vdev to fail just like a normal pool would, right?

Wouldn't multithreaded L2ARC operations speed up the L2ARC? With SSDs rapidly becoming faster, I feel this ideally shouldn't be limited to a single thread.

recklessnl avatar Jan 25 '22 21:01 recklessnl

> That's surprising, since I recently removed one cache device from a 4-drive cache vdev and it kept operating normally. If it were truly striped, I would expect the cache vdev to fail just like a normal pool would, right?

As I understand it, they're "striped" at a per-object level, so all you did was remove one-fourth of the contents of the cache. Pedantically, I guess that means they're not really "striped" so much as "distributed".

I don't know whether writes to the L2ARC block on a flush, but if they don't, you still benefit performance-wise: a drive becomes ready for another write sooner when writes are issued asynchronously to all disks, even if they are issued one at a time under strict locking.

mqudsi avatar Feb 09 '22 21:02 mqudsi
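
To make the asynchronous-write argument concrete, here is a toy model (made-up numbers, not measurements) comparing a feed loop that waits for each write to complete with one that merely queues the writes and lets the devices work in parallel:

```c
/*
 * Back-of-the-envelope model of the point above: even if a single thread
 * issues the writes one at a time, not waiting for each write to complete
 * keeps all cache devices busy at once.  All numbers are invented.
 */
#include <stdio.h>

int
main(void)
{
	const int ndevs = 4;
	const double issue_ms = 0.05;  /* CPU time to queue one write */
	const double write_ms = 2.0;   /* device time to absorb it */

	/* Blocking: each write must finish before the next is issued. */
	double blocking = ndevs * (issue_ms + write_ms);

	/*
	 * Non-blocking: writes are queued back-to-back and the devices
	 * complete them in parallel.
	 */
	double nonblocking = ndevs * issue_ms + write_ms;

	printf("blocking feed:     %.2f ms per round\n", blocking);
	printf("non-blocking feed: %.2f ms per round\n", nonblocking);
	return (0);
}
```

Even with a single feed thread, the non-blocking case approaches the aggregate bandwidth of all cache devices, which is the scenario described above.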

That's right, the cached blocks are distributed over all of the L2ARC devices, but each individual block resides solely on a single device. When a device fails, or is removed, only the blocks on that L2ARC device become unavailable, in which case they'll be read from the primary pool storage. Furthermore, if the blocks are still eligible for the L2ARC they will eventually be written out to one of the remaining online L2ARC devices.

behlendorf avatar Feb 09 '22 23:02 behlendorf
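
A small sketch of the behaviour described in the comment above, with invented names (not OpenZFS code): each cached block maps to exactly one cache device, and a read whose device is gone simply falls back to the primary pool and may be re-cached on a surviving device.

```c
/*
 * Illustrative sketch only -- names and structures are invented; this is
 * not OpenZFS code.  Each cached block lives on exactly one L2ARC device,
 * so losing a device only means those blocks fall back to the primary pool.
 */
#include <stdio.h>
#include <stdbool.h>

struct block {
	int id;
	int cache_dev;  /* index of the one device holding it, or -1 */
};

static bool dev_online[4] = { true, true, false, true };  /* cache2 removed */

static void
read_block(struct block *b)
{
	if (b->cache_dev >= 0 && dev_online[b->cache_dev]) {
		printf("block %d: hit on cache%d\n", b->id, b->cache_dev);
		return;
	}
	/*
	 * Miss or missing device: read from the primary pool instead.  The
	 * block may later be re-cached on an online device (in real ZFS this
	 * happens later via the feed thread; here we do it inline for brevity).
	 */
	printf("block %d: read from primary pool", b->id);
	for (int d = 0; d < 4; d++) {
		if (dev_online[d]) {
			b->cache_dev = d;
			printf(", re-cached on cache%d", d);
			break;
		}
	}
	printf("\n");
}

int
main(void)
{
	struct block blocks[] = {
		{ 0, 0 }, { 1, 1 }, { 2, 2 }, { 3, 3 },  /* one block per device */
	};
	for (int i = 0; i < 4; i++)
		read_block(&blocks[i]);
	return (0);
}
```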

@behlendorf Wouldn't performance be much better if users were allowed to truly stripe the L2ARC cache as a single unit, just like a striped vdev? The volatility doesn't matter anyway, since it's only a cache device. This would considerably speed up SSDs striped together to form a high-performing cache, with a lot of space to boot. Is this something that's already possible, and if not, can it be implemented?

Our server would benefit greatly from a lightning-fast L2ARC cache holding large files.

recklessnl avatar Sep 07 '22 06:09 recklessnl

@recklessnl

"each individual block resides solely on a single device"

I guess such a distribution strategy already spreads the data so that load is balanced across the L2ARC devices*. And I doubt that performance would "be much better if we allowed users to truly stripe the L2ARC cache", unless your access pattern is to read blocks randomly one by one 🤔

*let's assume they are the same model

GrayXu avatar Nov 15 '22 10:11 GrayXu