
How to test the active zone limit?

Open • UNKJay opened this issue 2 years ago • 9 comments

I've read the paper "ZNS: Avoiding the Block Interface Tax for Flash-based SSDs". It says: "RocksDB can work with as few as 6 active zones with restricted write performance, while more than 12 active zones does not add any significant performance benefits." However, I can't reproduce this result in my emulator. Could I get some help with a db_bench script?

UNKJay • Jul 19 '22 09:07

@UNKJay Can you please share more details on your test? For example, how are you emulating a ZNS device, and what discrepancies are you seeing? The point in the paper comes from the fact that physical ZNS SSDs limit the maximum number of active zones at any given moment. Writing to up to 12 zones in parallel was enough to meet RocksDB's write needs, so opening more zones in parallel was not needed. That said, the max_background_jobs parameter should help you control/tune the number of active zones RocksDB uses. Apart from that, can you please specify what help you need?

aravind-wdc • Jul 19 '22 10:07
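As a starting point for the db_bench script requested in the original post, the sketch below generates a sweep of runs that vary only max_background_jobs, the knob suggested in the reply. It assumes db_bench was built with ZenFS as a plugin, so that the `--fs_uri=zenfs://dev:<device>` URI is accepted. The device name `nvme0n1`, the key count and the value size are placeholders, not values taken from the paper. `fillrandom,overwrite` is used so that the second phase rewrites existing keys and gives compaction real work to do.

```python
from typing import List


def db_bench_cmd(device: str, background_jobs: int,
                 num_keys: int = 10_000_000, value_size: int = 1000) -> List[str]:
    """Build one db_bench command line targeting a ZenFS-formatted zoned device."""
    return [
        "./db_bench",
        # ZenFS plugin URI; `device` is the zoned block device name (placeholder).
        f"--fs_uri=zenfs://dev:{device}",
        # Load the DB, then rewrite existing keys so compactions have work to do.
        "--benchmarks=fillrandom,overwrite",
        f"--num={num_keys}",
        f"--value_size={value_size}",
        # The only parameter varied across runs in this sweep.
        f"--max_background_jobs={background_jobs}",
        "--use_direct_io_for_flush_and_compaction",
        "--statistics",
    ]


if __name__ == "__main__":
    # Placeholder device name; substitute the emulated ZNS namespace.
    for jobs in (2, 4, 6, 8, 12, 16):
        print(" ".join(db_bench_cmd("nvme0n1", jobs)))
```

Comparing the overwrite throughput reported for each setting is one way to look for the knee the paper describes. How many zones a given max_background_jobs value actually keeps active is not fixed, so the key count may need to grow until the LOG shows sustained compaction.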

@aravind-wdc I use FEMU to emulate a ZNS SSD. I've found that FEMU receives a few write requests with the same logical address. Does that mean in-place updates? Will ZenFS cause in-place updates?

UNKJay • Jul 20 '22 08:07

@UNKJay Thanks for the update. ZenFS always writes at the write pointer, so I don't think ZenFS is doing in-place updates. It could also be out-of-order writes reaching the drive. Have you checked whether the scheduler is mq-deadline? Schedulers other than mq-deadline can cause out-of-order writes.

aravind-wdc • Jul 20 '22 14:07
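The reasoning behind "ZenFS always writes at the write pointer" can be illustrated with a toy model of a sequential-write-required zone. This is a hypothetical Python class, not ZenFS or FEMU code: a write is accepted only if it starts exactly at the write pointer, so both a true in-place update and a request that was reordered ahead of an earlier one are rejected.

```python
class ZoneWriteError(Exception):
    """Raised when a write would violate the zone's sequential-write rule."""


class Zone:
    """Toy model of a sequential-write-required zone (illustrative, not ZenFS code)."""

    def __init__(self, start_lba: int, capacity: int) -> None:
        self.start = start_lba
        self.capacity = capacity  # writable blocks in the zone
        self.wp = start_lba       # write pointer: the only LBA a write may start at

    def write(self, lba: int, nblocks: int) -> None:
        if lba != self.wp:
            # lba < wp looks like an in-place update; lba > wp is what a
            # request reordered ahead of an earlier one looks like.
            raise ZoneWriteError(f"write at LBA {lba}, write pointer at {self.wp}")
        if self.wp + nblocks > self.start + self.capacity:
            raise ZoneWriteError("write would exceed zone capacity")
        self.wp += nblocks

    def reset(self) -> None:
        # A zone reset rewinds the write pointer, so the same LBAs are written again.
        self.wp = self.start


if __name__ == "__main__":
    zone = Zone(start_lba=0, capacity=16)
    zone.write(0, 4)
    zone.write(4, 4)       # in order: accepted
    try:
        zone.write(12, 4)  # the request for LBA 8 has not arrived yet
    except ZoneWriteError as err:
        print("rejected:", err)
```

The model also shows a second way to see repeated logical addresses: after `reset()` the zone's LBAs are written again from the start. Since ZenFS reclaims zones by resetting them, repeated writes to the same LBA seen at the emulator may simply be writes to a reset and reused zone rather than in-place updates.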

@aravind-wdc Thanks for the answer. The scheduler is mq-deadline, and I have found a problem. Every time the journal zone is persisted, it is a small write; however, the smallest write unit in the SSD is about 4K (the page size), so there is a mismatch. Do you have any suggestions for fixing this?

UNKJay • Jul 21 '22 08:07
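For anyone reproducing this setup, the active I/O scheduler can be read from sysfs, where the kernel lists the available schedulers and brackets the one in use (for example `[mq-deadline] none`). A small helper, assuming a device named `nvme0n1` (a placeholder):

```python
import re
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler marked as active in a sysfs 'scheduler' file.

    The kernel brackets the active entry, e.g. "[mq-deadline] kyber none".
    If no entry is bracketed, the stripped text is returned unchanged.
    """
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else sysfs_text.strip()


def device_scheduler(device: str) -> str:
    """Read the active scheduler for a block device such as 'nvme0n1'."""
    return active_scheduler(Path(f"/sys/block/{device}/queue/scheduler").read_text())
```

If the result is not mq-deadline, writing `mq-deadline` to the same file as root (e.g. `echo mq-deadline > /sys/block/nvme0n1/queue/scheduler`) switches it.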

@UNKJay Writes have to be block-size aligned. IIRC, for buffered writes (used for the write-ahead log), ZenFS pads the write buffer to match the block size. Are you running a modified (in code) ZenFS? I'm not sure exactly what the problem is.

aravind-wdc • Jul 21 '22 10:07
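The padding described in the reply amounts to rounding each write up to the next multiple of the block size. A minimal sketch, assuming a 4096-byte block and zero-fill; the fill value ZenFS actually uses is not stated in this thread, and both helper names are hypothetical:

```python
def padded_size(nbytes: int, block_size: int = 4096) -> int:
    """Round a write length up to the next multiple of the device block size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Ceiling division via negated floor division.
    return -(-nbytes // block_size) * block_size


def pad_buffer(buf: bytes, block_size: int = 4096) -> bytes:
    """Zero-fill buf so its length is block aligned (toy version, not ZenFS code)."""
    return buf + b"\0" * (padded_size(len(buf), block_size) - len(buf))
```

With a 4 KiB block, a 300-byte journal flush still costs a full 4096-byte write (`padded_size(300) == 4096`), which is the mismatch described in the previous comment. In this sketch the alignment unit is just a parameter; whether ZenFS exposes such a setting is not established in the thread.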

@aravind-wdc Can I configure buffered writes to be aligned to a larger size?

UNKJay • Jul 22 '22 07:07

@UNKJay: Are you emulating a device with a > 4k block size?

yhr • Aug 01 '22 13:08

@yhr No, I've changed the LBA size to 4K to match the flash page size, but I still don't see any difference between different active zone limits. How can I find out how often compaction runs? I suspect the test case doesn't trigger enough compactions.

UNKJay • Aug 01 '22 13:08
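One thing worth verifying when different active zone limits give identical results is whether the emulated device actually advertises the limit. On Linux 5.9 and later, a zoned block device's queue directory in sysfs exposes `max_active_zones` and `max_open_zones` (0 means no limit), alongside the block sizes. The helper below only reads those files; the device name is a placeholder, and attributes missing on older kernels come back as `None`. As far as I understand, ZenFS takes its active-zone budget from what the device reports, so a 0 here would likely mean the FEMU limit is not constraining ZenFS at all.

```python
from pathlib import Path
from typing import Dict, Optional

# Queue attributes relevant to this thread. max_open_zones/max_active_zones
# exist on Linux 5.9+; a value of 0 means the device reports no limit.
ZONE_ATTRS = (
    "zoned", "nr_zones", "max_open_zones", "max_active_zones",
    "logical_block_size", "physical_block_size",
)


def read_zone_attrs(device: str, sysfs_root: str = "/sys/block") -> Dict[str, Optional[str]]:
    """Read selected queue attributes for `device`; absent files map to None."""
    queue_dir = Path(sysfs_root) / device / "queue"
    attrs: Dict[str, Optional[str]] = {}
    for name in ZONE_ATTRS:
        path = queue_dir / name
        attrs[name] = path.read_text().strip() if path.is_file() else None
    return attrs
```

Calling `read_zone_attrs("nvme0n1")` before each run makes it easy to confirm that the limit configured in the emulator is the one the kernel sees.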

@UNKJay: The RocksDB LOG file periodically outputs compaction statistics; check those. If no compaction has been done, make sure you are running large enough tests and that the workload contains overwrites. The LOG file is stored in the aux directory specified during zenfs mkfs.

yhr • Aug 02 '22 12:08
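Besides the periodic compaction statistics tables, the RocksDB LOG also records per-job events as JSON on lines tagged `EVENT_LOG_v1`, including `compaction_started` and `compaction_finished`. Assuming that format, counting those events gives a rough compaction count per run. The function below is a sketch that tolerates ordinary log lines and malformed entries:

```python
import json
from collections import Counter
from typing import Iterable

MARKER = "EVENT_LOG_v1"


def count_events(lines: Iterable[str]) -> Counter:
    """Count RocksDB event-log entries by their "event" field."""
    counts: Counter = Counter()
    for line in lines:
        pos = line.find(MARKER)
        if pos < 0:
            continue  # ordinary log line, no JSON event payload
        try:
            payload = json.loads(line[pos + len(MARKER):])
        except json.JSONDecodeError:
            continue  # truncated or malformed entry
        if isinstance(payload, dict):
            counts[payload.get("event", "unknown")] += 1
    return counts
```

Running it over the LOG in the aux directory, e.g. `count_events(open(path))["compaction_finished"]`, and comparing the result across max_background_jobs settings shows whether a test is long enough to exercise compaction.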

@UNKJay: Did my answer make sense? Can we close this issue?

yhr • Aug 19 '22 10:08