bytes icon indicating copy to clipboard operation
bytes copied to clipboard

Sync vs Unsync versions benchmarks on MIPS and ARM

Open glebpom opened this issue 5 years ago • 8 comments

Hi,

I have done some benchmarks based on @stbuehler fork with Unsync implementation (#200). My goal was to test it on MIPS and ARM processors. Generally, sync version shows comparable or better (someteims much better) results. But there are a couple of cases where unsync version is significantly better:

MIPSel (MediaTek MT7628AN ver:1 eco:2):

Unsync test deref_two ... bench: 27,566 ns/iter (+/- 93,022) test from_long_slice ... bench: 1,770 ns/iter (+/- 15,711) = 72 MB/s

Sync test deref_two ... bench: 49,816 ns/iter (+/- 89,548) test from_long_slice ... bench: 3,700 ns/iter (+/- 9,269) = 34 MB/s

ARMv6 (ARMv6-compatible processor rev 7 (v6l))

Unsync test split_off_and_drop ... bench: 2,755,606 ns/iter (+/- 129,117)

Sync test split_off_and_drop ... bench: 3,642,042 ns/iter (+/- 59,619)

glebpom avatar Jan 21 '19 19:01 glebpom

Thanks for doing the benchmarks.

Could you clarify the action item for this issue?

carllerche avatar Jan 28 '19 18:01 carllerche

This is more like a follow-up to #200 According to this benchmark, there are some specific cases on particular hardware, where current Bytes implementation may be optimized by possible using of unsync version. Do you think it makes sense to implement separate non-atomic version? Or is there any possibility to optimize current implementation for ARMv6 or MIPS?

glebpom avatar Jan 29 '19 21:01 glebpom

A related, but more generally applicable idea is to redo BytesMut as !Sync and only put the buffer under atomic reference counting on .freeze(). The rationale is that the contents of a BytesMut are typically frequently mutated by a single thread or task accumulating input, but frozen buffers need to be handed off to processing threads with zero copying.

A consistent rework in this manner would need to slash mutating APIs on Bytes to channel people into more performant ways of forming their buffers.

mzabaluev avatar Jun 16 '19 14:06 mzabaluev

FYI, #269 has a design proposal allowing !Sync instances without code forks or even feature flags.

mzabaluev avatar Jun 25 '19 18:06 mzabaluev

@mzabaluev Looks great to me! @carllerche what do you think?

glebpom avatar Jun 26 '19 18:06 glebpom

Generally, !Sync types are pretty annoying to work with as the !Sync aspect leaks. It's better, IMO, to have types be Sync and have fns that require mutating the state to take &mut self.

carllerche avatar Jun 26 '19 18:06 carllerche

Generally, !Sync types are pretty annoying to work with as the !Sync aspect leaks.

It (as well as the propagation of !Send) can be considered a design feature, if it can be tuned in and out of generically. I had a very good experience with tower-h2 where it basically instructed me that my service implementation was unsuitable for the executor I was trying to use due to lacking Send, but it worked just fine with the current thread executor.

mzabaluev avatar Jun 27 '19 06:06 mzabaluev

I use tokio-current-thread heavily, where using !Sync is just fine. I think it makes a lot of sense to provide the optional way to not pay the unnecessary price for Sync

glebpom avatar Jun 27 '19 21:06 glebpom