source: support native (data chunk) format for benchmark purpose
When benchmarking with the Nexmark source (only with a `count(*)` simple agg), we find that even if we increase the split number (and the parallel unit count) from 6x3 to 7x3, the throughput does not improve while CPU usage increases by 100%. That is, the maximum throughput we can get from the Nexmark source is 750k rows per second, which will become the bottleneck for some workloads.
The Nexmark messages are generated, serialized to JSON, and then deserialized by the source parser, which brings some overhead. It would be better if we could support a "native" format that directly passes in-memory data chunks or columns.
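A rough sketch of what such a native path could look like: the source message carries the in-memory chunk itself instead of serialized bytes, so the consumer just moves an `Arc` rather than parsing. `Chunk` and `SourceMessage` below are hypothetical stand-ins for illustration, not RisingWave's actual types:

```rust
use std::sync::Arc;

// Hypothetical stand-in for a columnar chunk; the real StreamChunk
// lives in the risingwave_common crate and is more elaborate.
#[derive(Debug, PartialEq)]
struct Chunk {
    ids: Vec<u64>,
    prices: Vec<u64>,
}

// A source message that can carry either serialized bytes (the current
// JSON path) or an in-memory chunk (the proposed "native" format).
enum SourceMessage {
    Payload(Vec<u8>),   // must go through a parser to become a chunk
    Native(Arc<Chunk>), // zero-copy hand-off, no ser/de at all
}

fn into_chunk(msg: SourceMessage) -> Arc<Chunk> {
    match msg {
        // The JSON path would parse the bytes here; elided in this sketch.
        SourceMessage::Payload(_) => unimplemented!("parse bytes"),
        // The native path just moves the Arc: no parsing, no copy.
        SourceMessage::Native(chunk) => chunk,
    }
}

fn main() {
    let chunk = Arc::new(Chunk { ids: vec![1, 2], prices: vec![100, 250] });
    let out = into_chunk(SourceMessage::Native(chunk.clone()));
    assert_eq!(*out, *chunk);
    println!("rows handed off without ser/de: {}", out.ids.len());
}
```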


For example, 64% of the time of Nexmark bid generation is spent on formatting the timestamp. However, there seems to be no way to optimize this, such as pre-compiling the formatter. :( cc @KeXiangWang
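As an illustrative micro-comparison (a much-simplified formatter, not the real Nexmark one): producing a formatted string per event allocates and does digit formatting for every row, whereas the raw epoch millis is just a `u64` copy:

```rust
use std::time::Instant;

// Simplified stand-in for a full "%Y-%m-%d %H:%M:%S%.3f" formatter;
// it only demonstrates the per-row allocation and formatting cost.
fn format_ts(millis: u64) -> String {
    let secs = millis / 1000;
    format!("{}-{:02}-{:02}", secs / 86400, (secs / 3600) % 24, millis % 1000)
}

fn main() {
    const N: u64 = 1_000_000;

    // Path 1: format a string per event (what the JSON path must do).
    let t = Instant::now();
    let mut total_len = 0usize;
    for i in 0..N {
        total_len += format_ts(1_600_000_000_000 + i).len();
    }
    let formatted = t.elapsed();

    // Path 2: just pass the u64 along (what a native format could do).
    let t = Instant::now();
    let mut sum = 0u64;
    for i in 0..N {
        sum = sum.wrapping_add(1_600_000_000_000 + i);
    }
    let raw = t.elapsed();

    // Absolute numbers vary by machine; the gap is the point.
    println!("format: {:?} ({} bytes), raw u64: {:?} (checksum {})",
             formatted, total_len, raw, sum);
}
```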

With "native" format, this overhead can be avoided.
related: https://github.com/singularity-data/risingwave/issues/4144
Is there, equivalently, an observed high CPU consumption for deserializing the string timestamp back to our in-memory representation of timestamp? (Is that UTC time as u64?) If so, then the problem would be with our code being slow, not with the JSON serialization libraries as suggested in https://github.com/risingwavelabs/risingwave/issues/5122
I haven't observed deserialization being a hotspot. However, if we can avoid any JSON ser/de in the Nexmark path (there is actually no need for it at all), we can directly pass the millis as u64, which could be much better.
This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.
This feature will facilitate the performance test of streaming operators, by eliminating the cost of Source Executor. Hope it can be done.
@TennyZhuang Can you work on this soon? Or should we reassign it?
> This feature will facilitate the performance test of streaming operators, by eliminating the cost of Source Executor. Hope it can be done.
To clarify, this feature is nice-to-have but not essential. We can do it as long as the complexity is controllable and does not hurt the current design of connectors.
Just wondering if any design decision has been made, we can probably assign this to others if @wangrunji0408 is currently pretty busy with both external functions and expression micro-benchmarks/optimizations.
A quick check: I can only think of doing this unsafely, without changing the code structure much (but I'm still not sure it's possible):
- directly generate the `StreamChunk` in the `datagen` generator, then unsafely reinterpret it as `[u8]`
- a special parser in `ConnectorSourceReader` receives the payload (aka `[u8]`) and reconstructs the `StreamChunk`.

Hope I am wrong!
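The soundness concern can be made concrete: a heap-backed column is mostly pointers, so reinterpreting the struct as `[u8]` captures the pointer/length/capacity words, not the row data. A minimal sketch (using a plain `Vec<u64>` as a stand-in for a column):

```rust
use std::mem::size_of;

fn main() {
    // A heap-backed column: the struct itself is only three words
    // (pointer, length, capacity); the actual values live elsewhere.
    let column: Vec<u64> = vec![1, 2, 3, 4];
    assert_eq!(size_of::<Vec<u64>>(), 3 * size_of::<usize>());

    // So viewing the struct as raw bytes would carry pointers into the
    // sender's heap, not the data itself; a receiver could not
    // reconstruct the rows from those bytes alone.
    println!("struct bytes: {}, payload bytes: {}",
             size_of::<Vec<u64>>(), column.len() * size_of::<u64>());
}
```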
I guess the current performance after a series of optimization is good enough for micro benchmarks: as long as we have sufficient CPU cores, we can almost always make the throughput of data-gen larger than that of the downstream executor. 🤔
> make the throughput of data-gen larger than that of the downstream executor.
I see, yes! it is not particularly urgent. It's useful when we try to get some peak performance numbers 🤣 for demo purposes.
create source t1 (
    v1 BIGINT,
    v2 BIGINT,
    t1 TIMESTAMP,
    t2 TIMESTAMP,
    c1 VARCHAR,
    c2 VARCHAR
) with (
    connector = 'datagen',
    datagen.split.num = '1', -- one thread
    datagen.rows.per.second = '5000000'
) ROW FORMAT JSON;
create sink s1 as select * from t1 with ( connector = 'blackhole' );
Roughly 220K rows/s (going through JSON ser/de) -> 310K rows/s (generating rows directly).
Hi all! Is this feature still required to meet performance requirements? If not, can I get it closed?
> Hi all! Is this feature still required to meet performance requirements? If not, can I get it closed?
I think yes? cc. @lmatz
BTW, this would be easier after https://github.com/risingwavelabs/rfcs/pull/31 cc. @waruto210
We can start doing it after #7508 is merged.
done by #7621