Enabling trace_api_plugin slows down replays
With trace_api_plugin enabled, replay speed drops by two orders of magnitude, and CPU and I/O utilization drop to near zero.
Same behavior is seen on 2.2-rc1, 2.1, and 2.0.6.
Hardware info:
Running in Docker using Ubuntu 18.04; the host system is a Hetzner AX41-NVME running Debian 10.
Same here. We are using Ubuntu 18.04; with the trace API enabled, cleos get info takes about 10 seconds to return a response.
1 [||||||||||||||||||||||||100.0%] Tasks: 33, 25 thr; 4 running
2 [||||||||||||||||||||||||100.0%] Load average: 3.06 3.17 3.16
3 [||||||||||||||| 44.4%] Uptime: 12 days, 09:11:17
4 [||||||||||||||||||||| 65.6%] Mem[||||||||||||||||||||488M/7.79G]
I've tracked it down to https://github.com/EOSIO/eos/blob/master/plugins/trace_api_plugin/include/eosio/trace_api/store_provider.hpp#L71
Calling sync after every write during replays is what slows it all down to a crawl
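For anyone skimming, the shape of that code path is roughly the following. This is a simplified sketch of the append-then-sync pattern, not the actual store_provider.hpp implementation (the class and function names are made up, and the fc::cfile API is approximated with plain POSIX calls); it only shows why an fsync after every append bounds replay speed by disk flush latency.

```cpp
// Simplified sketch of the append-then-sync pattern (not the actual
// store_provider.hpp code; names are made up and fc::cfile is approximated
// with plain POSIX calls). It shows why an fsync() after every append
// bounds replay speed by disk flush latency.
#include <fcntl.h>
#include <unistd.h>
#include <stdexcept>
#include <string>

class append_store {
public:
   explicit append_store(const std::string& path)
      : fd_(::open(path.c_str(), O_CREAT | O_WRONLY | O_APPEND, 0644)) {
      if (fd_ < 0) throw std::runtime_error("open failed");
   }
   ~append_store() { if (fd_ >= 0) ::close(fd_); }

   // Append one serialized trace entry and immediately flush it to stable
   // storage. During replay this runs for every block, so throughput is
   // capped by how many fsyncs per second the disk can absorb.
   void append_and_sync(const void* data, size_t len) {
      if (::write(fd_, data, len) != static_cast<ssize_t>(len))
         throw std::runtime_error("write failed");
      if (::fsync(fd_) != 0)          // <- the call that dominates replay time
         throw std::runtime_error("fsync failed");
   }

private:
   int fd_;
};

int main() {
   append_store store("trace_slice.log");           // hypothetical file name
   const char entry[] = "serialized block trace";   // dummy payload
   store.append_and_sync(entry, sizeof(entry));
}
```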
I wonder if Docker is causing that to be slow. Maybe you could test IOPS inside your container and see what it looks like.
The sync would make sense as the cause of the slowdown, and Docker could be adding to that.
The non-container trace APIs we are running today write to ZFS volumes, so I'll bet we didn't experience this issue before, since ZFS immediately acknowledges the write back to the nodeos and then queues its own sync in the background.
since ZFS immediately acknowledges the write back to the nodeos and then queues its own sync in the background
This sounds frightening. A filesystem ignoring sync requests sounds broken. Is that really how ZFS operates?
It can be configured to operate that way, yeah. You gain massive performance benefits since the filesystem uses RAM as a buffer for all writes and then queues them out to actually be written in a more intelligent manner (decent explainer here in the 2nd section about writes). You lose some level of fault tolerance if you go for the maximum-performance route. This isn't a huge issue in our experience with nodeos though, since if nodeos gets interrupted due to a power outage or hardware failure, there's a high likelihood that the state is trashed anyway, requiring a replay of some kind.
From what I can tell, no other part of nodeos calls fsync after every write. Can this really be expected to be performant without a filesystem that can be configured to lie to applications about it?
In replay with trace_api_plugin enabled it does 2 syncs per block. Testing with the script from this article, my NVMe disk gets ~700 syncs/s (same numbers on bare metal and in Docker, so that does not seem to be the limiting factor).
That puts the upper bound on blocks that can be processed with the plugin enabled at ~350 per second. A minimal way to reproduce the syncs/s measurement is sketched after the numbers below.
Debug build of v2.1.0:
replayed 28471 blocks in 493 seconds, 17.35102550052686965 ms/block
Debug build of v2.1.0 with .sync() call commented out:
replayed 465896 blocks in 305 seconds, 0.65595844557250027 ms/block
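In case it helps others reproduce the syncs/s figure without hunting down the script from the article, here is a minimal C++ equivalent (my own sketch, not that script): it appends a small record and fsyncs in a loop for five seconds, then prints the rate. Dividing the rate by the 2 syncs per block gives the ~350 blocks/s ceiling above.

```cpp
// Minimal fsync-throughput probe (a sketch, not the script referenced above).
// Appends a small record and fsyncs in a loop, then prints syncs/second and
// the implied blocks/second at 2 syncs per block.
#include <chrono>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

int main() {
   int fd = ::open("sync_probe.dat", O_CREAT | O_WRONLY | O_APPEND, 0644);
   if (fd < 0) { std::perror("open"); return 1; }

   const char record[64] = {};   // small dummy payload
   const auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(5);

   long syncs = 0;
   while (std::chrono::steady_clock::now() < deadline) {
      if (::write(fd, record, sizeof(record)) != (ssize_t)sizeof(record)) { std::perror("write"); break; }
      if (::fsync(fd) != 0) { std::perror("fsync"); break; }
      ++syncs;
   }
   ::close(fd);

   std::printf("%ld syncs in 5s -> %.1f syncs/s -> ~%.1f blocks/s at 2 syncs/block\n",
               syncs, syncs / 5.0, syncs / 10.0);
   return 0;
}
```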
Agreed, it does seem rather surprising that trace_api_plugin tries to ensure durability here whereas neither the block log nor the ship log do, I believe. But this trace_api_plugin durability seems to be very deliberate: it smartly syncs the log before syncing the index (though I suspect it's still not safe in pessimistic scenarios, because it appears the metadata_log_entry could span multiple disk sectors). The deliberateness does make me wonder if the behavior was very much part of the intended design. I would encourage a closer inspection of the design before ripping it out. Even a seemingly safe optimization such as skipping fsyncing during catch-up may be at odds with any intended durability behavior.
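For readers who haven't looked at the file, "sync the log before the index" is the classic crash-consistency ordering sketched below. This is an illustration of the general technique with made-up names, not the plugin's code: if a crash lands between the two fsyncs, the index simply has no entry pointing at the possibly torn record. Note the caveat above still applies, since an individual record that spans multiple sectors can itself tear.

```cpp
// Illustration of the "sync the data log before the index" ordering
// (general technique with made-up names, not the plugin's implementation).
#include <cstdint>
#include <fcntl.h>
#include <stdexcept>
#include <unistd.h>

static void write_and_sync(int fd, const void* data, size_t len) {
   if (::write(fd, data, len) != static_cast<ssize_t>(len))
      throw std::runtime_error("write failed");
   if (::fsync(fd) != 0)
      throw std::runtime_error("fsync failed");
}

// Two fsyncs per appended entry: the data must be durable before the index
// points at it, otherwise recovery could follow the index into garbage.
void append_entry(int log_fd, int index_fd, const void* entry, size_t len) {
   const off_t offset = ::lseek(log_fd, 0, SEEK_END);   // offset the index will record
   write_and_sync(log_fd, entry, len);                  // 1. data log first
   const uint64_t pos = static_cast<uint64_t>(offset);
   write_and_sync(index_fd, &pos, sizeof(pos));         // 2. then the index
}

int main() {
   const int log_fd   = ::open("trace.log", O_CREAT | O_RDWR, 0644);
   const int index_fd = ::open("trace.index", O_CREAT | O_WRONLY | O_APPEND, 0644);
   if (log_fd < 0 || index_fd < 0) return 1;
   const char entry[] = "block trace payload";
   append_entry(log_fd, index_fd, entry, sizeof(entry));
   ::close(log_fd); ::close(index_fd);
}
```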
trace_api_plugin supports multi-threaded reads from the logs. The sync, if I remember correctly, was to provide safe reading from other threads.
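If that is the motivation, the usual shape of that coordination is something like the sketch below: the writer only publishes the new end offset after the appended bytes are safely written, so reader threads never look past it. This is my own illustration of the general pattern with made-up names, not the plugin's code.

```cpp
// Illustration of writer/reader coordination over an append-only log
// (general pattern with made-up names, not the trace_api_plugin code):
// the writer flushes/syncs the appended bytes and only then publishes the
// new end offset; reader threads never read past the published offset.
#include <atomic>
#include <cstdint>
#include <cstdio>

class shared_log_cursor {
public:
   // Writer thread: call after the bytes up to new_end are safely written.
   void publish(uint64_t new_end) { end_.store(new_end, std::memory_order_release); }

   // Reader threads: everything below this offset is complete and readable.
   uint64_t readable_end() const { return end_.load(std::memory_order_acquire); }

private:
   std::atomic<uint64_t> end_{0};
};

int main() {
   shared_log_cursor cursor;
   cursor.publish(128);   // writer: after syncing a 128-byte append at offset 0
   std::printf("readers may consume up to offset %llu\n",
               static_cast<unsigned long long>(cursor.readable_end()));
}
```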