
Stamperstore ldb writes produce high disk write volume

Open Cafe137 opened this issue 1 year ago • 6 comments

Context

Windows 10 AMD64, Bee 2.2.0

Summary

I am writing 1,000,000 chunks at a rate of 200,000 per hour (a mantaray with 600,000 forks). This use case produces a high volume of disk writes, which may have bad side effects such as degraded performance or reduced SSD lifetime. Based on screenshot number 2 below, the culprit seems to be the stamperstore writes. Network bandwidth is negligible compared to the disk write bandwidth.

Capture

Capture2

Expected behavior

I have no knowledge of what exactly happens under the hood. My naive assumption is that the corresponding ~2MB ldb file gets overwritten for each stamping, but I may be completely off here.

For write-heavy operations like this, in-memory caching with periodic persistence to disk (or a WAL, so stamp counters are not lost in case of a crash) would likely improve the situation. (stamperstore ≤ 500MB on my machine™)
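For illustration, a minimal Go sketch of that idea (none of these names exist in Bee; `persist` is a stand-in for whatever currently writes the issuer to the stamperstore). The trade-off is exactly the one mentioned above: without a WAL, counters updated since the last flush are lost on a crash.

```go
// Minimal sketch of the "cache in memory, flush periodically" idea.
// Nothing here is Bee code: coalescingSaver and persist are hypothetical
// stand-ins for whatever serializes the stamp issuer today.
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type coalescingSaver struct {
	mu      sync.Mutex
	dirty   bool
	persist func() error // stand-in for the real stamperstore write
}

// markDirty replaces the per-request save: it only flips a flag.
func (c *coalescingSaver) markDirty() {
	c.mu.Lock()
	c.dirty = true
	c.mu.Unlock()
}

// run flushes at most once per interval, plus once on shutdown.
func (c *coalescingSaver) run(ctx context.Context, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			c.flush()
		case <-ctx.Done():
			c.flush() // final flush so counters survive a clean shutdown
			return
		}
	}
}

func (c *coalescingSaver) flush() {
	c.mu.Lock()
	dirty := c.dirty
	c.dirty = false
	c.mu.Unlock()
	if dirty {
		_ = c.persist()
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	s := &coalescingSaver{persist: func() error {
		fmt.Println("persisting stamp issuer once for many uploads")
		return nil
	}}
	go s.run(ctx, time.Second)

	for i := 0; i < 1000; i++ { // simulate many per-chunk uploads
		s.markDirty()
	}
	<-ctx.Done()
	time.Sleep(100 * time.Millisecond) // let the final flush run
}
```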

Actual behavior

Please see the screenshots above. The disk write rate frequently surpasses 100 MB/s.

Steps to reproduce

On demand I can create a small JS project that creates and writes many chunks in an endless loop.

Possible solution

Please see expected behavior above.

Cafe137 avatar Oct 30 '24 18:10 Cafe137

Can you describe what the upload sizes are, and which API endpoint are you using?

istae avatar Nov 27 '24 23:11 istae

Can you describe what the upload sizes are, and which API endpoint are you using?

I was uploading Wikipedia, and at this point I was writing the mantaray forks: many small /bytes endpoint uploads. Probably minimal time was spent splitting/chunking, and most time was spent stamping, which turned out to be an I/O bottleneck.

Sorry, I know this isn't much, I will eventually get back to larger data uploads where I can investigate this better.

Cafe137 avatar Dec 11 '24 17:12 Cafe137

@istae I have an update on this

This time I used the built-in Activity Monitor on macOS, not Windows.

Version: 2.4.0-390a402e
Mode: light

I restarted my Bee node to clear Disk IO statistics.

I used a 10 MB input file, 10mb.bin:

dd if=/dev/urandom of=10mb.bin bs=10000000 count=1

This comes to roughly 2463 chunks (2442 data chunks of 4 KB each, plus the intermediate chunks of the resulting Merkle tree), which I uploaded one by one using POST /chunks, along the lines of the sketch below.
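The real uploader is a small JS app; a rough Go equivalent of the loop, assuming a local Bee API on port 1633 and a placeholder batch ID (and uploading only the data chunks, not the tree), would look like this:

```go
// Rough Go equivalent of the upload loop used in the test. The batch ID is
// a placeholder; the chunk format (8-byte little-endian span + payload) is
// the content-addressed chunk layout expected by POST /chunks.
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"log"
	"net/http"
	"os"
)

const batchID = "<your-batch-id>" // placeholder

func main() {
	data, err := os.ReadFile("10mb.bin")
	if err != nil {
		log.Fatal(err)
	}

	for off := 0; off < len(data); off += 4096 {
		end := off + 4096
		if end > len(data) {
			end = len(data)
		}
		payload := data[off:end]

		// 8-byte little-endian span (payload length), then the payload.
		chunk := make([]byte, 8+len(payload))
		binary.LittleEndian.PutUint64(chunk[:8], uint64(len(payload)))
		copy(chunk[8:], payload)

		req, err := http.NewRequest(http.MethodPost,
			"http://localhost:1633/chunks", bytes.NewReader(chunk))
		if err != nil {
			log.Fatal(err)
		}
		req.Header.Set("Content-Type", "application/octet-stream")
		req.Header.Set("Swarm-Postage-Batch-Id", batchID)

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
		if resp.StatusCode >= 300 {
			log.Fatalf("chunk at offset %d: %s", off, resp.Status)
		}
	}
	fmt.Println("done")
}
```

The bytes-written figures below are Activity Monitor readings taken at various points during that loop: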

| Chunks uploaded | Bytes written |
| --- | --- |
| 0 | 8.5 MB |
| 174 | 69.5 MB |
| 501 | 205.1 MB |
| 754 | 305.3 MB |
| 1001 | 409.7 MB |
| 1260 | 510.2 MB |
| 1510 | 622.4 MB |
| 2006 | 831.1 MB |
| 2420 | 1.00 GB |
| 2463 | 1.03 GB |

This looks off to me: over 1 GB written for an upload of 10 MB.

Is this helpful to you? Are you interested in the chunker/uploader app?

Cafe137 avatar Jan 23 '25 14:01 Cafe137

Attaching fs_usage output captured from the running bee process.

grepped.txt

Cafe137 avatar Jan 23 '25 15:01 Cafe137

@Cafe137 can you help me reproduce this? I've tried uploading the 10 MB file through swarm-cli (on Linux) and I cannot reproduce such numbers. For a 10 MB file I get 13 MB of I/O in total writes (including background noise from the protocol on the testnet).

acud avatar Nov 25 '25 19:11 acud

A few pointers after digging around our codebase and the leveldb codebase:

  • I ruled out macOS fs_usage reporting numbers different from the real ones (for example, the kernel changing some of them due to driver or block alignment). At least GenAI says the numbers are the logical bytes requested to be written to disk. This led me to the next point below (also ruled out):
  • I ruled out leveldb preallocating the journal files (or at least I couldn't find such references), under the assumption that they would hold relatively little content because of the fast sequence of writes.
  • Finally, I realized that the following happens, which I currently suppose is what explains the observed problem:

In api.go:


func (s *Service) newStamperPutter(ctx context.Context, opts putterOptions) (storer.PutterSession, error) {
	if !opts.Deferred && s.beeMode == DevMode {
		return nil, errUnsupportedDevNodeOperation
	}

	stamper, save, err := s.getStamper(opts.BatchID)
	if err != nil {
		return nil, fmt.Errorf("get stamper: %w", err)
	}
...

This gets called on every upload, whether of one chunk or one file, but only once per upload. It gives us the stamper used to stamp the individual chunks produced by splitting. The problem is the save callback (the middle return value of GetStampIssuer) in pkg/postage/service.go:

// GetStampIssuer finds a stamp issuer by batch ID.
func (ps *service) GetStampIssuer(batchID []byte) (*StampIssuer, func() error, error) {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	for _, st := range ps.issuers {
		if bytes.Equal(batchID, st.data.BatchID) {
			if !ps.IssuerUsable(st) {
				return nil, nil, ErrNotUsable
			}
			return st, func() error {
				ps.mtx.Lock()
				defer ps.mtx.Unlock()
				return ps.save(st)
			}, nil
		}
	}
	return nil, nil, ErrNotFound
}

// save persists the specified stamp issuer to the stamperstore.
func (ps *service) save(st *StampIssuer) error {
	st.mtx.Lock()
	defer st.mtx.Unlock()

	if err := ps.store.Put(&StampIssuerItem{
		Issuer: st,
	}); err != nil {
		return err
	}
	return nil
}

So what happens is that the StampIssuer gets persisted after every request terminates. The problem must be that the stamp issuer buckets get serialized in full every time, and that is a relatively large amount of data. I am quite certain this is what causes the problem.
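A rough back-of-the-envelope check supports this, under the assumption (not verified byte for byte) of a bucket depth of 16 and 4-byte collision counters per bucket:

```go
// Hedged estimate only: bucket depth and counter size are assumptions.
package main

import "fmt"

func main() {
	const (
		bucketDepth   = 16 // assumed batch bucket depth
		bucketCount   = 1 << bucketDepth
		counterSize   = 4    // assumed bytes per bucket counter
		chunkRequests = 2463 // POST /chunks calls in the 10 MB test above
	)
	perSave := bucketCount * counterSize    // ~256 KiB per serialization
	total := int64(perSave) * chunkRequests // one save per request
	fmt.Printf("per save: %d KiB\n", perSave/1024)
	fmt.Printf("total: %.2f MB before leveldb journal/compaction overhead\n",
		float64(total)/1e6)
}
```

That is roughly 645 MB of requested writes, which, with leveldb's write-ahead log and compactions on top, lands in the same ballpark as the ~1 GB observed for a 10 MB upload.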

Solving this won't be easy. It would mean either doing some write coalescing on the API side before flushing to disk, or finding a different way of storing the buckets that doesn't incur so many writes every time we upload a single chunk.
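As a sketch of the second option (hypothetical names only, not a proposal for the actual storage schema), the counters could be keyed per bucket so a save only writes the buckets touched since the last save, instead of reserializing the whole array:

```go
// Hypothetical sketch: persist only the buckets touched since the last save.
// kvPut stands in for a storage backend; none of these names exist in Bee.
package main

import (
	"encoding/binary"
	"fmt"
)

type bucketStore struct {
	counters [1 << 16]uint32     // in-memory bucket counters
	dirty    map[uint16]struct{} // buckets modified since the last save
	kvPut    func(key string, val []byte) error
}

// increment bumps a bucket counter and marks it dirty.
func (b *bucketStore) increment(bucket uint16) {
	b.counters[bucket]++
	b.dirty[bucket] = struct{}{}
}

// save writes only the dirty buckets, a few bytes each, instead of
// reserializing the full ~256 KiB bucket array on every request.
func (b *bucketStore) save(batchID string) error {
	for bucket := range b.dirty {
		val := make([]byte, 4)
		binary.BigEndian.PutUint32(val, b.counters[bucket])
		key := fmt.Sprintf("stamp/%s/bucket/%d", batchID, bucket)
		if err := b.kvPut(key, val); err != nil {
			return err
		}
	}
	b.dirty = map[uint16]struct{}{}
	return nil
}

func main() {
	s := &bucketStore{
		dirty: map[uint16]struct{}{},
		kvPut: func(key string, val []byte) error {
			fmt.Printf("put %s = %x\n", key, val)
			return nil
		},
	}
	s.increment(42)
	s.increment(42)
	s.increment(7)
	_ = s.save("deadbeef")
}
```

The trade-off is more, smaller writes per flush (which leveldb batches reasonably well), while an upload of a single chunk only touches a single bucket.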

It is, however, a large effort to achieve this. Ideally we could build tooling that makes uploading a lot of small pieces of data more efficient in batch, for example through a tarball upload (which already exists in the API).

acud avatar Nov 26 '25 21:11 acud