Very slow to upload on /bzz with HDD
Summary
Upload is currently a very slow procedure and can't offer a good UX for large files. The node should receive the data quickly, return the hash reference as fast as possible, and take its time syncing with the network.
Motivation
Upload UX would improve significantly, allowing the uploading client to disconnect as soon as possible.
Implementation
Chunks are stored locally on the node in a "sync pool" that works like a buffer; the sync task consumes chunks from the pool one by one and pushes them to the network. Chunks in the pool should already be accessible from the node; once they are synced they can be removed from the pool and eventually processed by GC. While they are still in the pool, chunks can't be garbage collected.
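A minimal sketch of the idea, with hypothetical names (this is not Bee's actual internal API): the upload path only appends chunks to a local pool, and a background pusher drains it toward the network.

```go
package syncpool

import (
	"context"
	"log"
)

// Chunk is a placeholder for a locally stored chunk.
type Chunk struct {
	Address []byte
	Data    []byte
}

// SyncPool buffers chunks that are already stored locally until a
// background pusher has delivered them to the network.
type SyncPool struct {
	queue chan Chunk
}

func New(size int) *SyncPool {
	return &SyncPool{queue: make(chan Chunk, size)}
}

// Enqueue is called from the upload path and returns right away
// (as long as the pool has room), so the uploader only waits for
// local storage, not for syncing.
func (p *SyncPool) Enqueue(c Chunk) {
	p.queue <- c
}

// Run consumes chunks one by one and pushes them to the network.
// A chunk must stay protected from GC until push succeeds.
func (p *SyncPool) Run(ctx context.Context, push func(Chunk) error) {
	for {
		select {
		case <-ctx.Done():
			return
		case c := <-p.queue:
			if err := push(c); err != nil {
				log.Printf("push failed for %x, re-queueing: %v", c.Address, err)
				p.queue <- c // naive retry; a real pool would back off and persist
			}
		}
	}
}
```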
Drawbacks
The pool could fill up with unprocessed chunks, but that is still far better than keeping every client waiting a very long time before the upload completes and it can disconnect.
IIRC, uploads already stage the chunks locally and push them to the swarm without holding up the original requester by default. There is a new upload parameter to deliver the chunks directly to the swarm, but the default is still to stage, respond, and then work off the pushing. Someone please correct me if I'm wrong on this.
I see, the POST /bzz API has a swarm-deferred-upload parameter whose default is true.
Still, my test uploads are very, very slow. I'm working with a local node through Bee Dashboard, and I see that it also sends the header swarm-collection: true. Could Bee be slowing down by trying to interpret the file as a collection? (It isn't one.)
I don't see any relevant logs on the node.
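For reference, this is roughly the request I'd expect for a single-file upload without the collection header (the batch ID is a placeholder, and the header names are the ones mentioned in this thread and in the Bee API docs):

```go
// Quick check: upload a single file to a local Bee node without the
// swarm-collection header, to rule out collection handling as the cause.
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	data, err := os.ReadFile("bigfile.bin")
	if err != nil {
		log.Fatal(err)
	}

	req, err := http.NewRequest(http.MethodPost, "http://localhost:1633/bzz", bytes.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/octet-stream")
	req.Header.Set("Swarm-Postage-Batch-Id", "<your-batch-id>") // placeholder
	req.Header.Set("Swarm-Deferred-Upload", "true")             // default: stage locally, sync later
	// No Swarm-Collection header, so the payload should be treated as a single file.

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```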
Is the node you are uploading through running on an SSD and a reasonably fast processor? Bee does need to split the file into chunks, store those chunks, build a data structure to string the chunks together, and finally put a manifest over the chunks before it can return the final content ID. All of those chunks are stored to disk by the original uploading node, and that can take some time. I don't deal with large files, so I cannot provide any timings for comparison. I upload millions (literally) of small (~64KB) files in my OSM tile set and it does take days just to push them into the local node.
The node is running on a NAS; it doesn't have an SSD, but four HDDs in RAID 5, and it wasn't under full load. Here is some iostat output captured during an upload:
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 114.40 12.20 1724.00 1437.80 30.00 348.80 20.78 96.62 13.24 13.33 1.69 15.07 117.85 6.18 78.18
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 107.00 9.40 1496.80 1198.40 16.60 291.20 13.43 96.87 10.48 9.53 1.20 13.99 127.49 5.86 68.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 97.20 4.00 678.40 79.40 1.00 17.00 1.02 80.95 13.73 39.95 1.49 6.98 19.85 7.55 76.44
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 94.40 12.00 1580.80 1296.00 22.00 313.40 18.90 96.31 11.76 20.37 1.40 16.75 108.00 6.11 65.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 98.00 11.00 1501.60 1466.00 20.80 356.00 17.51 97.00 10.47 8.56 1.08 15.32 133.27 5.65 61.60
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 93.20 3.40 709.60 93.40 6.80 20.80 6.80 85.95 11.85 42.53 1.25 7.61 27.47 6.77 65.44
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 124.60 35.80 2583.20 2762.20 54.20 655.20 30.31 94.82 12.48 21.67 2.35 20.73 77.16 4.56 73.10
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 121.20 20.20 2863.20 3006.60 47.60 732.80 28.20 97.32 9.91 13.73 1.48 23.62 148.84 4.98 70.42
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 87.80 4.40 667.20 85.20 0.80 18.00 0.90 80.36 10.99 43.95 1.17 7.60 19.36 6.80 62.74
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 155.80 29.20 4024.80 4371.00 83.20 1064.00 34.81 97.33 11.15 10.98 2.05 25.83 149.69 4.36 80.68
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 157.40 37.60 4844.00 5562.80 89.40 1353.40 36.22 97.30 10.67 3.70 1.81 30.78 147.95 4.03 78.62
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 121.60 13.20 1817.60 1611.80 31.00 390.00 20.31 96.73 10.72 9.39 1.44 14.95 122.11 5.26 70.84
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 103.40 13.80 1690.40 1457.00 30.00 351.80 22.49 96.23 12.00 11.38 1.40 16.35 105.58 5.99 70.24
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 162.20 55.40 3812.00 4252.40 90.00 1008.20 35.69 94.79 14.99 11.56 3.07 23.50 76.76 4.06 88.40
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 173.00 40.00 4872.00 5595.40 86.60 1359.00 33.36 97.14 11.68 4.99 2.24 28.16 139.88 4.04 86.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 151.40 33.60 4724.00 5291.40 107.20 1290.00 41.45 97.46 10.50 5.93 1.81 31.20 157.48 4.28 79.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 170.60 36.60 4931.20 5733.00 90.80 1396.80 34.74 97.45 11.65 2.52 2.06 28.91 156.64 4.00 82.84
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 188.80 32.60 4610.40 4875.60 84.40 1186.40 30.89 97.33 12.18 5.18 2.46 24.42 149.56 4.24 93.78
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 182.80 30.00 3930.40 4495.80 71.00 1094.20 27.97 97.33 13.26 16.17 2.92 21.50 149.86 4.35 92.58
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 181.80 82.20 3586.40 3874.80 68.80 886.20 27.45 91.51 18.81 17.11 4.83 19.73 47.14 3.39 89.52
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 131.20 25.40 1929.60 2651.00 43.40 638.20 24.86 96.17 13.14 9.80 1.99 14.71 104.37 4.68 73.32
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 108.20 8.20 832.80 243.20 3.40 53.80 3.05 86.77 13.52 21.98 1.64 7.70 29.66 6.22 72.42
The processor is a quad-core Intel Pentium N3710 with 16GB of memory. It doesn't appear to be saturated, although I can see a bottleneck on clef, which is not able to scale beyond a single core. The /usr/local/bin/bee-clef process sits at around 25% CPU load.
I've tried to upload a file of 72,991,431 bytes, and it took 1h52m40s, i.e. an upload speed of ~10kB/s. On top of that, the process ended with a 500 error and the node shut down kademlia.
I will try to upload on a node with an SSD, but I'm sure we can do better anyway. For example, if clef is that slow to process (I'm using keys generated with the --lightkdf option), don't sign everything during the upload: put the data in the pool and sign the chunks in the sync process. This is possible because the same file generates the same hash even when uploaded with different postage batches, so stamp signing is not required to compute the hash.
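Some back-of-the-envelope numbers on that upload, assuming Swarm's 4 KiB data chunks (so the chunk count is only an estimate): ~380 ms per chunk is far more than a single disk access, which points at per-chunk processing/signing overhead rather than raw disk throughput.

```go
// Rough check on the upload above, assuming 4 KiB data chunks.
package main

import "fmt"

func main() {
	const (
		fileBytes = 72_991_431
		chunkSize = 4096
		elapsedS  = 1*3600 + 52*60 + 40 // 1h52m40s
	)
	dataChunks := (fileBytes + chunkSize - 1) / chunkSize
	fmt.Printf("data chunks:    ~%d\n", dataChunks)                                          // ~17,821
	fmt.Printf("throughput:     ~%.1f kB/s\n", float64(fileBytes)/float64(elapsedS)/1000)    // ~10.8 kB/s
	fmt.Printf("time per chunk: ~%.0f ms\n", float64(elapsedS)*1000/float64(dataChunks))     // ~380 ms
}
```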
I suspect you'll find the response better with a node using an SSD. Notice your disk utilization numbers (72-90%). When you take into account that swarm is likely doing computation and then disk access, the disk is probably slowing you down more than you realize.
The sync (pusher) process is occurring concurrently with your upload, so as soon as the node has some chunks ready to go, it starts pushing them. I don't know for sure that bee doesn't use clef to sign anything during the initial chunk storage into the pending push queue, but I know that clef gets busy with signing activities while the pusher is running.
I create a tag and attach it to my uploads so that I can observe the chunking (processed) and syncing (synced) behavior in realtime while the upload(s) are running and the pusher is active. You can learn a lot from this after you study it for a few weeks.
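Something along these lines, as a sketch (endpoint paths and JSON field names can differ between Bee versions):

```go
// Illustrative tag workflow: create a tag, attach it to an upload via the
// Swarm-Tag header, then poll it to watch processed/synced counts.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

const api = "http://localhost:1633"

func main() {
	// 1. Create a tag.
	resp, err := http.Post(api+"/tags", "application/json", nil)
	if err != nil {
		log.Fatal(err)
	}
	var tag struct {
		Uid uint32 `json:"uid"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&tag); err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	fmt.Println("created tag", tag.Uid)

	// 2. Attach it to the upload with the Swarm-Tag header (not shown here),
	//    then poll the tag while the upload and the pusher are running.
	for {
		r, err := http.Get(fmt.Sprintf("%s/tags/%d", api, tag.Uid))
		if err != nil {
			log.Fatal(err)
		}
		var status struct {
			Total     int64 `json:"total"`
			Processed int64 `json:"processed"`
			Synced    int64 `json:"synced"`
		}
		if err := json.NewDecoder(r.Body).Decode(&status); err != nil {
			log.Fatal(err)
		}
		r.Body.Close()
		fmt.Printf("processed %d / synced %d / total %d\n", status.Processed, status.Synced, status.Total)
		if status.Total > 0 && status.Synced >= status.Total {
			return
		}
		time.Sleep(5 * time.Second)
	}
}
```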
I will try to use an SSD, but an upload speed of ~10kB/s is nonsense in any case. Clef needing to scale across more cores is one issue (I will study it more in depth), but earlier I saw an average disk load of ~70%, and uploading another file now I see an average load of ~20%, so the disk is definitely not the problem. New logs:
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 27.00 6.00 275.20 243.40 10.80 56.20 28.57 90.35 7.90 6.57 0.25 10.19 40.57 6.62 21.86
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 26.40 14.00 216.80 66.60 0.40 4.00 1.49 22.22 11.32 15.94 0.52 8.21 4.76 5.51 22.26
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 20.80 5.40 190.40 24.20 4.40 2.00 17.46 27.03 7.66 8.85 0.21 9.15 4.48 6.11 16.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 55.40 4.00 327.20 65.00 0.80 12.80 1.42 76.19 6.72 13.45 0.43 5.91 16.25 5.37 31.88
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 48.80 13.80 766.40 1274.60 19.60 306.20 28.65 95.69 5.99 4.97 0.36 15.70 92.36 3.80 23.78
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 27.40 2.40 217.60 79.80 0.00 17.80 0.00 88.12 6.68 8.25 0.20 7.94 33.25 6.23 18.58
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 26.00 5.40 183.20 134.80 0.40 29.60 1.52 84.57 7.76 10.67 0.26 7.05 24.96 6.36 19.98
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 18.00 3.20 174.40 10.80 2.60 0.80 12.62 20.00 8.93 15.81 0.21 9.69 3.38 7.81 16.56
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 21.20 17.80 149.60 86.40 3.80 5.60 15.20 23.93 10.38 13.60 0.46 7.06 4.85 5.03 19.62
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 16.00 2.20 126.40 5.20 0.80 0.00 4.76 0.00 9.70 25.18 0.21 7.90 2.36 8.34 15.18
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 33.40 4.40 210.40 39.40 0.00 6.60 0.00 60.00 9.15 20.77 0.40 6.30 8.95 7.34 27.74
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 62.00 29.00 1219.20 1352.40 43.60 309.40 41.29 91.43 4.70 1.72 0.34 19.66 46.63 2.80 25.48
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 46.60 17.40 707.20 777.80 24.80 177.80 34.73 91.09 5.51 2.51 0.30 15.18 44.70 3.62 23.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 32.80 4.00 266.40 31.80 0.00 5.60 0.00 58.33 8.15 8.35 0.30 8.12 7.95 7.18 26.44
I've also tried pushing data over the Samba protocol while uploading with Bee, and Samba can push at 80MB/s, so the disk is not the bottleneck. Clef is still at 25% CPU.
Thanks for opening this @tmm360.
Please bear in mind that using clef will significantly slow down uploads. It slows down our integration tests by a significant margin. Have a look here and here; these two runs are on the same PR. Look at the settlements test execution time: with clef it takes 5m33s and without it 4m10s (a ~25% difference). The test is not necessarily deterministic in the amount of time it needs to wait for certain conditions to be met, so this is not really an exact number either, but you get the gist.
Clef does a lot of work when signing so on large uploads the performance impact will be noticeable.
Also, it's worth noting that we aren't taking spinning disks into account right now. Storage on spinning disks is a complete paradigm shift and needs to be structured in a very specific way in order to leverage their properties and keep performance reasonable. Right now we do not operate under the assumption that users will run on spinning disks (in other words, our storage abstractions aren't built to cater to this use-case). I also suspect that leveldb (which we use under the hood) will perform very badly on uploads due to the constant sorting of pages. A random HDD datasheet for a brand-new drive shows an average random access time of 4.16ms(!!!). Leveldb inserts will almost never be sequential (except maybe when the db is nearly empty), you get the point...
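To put that 4.16 ms figure in perspective, here is a rough bound that ignores caching and any write batching leveldb might do:

```go
// Rough throughput bound implied by a ~4.16 ms average random access time,
// if every 4 KiB chunk write costs one random disk access.
package main

import "fmt"

func main() {
	const (
		seekMs   = 4.16
		chunkKiB = 4.0
	)
	iops := 1000.0 / seekMs  // ~240 random operations per second
	kiBps := iops * chunkKiB // ~960 KiB/s of random 4 KiB writes
	fmt.Printf("~%.0f random IOPS, ~%.0f KiB/s for 4 KiB random writes\n", iops, kiBps)
}
```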
The general recommendation is to use SSDs; we will define more detailed hardware requirements in the near future.