Very slow to upload on /bzz with HDD
Summary
Upload is currently a very slow procedure and can't offer a good UX for large files. The node should receive the data quickly, return the hash reference as fast as possible, and take its time syncing with the network.
Motivation
Upload UX would improve significantly, allowing the uploading client to disconnect as soon as possible.
Implementation
Chunks are stored locally on the node in a "sync pool" that works like a buffer; the sync task consumes chunks from the pool one by one and pushes them to the network. Chunks in the pool should already be accessible from the node; once they are synced they can be removed from the pool and eventually processed by GC. While they are still in the pool, chunks can't be garbage collected.
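A minimal sketch of the idea, with hypothetical names (this is not Bee's actual internal API): the upload path only appends chunks to a local pool, and a background pusher drains it toward the network.

```go
package syncpool

import (
	"context"
	"log"
)

// Chunk is a placeholder for a locally stored chunk.
type Chunk struct {
	Address []byte
	Data    []byte
}

// SyncPool buffers chunks that are already stored locally until a
// background pusher has delivered them to the network.
type SyncPool struct {
	queue chan Chunk
}

func New(size int) *SyncPool {
	return &SyncPool{queue: make(chan Chunk, size)}
}

// Enqueue is called from the upload path and returns right away
// (as long as the pool has room), so the uploader only waits for
// local storage, not for syncing.
func (p *SyncPool) Enqueue(c Chunk) {
	p.queue <- c
}

// Run consumes chunks one by one and pushes them to the network.
// A chunk must stay protected from GC until push succeeds.
func (p *SyncPool) Run(ctx context.Context, push func(Chunk) error) {
	for {
		select {
		case <-ctx.Done():
			return
		case c := <-p.queue:
			if err := push(c); err != nil {
				log.Printf("push failed for %x, re-queueing: %v", c.Address, err)
				p.queue <- c // naive retry; a real pool would back off and persist
			}
		}
	}
}
```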
Drawbacks
The pool could fill up with unprocessed chunks, but that is still far better than keeping every client waiting a very long time before the upload completes and it can disconnect.
IIRC, uploads already stage the chunks locally and push them to the swarm without holding up the original requester by default. There is a new upload parameter to deliver the chunks directly to the swarm, but the default is still to stage, respond, and then work off the pushing. Someone please correct me if I'm wrong on this.
I see, the POST /bzz API has a swarm-deferred-upload parameter whose default is true.
Still, my test uploads are very, very slow. I'm working with a local node through Bee Dashboard, and I see that it also sends the header swarm-collection: true. Could Bee be slowing down by trying to interpret the file as a collection? (It isn't one.)
I don't see any relevant logs on the node.
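For reference, this is roughly the request I'd expect for a single-file upload without the collection header (the batch ID is a placeholder, and the header names are the ones mentioned in this thread and in the Bee API docs):

```go
// Quick check: upload a single file to a local Bee node without the
// swarm-collection header, to rule out collection handling as the cause.
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	data, err := os.ReadFile("bigfile.bin")
	if err != nil {
		log.Fatal(err)
	}

	req, err := http.NewRequest(http.MethodPost, "http://localhost:1633/bzz", bytes.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/octet-stream")
	req.Header.Set("Swarm-Postage-Batch-Id", "<your-batch-id>") // placeholder
	req.Header.Set("Swarm-Deferred-Upload", "true")             // default: stage locally, sync later
	// No Swarm-Collection header, so the payload should be treated as a single file.

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```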
Is the node you are uploading through running on an SSD and a reasonably fast processor? Bee does need to split the file into chunks, store those chunks, build a data structure to string the chunks together, and finally put a manifest over the chunks before it can return the final content ID. All of those chunks are stored to disk by the original uploading node, and that can take some time. I don't deal with large files, so I cannot provide any timings for comparison. I upload millions (literally) of small (~64KB) files in my OSM tile set and it does take days just to push them into the local node.
The node is running on a NAS; it doesn't have an SSD, but four HDDs in RAID 5, and it wasn't under full load. Here is some iostat output captured during an upload:
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 114.40 12.20 1724.00 1437.80 30.00 348.80 20.78 96.62 13.24 13.33 1.69 15.07 117.85 6.18 78.18
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 107.00 9.40 1496.80 1198.40 16.60 291.20 13.43 96.87 10.48 9.53 1.20 13.99 127.49 5.86 68.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 97.20 4.00 678.40 79.40 1.00 17.00 1.02 80.95 13.73 39.95 1.49 6.98 19.85 7.55 76.44
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 94.40 12.00 1580.80 1296.00 22.00 313.40 18.90 96.31 11.76 20.37 1.40 16.75 108.00 6.11 65.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 98.00 11.00 1501.60 1466.00 20.80 356.00 17.51 97.00 10.47 8.56 1.08 15.32 133.27 5.65 61.60
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 93.20 3.40 709.60 93.40 6.80 20.80 6.80 85.95 11.85 42.53 1.25 7.61 27.47 6.77 65.44
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 124.60 35.80 2583.20 2762.20 54.20 655.20 30.31 94.82 12.48 21.67 2.35 20.73 77.16 4.56 73.10
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 121.20 20.20 2863.20 3006.60 47.60 732.80 28.20 97.32 9.91 13.73 1.48 23.62 148.84 4.98 70.42
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 87.80 4.40 667.20 85.20 0.80 18.00 0.90 80.36 10.99 43.95 1.17 7.60 19.36 6.80 62.74
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 155.80 29.20 4024.80 4371.00 83.20 1064.00 34.81 97.33 11.15 10.98 2.05 25.83 149.69 4.36 80.68
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 157.40 37.60 4844.00 5562.80 89.40 1353.40 36.22 97.30 10.67 3.70 1.81 30.78 147.95 4.03 78.62
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 121.60 13.20 1817.60 1611.80 31.00 390.00 20.31 96.73 10.72 9.39 1.44 14.95 122.11 5.26 70.84
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 103.40 13.80 1690.40 1457.00 30.00 351.80 22.49 96.23 12.00 11.38 1.40 16.35 105.58 5.99 70.24
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 162.20 55.40 3812.00 4252.40 90.00 1008.20 35.69 94.79 14.99 11.56 3.07 23.50 76.76 4.06 88.40
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 173.00 40.00 4872.00 5595.40 86.60 1359.00 33.36 97.14 11.68 4.99 2.24 28.16 139.88 4.04 86.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 151.40 33.60 4724.00 5291.40 107.20 1290.00 41.45 97.46 10.50 5.93 1.81 31.20 157.48 4.28 79.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 170.60 36.60 4931.20 5733.00 90.80 1396.80 34.74 97.45 11.65 2.52 2.06 28.91 156.64 4.00 82.84
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 188.80 32.60 4610.40 4875.60 84.40 1186.40 30.89 97.33 12.18 5.18 2.46 24.42 149.56 4.24 93.78
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 182.80 30.00 3930.40 4495.80 71.00 1094.20 27.97 97.33 13.26 16.17 2.92 21.50 149.86 4.35 92.58
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 181.80 82.20 3586.40 3874.80 68.80 886.20 27.45 91.51 18.81 17.11 4.83 19.73 47.14 3.39 89.52
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 131.20 25.40 1929.60 2651.00 43.40 638.20 24.86 96.17 13.14 9.80 1.99 14.71 104.37 4.68 73.32
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 108.20 8.20 832.80 243.20 3.40 53.80 3.05 86.77 13.52 21.98 1.64 7.70 29.66 6.22 72.42
The processor is a quad-core Intel Pentium N3710 with 16GB of memory. It doesn't appear to be saturated, although I can see a bottleneck on clef, which is not able to scale beyond a single core. The /usr/local/bin/bee-clef process sits at around 25% CPU load.
I've tried to upload a file of 72,991,431 bytes, and it took 1h52m40s, i.e. an upload speed of ~10kB/s. On top of that, the process ended with a 500 error and the node shut down kademlia.
I will try to upload on a node with an SSD, but I'm sure we can do better anyway. For example, if clef is that slow to process (I'm using keys generated with the --lightkdf option), don't sign everything during the upload: put the data in the pool and sign the chunks in the sync process. This is possible because the same file generates the same hash even when uploaded with different postage batches, so stamp signing is not required to compute the hash.
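Some back-of-the-envelope numbers on that upload, assuming Swarm's 4 KiB data chunks (so the chunk count is only an estimate): ~380 ms per chunk is far more than a single disk access, which points at per-chunk processing/signing overhead rather than raw disk throughput.

```go
// Rough check on the upload above, assuming 4 KiB data chunks.
package main

import "fmt"

func main() {
	const (
		fileBytes = 72_991_431
		chunkSize = 4096
		elapsedS  = 1*3600 + 52*60 + 40 // 1h52m40s
	)
	dataChunks := (fileBytes + chunkSize - 1) / chunkSize
	fmt.Printf("data chunks:    ~%d\n", dataChunks)                                          // ~17,821
	fmt.Printf("throughput:     ~%.1f kB/s\n", float64(fileBytes)/float64(elapsedS)/1000)    // ~10.8 kB/s
	fmt.Printf("time per chunk: ~%.0f ms\n", float64(elapsedS)*1000/float64(dataChunks))     // ~380 ms
}
```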
I suspect you'll find the response better with a node using an SSD. Notice your disk utilization numbers (72-90%). When you take into account that swarm is likely doing computation and then disk access, the disk is probably slowing you down more than you realize.
The sync (pusher) process is occurring concurrently with your upload, so as soon as the node has some chunks ready to go, it starts pushing them. I don't know for sure that bee doesn't use clef to sign anything during the initial chunk storage into the pending push queue, but I know that clef gets busy with signing activities while the pusher is running.
I create a tag and attach it to my uploads so that I can observe the chunking (processed) and syncing (synced) behavior in realtime while the upload(s) are running and the pusher is active. You can learn a lot from this after you study it for a few weeks.
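Something along these lines, as a sketch (endpoint paths and JSON field names can differ between Bee versions):

```go
// Illustrative tag workflow: create a tag, attach it to an upload via the
// Swarm-Tag header, then poll it to watch processed/synced counts.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

const api = "http://localhost:1633"

func main() {
	// 1. Create a tag.
	resp, err := http.Post(api+"/tags", "application/json", nil)
	if err != nil {
		log.Fatal(err)
	}
	var tag struct {
		Uid uint32 `json:"uid"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&tag); err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	fmt.Println("created tag", tag.Uid)

	// 2. Attach it to the upload with the Swarm-Tag header (not shown here),
	//    then poll the tag while the upload and the pusher are running.
	for {
		r, err := http.Get(fmt.Sprintf("%s/tags/%d", api, tag.Uid))
		if err != nil {
			log.Fatal(err)
		}
		var status struct {
			Total     int64 `json:"total"`
			Processed int64 `json:"processed"`
			Synced    int64 `json:"synced"`
		}
		if err := json.NewDecoder(r.Body).Decode(&status); err != nil {
			log.Fatal(err)
		}
		r.Body.Close()
		fmt.Printf("processed %d / synced %d / total %d\n", status.Processed, status.Synced, status.Total)
		if status.Total > 0 && status.Synced >= status.Total {
			return
		}
		time.Sleep(5 * time.Second)
	}
}
```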
I will try to use an SSD, but an upload speed of ~10kB/s is nonsense in any case. Clef needing to scale across more cores is one issue (I will study it more in depth), but earlier I saw an average disk load of ~70%, and uploading another file now I see an average load of ~20%, so the disk is definitely not the problem. New logs:
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 27.00 6.00 275.20 243.40 10.80 56.20 28.57 90.35 7.90 6.57 0.25 10.19 40.57 6.62 21.86
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 26.40 14.00 216.80 66.60 0.40 4.00 1.49 22.22 11.32 15.94 0.52 8.21 4.76 5.51 22.26
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 20.80 5.40 190.40 24.20 4.40 2.00 17.46 27.03 7.66 8.85 0.21 9.15 4.48 6.11 16.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 55.40 4.00 327.20 65.00 0.80 12.80 1.42 76.19 6.72 13.45 0.43 5.91 16.25 5.37 31.88
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 48.80 13.80 766.40 1274.60 19.60 306.20 28.65 95.69 5.99 4.97 0.36 15.70 92.36 3.80 23.78
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 27.40 2.40 217.60 79.80 0.00 17.80 0.00 88.12 6.68 8.25 0.20 7.94 33.25 6.23 18.58
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 26.00 5.40 183.20 134.80 0.40 29.60 1.52 84.57 7.76 10.67 0.26 7.05 24.96 6.36 19.98
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 18.00 3.20 174.40 10.80 2.60 0.80 12.62 20.00 8.93 15.81 0.21 9.69 3.38 7.81 16.56
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 21.20 17.80 149.60 86.40 3.80 5.60 15.20 23.93 10.38 13.60 0.46 7.06 4.85 5.03 19.62
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 16.00 2.20 126.40 5.20 0.80 0.00 4.76 0.00 9.70 25.18 0.21 7.90 2.36 8.34 15.18
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 33.40 4.40 210.40 39.40 0.00 6.60 0.00 60.00 9.15 20.77 0.40 6.30 8.95 7.34 27.74
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 62.00 29.00 1219.20 1352.40 43.60 309.40 41.29 91.43 4.70 1.72 0.34 19.66 46.63 2.80 25.48
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 46.60 17.40 707.20 777.80 24.80 177.80 34.73 91.09 5.51 2.51 0.30 15.18 44.70 3.62 23.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 32.80 4.00 266.40 31.80 0.00 5.60 0.00 58.33 8.15 8.35 0.30 8.12 7.95 7.18 26.44
I've also tried pushing data over the Samba protocol while uploading with Bee, and Samba can push at 80MB/s, so the disk is not the bottleneck. Clef is still at 25% CPU.
Thanks for opening this @tmm360.
Please bear in mind that using clef will significantly slow down uploads. It slows down our integration tests by a significant margin. Have a look here and here; these two runs are on the same PR. Look at the settlements test execution time: with clef it takes 5m33s and without it 4m10s (a ~25% difference). The test is not necessarily deterministic in the amount of time it needs to wait for certain conditions to be met, so this is not really an exact number either, but you get the gist.
Clef does a lot of work when signing so on large uploads the performance impact will be noticeable.
Also, it's worth noting that we aren't taking spinning disks into account right now. Storage on spinning disks is a complete paradigm shift and needs to be structured in a very specific way in order to leverage their properties and keep performance reasonable. Right now we do not operate under the assumption that users will run on spinning disks (in other words, our storage abstractions aren't built to cater to this use-case). I also suspect that leveldb (which we use under the hood) will perform very badly on uploads due to the constant sorting of pages. A random HDD datasheet for a brand-new drive shows an average random access time of 4.16ms(!!!). Leveldb inserts will almost never be sequential (except maybe when the db is nearly empty), you get the point...
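To put that 4.16 ms figure in perspective, here is a rough bound that ignores caching and any write batching leveldb might do:

```go
// Rough throughput bound implied by a ~4.16 ms average random access time,
// if every 4 KiB chunk write costs one random disk access.
package main

import "fmt"

func main() {
	const (
		seekMs   = 4.16
		chunkKiB = 4.0
	)
	iops := 1000.0 / seekMs  // ~240 random operations per second
	kiBps := iops * chunkKiB // ~960 KiB/s of random 4 KiB writes
	fmt.Printf("~%.0f random IOPS, ~%.0f KiB/s for 4 KiB random writes\n", iops, kiBps)
}
```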
The general recommendation is to use SSDs; we will define more detailed hardware requirements in the near future.