Very slow writing to network drive when using compression (Windows 7)

Open MehMog opened this issue 8 years ago • 3 comments

When saving to a network drive without compression I get about 50 Mbps, but with compression at 100 only 8 Mbps. Saving to a local drive, however, is much faster, so the CPU doesn't seem to be the bottleneck.
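The reported comparison can be reproduced with a sketch like the following (assuming the fst package is installed; the data frame size is just an example, and you would point `path` at your network share instead of a temp file):

```r
library(fst)

# Example data: ~100 MB of numeric columns
df <- as.data.frame(matrix(runif(12.5e6), ncol = 25))

# Replace tempfile() with a path on the network share to reproduce the issue
path <- tempfile(fileext = ".fst")

system.time(write_fst(df, path, compress = 0))    # no compression
system.time(write_fst(df, path, compress = 100))  # maximum compression
```

`compress` ranges from 0 to 100 in `write_fst()`, so the two calls bracket the settings compared above.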

MehMog avatar Feb 12 '17 18:02 MehMog

Hi @MehranMoghtadai, the fst package writes data in relatively small blocks of around 32 kB each. That means that at a speed of 8 Mbps it's issuing at least 30 distinct writes to the network drive per second, and at least as many random seeks within the file (depending on the column types). With modern SSDs this creates hardly any overhead, because they handle very large numbers of random writes (and reads) well. But on a network drive (or a conventional hard drive), random reads and writes are expensive.

It's hard to say anything specific about your situation, because it depends on how your network drive is set up. I've tested the package on several network drives and usually find that the drive's throughput is the bottleneck, so maximum compression helps a lot for speed in those cases. But perhaps on your network drive the large number of (random) writes is the bottleneck and the reason for the degraded performance? If so, increasing the block size might help, but it would come at the cost of random access to the data (the small block size is what enables efficient random row access).
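The "at least 30 writes per second" figure follows from quick arithmetic (a back-of-the-envelope check, assuming 8 Mbps means 8 megabits per second):

```r
throughput_bps <- 8e6 / 8    # 8 Mbps expressed in bytes per second
block_bytes    <- 32 * 1024  # fst's ~32 kB block size

writes_per_second <- throughput_bps / block_bytes
writes_per_second  # roughly 30
```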

MarcusKlik avatar Feb 12 '17 21:02 MarcusKlik

Although the fst package was designed with high-speed I/O in mind, I believe a slower 'high compression' mode would be very beneficial to your use case. In that mode we can compress larger blocks (using 512 kB instead of 32 kB will probably solve your speed problem) while still keeping random row and column access. Access would be somewhat less fine-grained than in default mode, but for slow networks the overall data rate is the limiting factor anyway, so that wouldn't be a real problem. Thanks for reporting your specific problem; I'll make sure a high-compression mode is added to one of the next releases of fst.
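The effect of the proposed block size on the number of write operations can be illustrated with a quick calculation (a sketch: the 512 kB figure is the value suggested above, and the 100 MB file size is just an example):

```r
file_bytes <- 100 * 1024^2  # example: a 100 MB fst file

blocks_small <- ceiling(file_bytes / (32 * 1024))   # current ~32 kB blocks
blocks_large <- ceiling(file_bytes / (512 * 1024))  # proposed 512 kB blocks

blocks_small  # 3200 write operations
blocks_large  # 200 write operations, i.e. 16x fewer seeks on the network drive
```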

MarcusKlik avatar Feb 12 '17 22:02 MarcusKlik

@MarcusKlik Thanks for the explanation! That definitely makes sense; I believe it must indeed be the large number of random writes that's causing the issue. To be honest, I don't know much about how our network drives are actually set up, but copying large files to or from the network drive typically saturates our 100 Mbps lines (it's a bit sad that we're still on 100 Mbps lines, but that's a whole other issue). I believe Windows uses 64 kB blocks by default, but I'm not 100% sure of that. So a larger block size should indeed help, although I haven't done extensive tests on my side.

But thank you for actually implementing a solution for this! Excited to make use of this package even more!

MehMog avatar Feb 13 '17 11:02 MehMog