goofys
Request: Can you provide an option to select a different CHUNK size on mount?
Hi,
In our company we use Activescale object storage, currently around 100PB of raw space divided across 3 different systems.
Our systems are geo-dispersed over 3 datacenters, which means that 60% of the data is read locally while 40% is fetched from remote locations. That 40% of remote traffic is currently too high for us, because goofys' 20MB range requests trigger 64MB block reads instead of just 20MB partial block reads.
Activescale had an erasure system of 64MB blocks that does not allow efficient range reads, so for every 20MB read we issue, it fetches 64MB and discards 44MB before delivering the requested range to the client. Since each request is not aware of the others, it does this again for every 20MB request, even if that means fetching the same 64MB block and dropping 44MB each time. Activescale has already fixed this, and new data is now stored with a different erasure algorithm that makes any kind of range request efficient. However, the data already written would need to be read and written again to benefit from that improvement.
So, it would be a huge benefit for us if someone could add a mount option that allows us to set our own default CHUNK size.
const MAX_READAHEAD = uint32(400 * 1024 * 1024)
const READAHEAD_CHUNK = uint32(20 * 1024 * 1024)
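As a rough sketch of the kind of option I have in mind: the flag name `--readahead-chunk-mb`, the helper `parseChunkSize`, and the 1 to 400 MB bounds below are all made up for illustration and are not existing goofys options. It uses only Go's standard `flag` package, so the real goofys flag handling may well look different.

```go
package main

import (
	"flag"
	"fmt"
)

// Default mirrors the current READAHEAD_CHUNK constant (20MB).
const defaultChunkMB = 20

// Upper bound mirrors MAX_READAHEAD (400MB); treating it as a limit
// for the chunk size is an assumption made for this sketch.
const maxChunkMB = 400

// parseChunkSize reads a hypothetical --readahead-chunk-mb option
// and returns the chunk size in bytes.
func parseChunkSize(args []string) (uint32, error) {
	fs := flag.NewFlagSet("mount", flag.ContinueOnError)
	chunkMB := fs.Uint("readahead-chunk-mb", defaultChunkMB,
		"read-ahead chunk size in MB (hypothetical option)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *chunkMB == 0 || *chunkMB > maxChunkMB {
		return 0, fmt.Errorf("chunk size must be between 1 and %d MB, got %d",
			maxChunkMB, *chunkMB)
	}
	return uint32(*chunkMB) * 1024 * 1024, nil
}

func main() {
	size, err := parseChunkSize([]string{"--readahead-chunk-mb=64"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("read-ahead chunk size in bytes:", size)
}
```

With no option given, the sketch falls back to the current 20MB default, so existing mounts would behave as they do today.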
My math shows that, without efficient range requests, 20MB chunks mean 79% of our network traffic is wasted, because each request fetches a 64MB block and drops 44MB of it. If a 20MB chunk spans 2 blocks of 64MB, then (64*2)-20 = 108MB is dropped.
I hope the following picture is reasonably self-explanatory.
So, if we could set our CHUNK size to 64MB, the waste ratio would be reduced from 79% to 50%, which would be great.
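To make the per-request arithmetic concrete, here is a minimal Go sketch. It assumes the backend always reads whole 64MB blocks and that blocks start at offset 0 of the object; the function names `bytesFetched` and `wasteRatio` are invented for this example. It only covers individual requests, so it does not try to reproduce the 79% average, which depends on how requests line up with block boundaries.

```go
package main

import "fmt"

const MB = int64(1024 * 1024)

// bytesFetched returns how many bytes the backend reads to serve the
// range [offset, offset+length), assuming it always reads whole blocks.
func bytesFetched(offset, length, blockSize int64) int64 {
	if length <= 0 {
		return 0
	}
	firstBlock := offset / blockSize
	lastBlock := (offset + length - 1) / blockSize
	return (lastBlock - firstBlock + 1) * blockSize
}

// wasteRatio returns the fraction of fetched bytes that is discarded.
func wasteRatio(offset, length, blockSize int64) float64 {
	fetched := bytesFetched(offset, length, blockSize)
	if fetched == 0 {
		return 0
	}
	return float64(fetched-length) / float64(fetched)
}

func main() {
	block := 64 * MB
	cases := []struct {
		name           string
		offset, length int64
	}{
		{"20MB inside one block", 0, 20 * MB},
		{"20MB across two blocks", 60 * MB, 20 * MB},
		{"64MB across two blocks", 32 * MB, 64 * MB},
		{"64MB aligned to a block", 0, 64 * MB},
	}
	for _, c := range cases {
		fmt.Printf("%s: fetched %dMB, wasted %.1f%%\n", c.name,
			bytesFetched(c.offset, c.length, block)/MB,
			100*wasteRatio(c.offset, c.length, block))
	}
}
```

Under these assumptions a 20MB request wastes 44 of 64MB when it sits inside one block and 108 of 128MB when it straddles two. A 64MB request that straddles two blocks wastes 64 of 128MB, which is the 50% case, and it wastes nothing when it happens to line up with a block boundary.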
Thank you very much for your understanding, and thank you very much for this piece of great software that is helping us provide PBs of data to the *omics research world.
Best regards,
Marc
Is it possible to detect from the response headers that the endpoint is Activescale, and set the chunk size automatically?