feat(filestore): add mmap reader option
Motivation
I have a custom IPFS implementation based on boxo, which uses FileManager on Windows because I need to store files directly rather than file blocks. The pull logic is roughly as follows:
- Pull the entire
ProtoNodetree. - Create a sparse file, which can be larger than 100 GB.
- Randomly pull
RawNodefile blocks and write them to the system. Meanwhile, this Windows node also communicates and synchronizes with other nodes.
When hundreds of Windows nodes are working simultaneously, I noticed that the system's memory consumption becomes very high, and the memory is immediately released once the file download is complete. Interestingly, the memory consumption is not by the program itself but by the operating system. After some investigation, I found the root cause: Windows caches the read file blocks (even if the *os.File is already closed).
This can be reproduced with the following code:
package main
import (
"crypto/rand"
"fmt"
"io"
"os"
"runtime"
"sync"
"time"
"unsafe"
"sparse/ratelimit"
"golang.org/x/sys/windows"
)
func readBuf(path string, idx int, size int) []byte {
f, err := os.Open(path)
if err != nil {
panic(err)
}
defer f.Close()
buf := make([]byte, size)
if _, err := f.ReadAt(buf, int64(idx)); err != nil {
panic(err)
}
return buf
}
type FILE_SET_SPARSE_BUFFER struct {
SetSparse bool
}
// SetSparse makes the file be a sparse file
func SetSparse(out *os.File, open bool) error {
lpInBuffer := FILE_SET_SPARSE_BUFFER{
SetSparse: open,
}
var bytesReturned uint32
err := windows.DeviceIoControl(
windows.Handle(out.Fd()),
windows.FSCTL_SET_SPARSE,
(*byte)(unsafe.Pointer(&lpInBuffer)),
uint32(unsafe.Sizeof(lpInBuffer)),
nil, 0, &bytesReturned, nil,
)
if err != nil {
return fmt.Errorf("DeviceIoControl FSCTL_SET_SPARSE: %w", err)
}
return nil
}
func main() {
filename := "test.bin"
size := 1024 * 1024 * 1024 * 100 // 100GB
defer os.Remove(filename)
f, err := os.OpenFile(filename, os.O_RDWR|os.O_CREATE, 0o644)
if err != nil {
panic(err)
}
defer f.Close()
if err := SetSparse(f, true); err != nil {
panic(err)
}
if err := f.Truncate(int64(size)); err != nil {
panic(err)
}
chunkSize := 256 * 1024
offset := make([]int, size/chunkSize)
for i := 0; i < int(size); i += chunkSize {
offset = append(offset, i)
}
mrand.Shuffle(len(offset), func(i, j int) {
offset[i], offset[j] = offset[j], offset[i]
})
bucket := ratelimit.NewFromMbps(200)
ch := make(chan int)
cache := map[int]struct{}{}
rwMux := sync.RWMutex{}
numCpu := runtime.GOMAXPROCS(0)
println(numCpu)
rf, err := os.Open(filename)
if err != nil {
panic(err)
}
defer rf.Close()
for i := 0; i < numCpu; i++ {
go func() {
for {
rwMux.RLock()
var key int
for k := range cache {
key = k
break
}
rwMux.RUnlock()
readBuf(filename, key, chunkSize)
time.Sleep(time.Millisecond * 1)
}
}()
}
wg := sync.WaitGroup{}
for i := 0; i < numCpu; i++ {
wg.Add(1)
go func() {
defer wg.Done()
buf := make([]byte, chunkSize)
for idx := range ch {
io.ReadFull(rand.Reader, buf)
bucket.Wait(int64(chunkSize))
if _, err := f.WriteAt(buf, int64(idx)); err != nil {
panic(err)
}
// if err := f.Sync(); err != nil {
// panic(err)
// }
rwMux.Lock()
cache[idx] = struct{}{}
rwMux.Unlock()
}
}()
}
for _, start := range offset {
ch <- start
}
wg.Wait()
}
Solution
Using mmap can solve this problem (implemented with CreateFileMapping on Windows). To maintain backward compatibility, a WithMMapReader option has been added to FileManager. Enabling this option on Windows can prevent excessive memory consumption.
The downside is that it relies on the exp package, but it only changes from an indirect dependency to a direct dependency. Alternatively, the code from this package can be copied directly into the project.
Codecov Report
Attention: Patch coverage is 83.78378% with 6 lines in your changes missing coverage. Please review.
Project coverage is 60.42%. Comparing base (
6c7f2b7) to head (56a2d3c). Report is 1 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| filestore/filereader.go | 70.00% | 4 Missing and 2 partials :warning: |
@@ Coverage Diff @@
## main #665 +/- ##
=======================================
Coverage 60.41% 60.42%
=======================================
Files 244 245 +1
Lines 31027 31056 +29
=======================================
+ Hits 18746 18765 +19
- Misses 10615 10620 +5
- Partials 1666 1671 +5
| Files with missing lines | Coverage Δ | |
|---|---|---|
| filestore/fsrefstore.go | 41.90% <100.00%> (+4.09%) |
:arrow_up: |
| filestore/filereader.go | 70.00% <70.00%> (ø) |
@lidel Yes, mmap was introduced to solve the Windows-specific problem, but WithMMapReader is an option-in and the default behavior is still to use go std's os.File, so I implemented WithMMapReader to let the user choose whether to use mmap or not, I don't think a build tag is needed here, mmap is worked well to all major platforms.
done
@lidel Hi, should I do anything else?
@lidel Hi, Is there anything else I need to do? I have to handle conflicts due to frequent commits from upstream