pogreb
Large database truncate problem
We recently ran into a problem that prevents the database from growing. Every time we call db.Put we get this error message:
truncate D:\Database\main.pix: The requested operation could not be completed due to a file system limitation
The whole database folder is 255 GB. The file main.pix is 37.3 GB in size. We are running on Windows Server 2019 as admin, and the disk has plenty of storage (4 TB total).
Any idea of the root cause and how to fix it?
I suppose the error message originates from here? https://github.com/akrylysov/pogreb/blob/e182fb02fbd270cf4943430543d6d2e3824c6682/file.go#L79-L86
Edit: Unrelated to this problem, but in truncate used by recoveryIterator.next it uses uint32. That could lead to problems down the road for large segment files?
https://github.com/akrylysov/pogreb/blob/e182fb02fbd270cf4943430543d6d2e3824c6682/file.go#L97-L107
Could it be file fragmentation?
Googling this message finds this: https://support.assurestor.com/support/solutions/articles/16000104076-the-requested-operation-could-not-be-completed-due-to-a-file-system-limitation
- Compressed files are more likely to reach the limit because of the way the files are stored on disk. Compressed files require more extents to describe their layout. Also, decompressing and compressing a file increases fragmentation significantly.
- The limit can be reached when write operations occur to an already compressed chunk location. The limit can also be reached by a sparse file. This size limit is usually between 40 gigabytes (GB) and 90 GB for a very fragmented file.
- A heavily fragmented file in an NTFS file system volume may not grow beyond a certain size caused by an implementation limit in structures that are used to describe the allocations.
Thanks for the bug report.
Unrelated to this problem, but in truncate used by recoveryIterator.next it uses uint32. That could lead to problems down the road for large segment files?
Segment files can't exceed 4 GiB: https://github.com/akrylysov/pogreb/blob/master/options.go#L40. The max segment size is currently not configurable and is always set to 4 GiB.
main.pix is the main index file. Index files use 64-bit offsets: https://github.com/akrylysov/pogreb/blob/cc107cdd2f78d7ca0ec33e853c4480a9a43e7472/index.go#L89
Windows support could definitely use more testing. I develop Pogreb on macOS and deploy it to Linux.
I'll try to reproduce the issue. Wondering if it's related to mmap? I'm working on adding an option to disable mmap.
I can reproduce the error: every call to db.Put fails. I added debugging code and can confirm that the referenced extend function fails on this line in file.go:
if err := f.Truncate(off + int64(size)); err != nil {
I've added logging:
fmt.Printf("Error offset %d size %d from f.Trunacte: %s\n", off, size, err.Error())
And the output is always:
Error offset 40108773376 size 512 from f.Trunacte: truncate D:\Database\main.pix: The requested operation could not be completed due to a file system limitation
Error offset 40108773376 size 512 from f.Trunacte: truncate D:\Database\main.pix: The requested operation could not be completed due to a file system limitation
Error offset 40108773376 size 512 from f.Trunacte: truncate D:\Database\main.pix: The requested operation could not be completed due to a file system limitation
The offset matches the file size (37.3 GB). I tried the Windows defragmentation tool without success (I assume it didn't actually defragment because the disk is an SSD).
Then I tried another trick: copying main.pix to a new file, deleting the old one, and renaming the new one to the original name. It worked! 🎉
So it looks like the underlying problem is that when the file is extended 512 bytes at a time, NTFS ends up storing it as millions of separate fragments (instead of contiguous data), and at some point it hits an internal OS limit. I will monitor the situation and check whether it fails again at around 40 GB (which might take weeks).
I guess an ugly fix would be catching that error, temporarily closing the file, and doing what I did manually: copy, delete the old file, rename, reopen.
Microsoft documented the problem here: https://support.microsoft.com/en-in/help/967351/a-heavily-fragmented-file-in-an-ntfs-volume-may-not-grow-beyond-a-cert