Paper
SectorFile Test
The goal of this PR is to gather some data on the current deficiencies of RegionFile and address them in SectorFile. Currently, the main improvements are:
- Header corruption detection + automatic recalculation, as the header is now validated using XXHash64
- Data validation using XXHash64
- Header recalculation logic for all data types (not just block)
- ZSTD support
- Reduced sector size, which should reduce overall world size by a factor of ~1.3 up to 1.6
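To illustrate why a smaller sector size can shrink a world: each stored payload is rounded up to a whole number of sectors, so smaller sectors waste less padding per entry. The sketch below compares 4096-byte sectors (the sector size RegionFile uses) with the 512-byte sectors mentioned later in this thread. The payload sizes are invented for illustration, the helper names are hypothetical, and file headers are ignored, so the printed ratio is not a measurement of either format.

```python
def padded_size(payload_bytes: int, sector_size: int) -> int:
    """Bytes occupied on disk once the payload is rounded up to whole sectors."""
    sectors = -(-payload_bytes // sector_size)  # integer ceiling division
    return sectors * sector_size


def total_padded(payloads, sector_size: int) -> int:
    """Total on-disk size for a list of payload sizes at a given sector size."""
    return sum(padded_size(p, sector_size) for p in payloads)


# Hypothetical compressed payload sizes in bytes (made-up values, not measurements).
SAMPLE_PAYLOADS = [900, 2300, 4100, 5200, 7000]

if __name__ == "__main__":
    large = total_padded(SAMPLE_PAYLOADS, 4096)
    small = total_padded(SAMPLE_PAYLOADS, 512)
    print(f"4096-byte sectors: {large} bytes")
    print(f"512-byte sectors:  {small} bytes")
    print(f"ratio: {large / small:.2f}")
```

The gain depends entirely on the real distribution of payload sizes, which is one reason the tracing data requested in this PR is useful.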
The compression configuration (compression-format) in Paper supports setting the value to ZSTD as well as LZ4.
Please see https://github.com/PaperMC/SectorTool for details on the new format and how to use the tool to convert the world or how to revert the conversion. The tool can be downloaded from the release tab.
Please see SPECIFICATION.md for details on the new format specification.
This patch includes conversion logic to convert RegionFiles to SectorFile on demand. If a SectorFile does not exist, it will attempt to copy chunk/entity/poi data from existing RegionFiles.
Included in this test patch is logic to dump SectorFile operation tracing to the file sectorfile.tracer in the root dir of a world. The file is not compressed and is append-only. Because it is uncompressed, please compress it before sending it back for analysis to reduce its size.
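Any archiver works for compressing the trace. As one option, here is a minimal sketch that gzips the file with the Python standard library. The function name is hypothetical, the world directory is whatever your server uses, and it is safest to run this while the server is stopped so the file is not being appended to.

```python
import gzip
import shutil
from pathlib import Path


def compress_tracer(world_dir: str, name: str = "sectorfile.tracer") -> Path:
    """Write a gzip copy of the tracer file next to the original and return its path."""
    src = Path(world_dir) / name
    dst = src.with_name(src.name + ".gz")
    # Stream the copy so a large trace never has to fit in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

The original file is left untouched; only the `.gz` copy needs to be sent.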
This tracing will be useful for later tests to perform parameter scanning on some of the parameters of SectorFile:
- The section shift
- The sector size
- SectorFile cache size
Folia branch: https://github.com/PaperMC/Folia/tree/experiment/sectorfile
Download the paperclip jar for this pull request: paper-10231.zip
Someone had mentioned that ZSTD dictionaries might help improve the compression ratio and speeds. Thought I'd mention that, maybe it's useful.
I am (probably) that person? Anyways, I was doing tests with adding zstd compression to the vanilla anvil format. I got a ratio of ~97% compared to zlib when using zstd -4 on its own. With dictionary compression the ratio drops to ~89%. It's not a huge gain due to the padding problem, but it helps a bit. To clarify, I did not test compression speed, but as I understand it, at these low compression levels it should be faster than zlib even without a dictionary.
That work is featured here and uses the namespace z:4d (meaning if you happened to want to standardize on my dictionary, the magic number to put before the compressed chunk data when using this format is 0x00 0x04 0x7a 0x3a 0x34 0x64, as per Mojang's suggested format for alternative compression formats under the 127 algorithm id). I intend to release a tool for conversion of region files between different compression formats, so if Paper comes up with its own set of compression standards I will adopt them.
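The six bytes quoted above are consistent with a length-prefixed string: a 2-byte big-endian length followed by the characters of the namespace. A small sketch that reproduces them is shown here. The function name is hypothetical, and restricting the input to ASCII is a simplification made for this example (for plain ASCII the result matches what Java's `DataOutputStream.writeUTF` would write).

```python
import struct


def encode_namespace(namespace: str) -> bytes:
    """Encode a namespace as a 2-byte big-endian length followed by its ASCII bytes."""
    raw = namespace.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    if len(raw) > 0xFFFF:
        raise ValueError("namespace too long for a 2-byte length prefix")
    return struct.pack(">H", len(raw)) + raw


if __name__ == "__main__":
    encoded = encode_namespace("z:4d")
    print(" ".join(f"0x{b:02x}" for b in encoded))
```

Where exactly these bytes sit relative to the algorithm id and the compressed data is defined by Mojang's format, not by this sketch.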
My ratio results are not particularly relevant to SectorFile, which will use 512 byte sectors and as such has potential for a lot more gains. The dictionary is not specific to entities, poi, or chunk data, but is instead built from a representative distribution of all three.
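For readers unfamiliar with dictionary compression, the idea is that compressor and decompressor share a set of byte strings that commonly occur in the data, so small payloads can reference them instead of spelling them out. The sketch below demonstrates this with zlib's preset-dictionary support from the Python standard library. This is a stand-in and not zstd, and the dictionary and payload are invented for illustration, so the sizes it prints say nothing about the ratios reported above.

```python
import zlib

# Hypothetical shared dictionary: strings expected to recur across many small payloads.
SHARED_DICT = (
    b'{"Name":"minecraft:stone"}'
    b'{"Name":"minecraft:dirt"}'
    b'{"Properties":{"axis":"y"}}'
)


def compress(payload: bytes, zdict: bytes = b"") -> bytes:
    """Deflate the payload, optionally priming the compressor with a preset dictionary."""
    comp = zlib.compressobj(level=6, zdict=zdict) if zdict else zlib.compressobj(level=6)
    return comp.compress(payload) + comp.flush()


def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Inflate the blob; the same dictionary used for compression must be supplied."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    payload = b'{"Name":"minecraft:dirt","Properties":{"axis":"y"}}'
    plain = compress(payload)
    primed = compress(payload, SHARED_DICT)
    print(f"without dictionary: {len(plain)} bytes")
    print(f"with dictionary:    {len(primed)} bytes")
    assert decompress(primed, SHARED_DICT) == payload
```

The trade-off is that every reader needs the exact dictionary that was used for writing, which is why a standardized namespace for it matters.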
Any plans to update this and Folia's sectorfile branch to 1.20.6?
I've updated the branch to 1.20.6
I've updated SectorTool to v1.1:
Add copy option for conversion to SectorFile
The copy option does not perform decompression/recompression,
which should make it faster for systems limited by CPU resources.
Additionally, read the entire RegionFile into memory when converting
to SectorFile to avoid random read costs when converting.
I will be adding in code to convert RegionFiles on the fly as well.
I've added conversion logic to copy data from existing RegionFiles when a SectorFile does not exist.
When doing --forceUpgrade it crashes the server when having Unconverted RegionFiles.
forceUpgrade is not actually supported on this branch, as the code has not been modified to use SectorFile
Will this Format become Standard in Paper? Also would be good if the RegionFiles get removed when converted
I don't think so. If the conversion failed and the region files had already been deleted, it'd be pretty bad. Just delete them manually if you need to.
Any Plans to update this to Latest 1.20.6?
Scroll up.
I've updated SectorTool to 1.3 to fix an issue where SectorFiles are far larger than they should be when converting RegionFiles using the raw flag or when the RegionFiles were stored with the uncompressed format. This affects the auto conversion previously in this branch as well, as it converted using the raw flag.
If the original RegionFiles were stored in a compressed format, the large size can be fixed by converting from SectorFile to RegionFile and then back again. If the RegionFiles were in an uncompressed format, then you need to convert to RegionFile, then use the recreateRegionFiles startup option with forceUpgrade and then convert back to SectorFile.
SectorTool 1.3 also includes a minor disk space improvement by not storing type headers on disk if they're unused (this mostly will cause the type headers for poi and entity to be unallocated until data is written for them), which has been copied over to this branch.
Note that the branches are still on 1.20.6 as of writing.
EDIT: If it wasn't clear, I've copied the changes from SectorTool 1.3 to this branch to fix the conversion creating larger SectorFiles. There is no fix for already converted SectorFiles.