
SectorFile Test

Open Spottedleaf opened this issue 1 year ago • 13 comments

The goal of this PR is to gather some data on the current deficiencies of RegionFile and improve them in SectorFile. Currently, the main improvements are:

  1. Header corruption detection + automatic recalculation, as the header is now validated using XXHash64
  2. Data validation using XXHash64
  3. Header recalculation logic for all data types (not just block)
  4. ZSTD support
  5. Reduced sector size, which should reduce overall world size by a factor of roughly 1.3 to 1.6
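Items 1 and 2 rely on storing a 64-bit hash next to the data and refusing to trust anything whose hash does not match. The following is a minimal sketch of that idea, assuming a hypothetical header layout (a run of little-endian u32 entries followed by their XXHash64). The real SectorFile layout is defined in SPECIFICATION.md and is not reproduced here. `write_header`, `read_header`, and the `recalculate` callback are illustrative names. XXHash64 is implemented from its published reference algorithm so that the example needs no third-party package.

```python
import struct
from typing import Callable, List

MASK = (1 << 64) - 1
P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5


def _rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & MASK


def _round(acc: int, lane: int) -> int:
    acc = (acc + lane * P2) & MASK
    return (_rotl(acc, 31) * P1) & MASK


def _merge(acc: int, lane: int) -> int:
    acc ^= _round(0, lane)
    return (acc * P1 + P4) & MASK


def xxh64(data: bytes, seed: int = 0) -> int:
    """XXHash64 following the reference algorithm (little-endian lanes)."""
    n, i = len(data), 0
    if n >= 32:
        v = [(seed + P1 + P2) & MASK, (seed + P2) & MASK, seed & MASK, (seed - P1) & MASK]
        while i + 32 <= n:  # consume 32-byte stripes as four 8-byte lanes
            for j in range(4):
                v[j] = _round(v[j], struct.unpack_from("<Q", data, i + 8 * j)[0])
            i += 32
        h = (_rotl(v[0], 1) + _rotl(v[1], 7) + _rotl(v[2], 12) + _rotl(v[3], 18)) & MASK
        for lane in v:
            h = _merge(h, lane)
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    while i + 8 <= n:  # remaining 8-byte words
        h ^= _round(0, struct.unpack_from("<Q", data, i)[0])
        h = (_rotl(h, 27) * P1 + P4) & MASK
        i += 8
    if i + 4 <= n:  # at most one remaining 4-byte word
        h ^= (struct.unpack_from("<I", data, i)[0] * P1) & MASK
        h = (_rotl(h, 23) * P2 + P3) & MASK
        i += 4
    while i < n:  # trailing single bytes
        h ^= (data[i] * P5) & MASK
        h = (_rotl(h, 11) * P1) & MASK
        i += 1
    h ^= h >> 33  # final avalanche
    h = (h * P2) & MASK
    h ^= h >> 29
    h = (h * P3) & MASK
    return h ^ (h >> 32)


def write_header(entries: List[int]) -> bytes:
    """Hypothetical header: u32 entries followed by the XXHash64 of those bytes."""
    body = struct.pack(f"<{len(entries)}I", *entries)
    return body + struct.pack("<Q", xxh64(body))


def read_header(raw: bytes, recalculate: Callable[[], object]) -> object:
    """Trust the header only if its checksum matches; otherwise rebuild it."""
    body, (stored,) = raw[:-8], struct.unpack("<Q", raw[-8:])
    if xxh64(body) == stored:
        return list(struct.unpack(f"<{len(body) // 4}I", body))
    return recalculate()
```

On a mismatch, the reader hands control to a rebuild step instead of using the damaged entries. This corresponds to the "detection + automatic recalculation" idea in item 1. Item 2 applies the same check to the data payloads themselves.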

The compression configuration option (compression-format) in Paper can be set to ZSTD as well as LZ4.

Please see https://github.com/PaperMC/SectorTool for details on the new format and on how to use the tool to convert a world or revert the conversion. The tool can be downloaded from the Releases tab.

Please see SPECIFICATION.md for details on the new format specification.

This patch includes conversion logic to convert RegionFiles to SectorFile on demand. If a SectorFile does not exist, it will attempt to copy chunk/entity/poi data from existing RegionFiles.
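In effect, the on-demand conversion means "open the SectorFile, or build it from the RegionFiles the first time it is missing." The following is a minimal in-memory sketch of that control flow. `OnDemandConverter` and the `read_region` callback are hypothetical stand-ins. Real file I/O, locking, and error handling are omitted.

```python
from typing import Callable, Dict, Tuple

DATA_TYPES = ("chunk", "entity", "poi")
Section = Tuple[int, int]                    # SectorFile coordinates
Local = Tuple[int, int]                      # chunk position inside the section
SectorFile = Dict[str, Dict[Local, bytes]]   # data type -> payloads


class OnDemandConverter:
    """Hypothetical: open a SectorFile, converting from RegionFiles on first use."""

    def __init__(self, read_region: Callable[[str, Section], Dict[Local, bytes]]) -> None:
        self._read_region = read_region
        self._sector_files: Dict[Section, SectorFile] = {}
        self.conversions = 0

    def open(self, section: Section) -> SectorFile:
        existing = self._sector_files.get(section)
        if existing is not None:
            return existing
        # No SectorFile yet: copy chunk, entity, and poi data from the RegionFiles.
        # A missing RegionFile simply yields an empty mapping for that type.
        converted = {t: dict(self._read_region(t, section)) for t in DATA_TYPES}
        self._sector_files[section] = converted
        self.conversions += 1
        return converted
```

Later opens of the same section return the already converted file, so each RegionFile is read at most once.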

Included in this test patch is logic to write SectorFile operation traces to the file sectorfile.tracer in the root directory of a world. The file is uncompressed and append-only, so please compress it before sending it back for analysis.
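The PR does not prescribe a particular compressor, so any general-purpose one will work. The following is a small sketch that uses gzip from the Python standard library. It assumes the trace sits at `<world>/sectorfile.tracer`, as described above. `compress_tracer` is an illustrative helper name.

```python
import gzip
import shutil
from pathlib import Path


def compress_tracer(world_dir: Path) -> Path:
    """Write <world_dir>/sectorfile.tracer.gz next to the uncompressed trace."""
    source = world_dir / "sectorfile.tracer"
    target = world_dir / "sectorfile.tracer.gz"
    # Stream the file so that a large trace is never loaded fully into memory.
    with source.open("rb") as fin, gzip.open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return target
```

Because the tracer keeps appending, compressing it while the server is stopped avoids capturing a half-written record at the end.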

This tracing will be useful for later tests to perform parameter scanning on some of the parameters of SectorFile:

  1. The section shift
  2. The sector size
  3. SectorFile cache size
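Here is a concrete illustration of the first parameter. Assume that the section shift plays the same role as the fixed shift of 5 in vanilla RegionFile, where each file holds 32×32 chunks. Under that assumption, a chunk maps to a file and to a slot within that file as shown in the following sketch. `locate` and `chunks_per_file` are hypothetical helpers, not SectorFile code.

```python
from typing import Tuple


def locate(chunk_x: int, chunk_z: int, section_shift: int = 5) -> Tuple[Tuple[int, int], int]:
    """Map chunk coordinates to (file coordinates, slot index inside the file)."""
    mask = (1 << section_shift) - 1
    # Arithmetic shift floors toward negative infinity, so chunk -1 lands in file -1.
    file_coords = (chunk_x >> section_shift, chunk_z >> section_shift)
    # Row-major slot index, like vanilla's (x & 31) + (z & 31) * 32 when the shift is 5.
    slot = (chunk_x & mask) | ((chunk_z & mask) << section_shift)
    return file_coords, slot


def chunks_per_file(section_shift: int) -> int:
    return 1 << (2 * section_shift)
```

A smaller shift produces more files, each holding fewer chunks. A larger shift produces fewer, bigger files. This is presumably one of the trade-offs the parameter scan is meant to measure.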

Folia branch: https://github.com/PaperMC/Folia/tree/experiment/sectorfile


Download the paperclip jar for this pull request: paper-10231.zip

Spottedleaf • Feb 08 '24 17:02

Someone had mentioned that ZSTD dictionaries might help improve the compression ratio and speeds. Thought I'd mention that, maybe it's useful.

RealTriassic • Feb 11 '24 00:02

> Someone had mentioned that ZSTD dictionaries might help improve the compression ratio and speeds. Thought I'd mention that, maybe it's useful.

I am (probably) that person? Anyway, I was testing zstd compression with the vanilla Anvil format. Using zstd at level 4 on its own, the compressed output was ~97% the size of zlib's; with dictionary compression, it dropped to ~89%. It's not a huge gain because of the sector padding problem, but it helps a bit. To clarify, I did not test compression speed, but as I understand it, at these low compression levels zstd should be faster than zlib even without a dictionary.
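The padding problem arises because vanilla RegionFile stores each chunk in whole 4096-byte sectors. Every compressed payload is therefore rounded up to the next multiple of 4 KiB, and a few percent saved by a better compressor often disappears into that rounding. SectorFile's 512-byte sectors make the rounding much finer. The following sketch computes that cost. The payload sizes used with it are hypothetical, and per-chunk header bytes are ignored.

```python
from typing import Iterable, Tuple


def padded_size(payload_len: int, sector_size: int) -> int:
    """Bytes actually occupied when a payload is stored in whole sectors."""
    sectors = -(-payload_len // sector_size)  # ceiling division
    return sectors * sector_size


def storage_cost(payload_sizes: Iterable[int], sector_size: int) -> Tuple[int, int]:
    """Return (bytes occupied, bytes wasted on padding) for a set of payloads."""
    sizes = list(payload_sizes)
    occupied = sum(padded_size(n, sector_size) for n in sizes)
    return occupied, occupied - sum(sizes)
```

For three hypothetical payloads of 5000, 1200, and 300 bytes, 4096-byte sectors give `(16384, 9884)`: 9884 bytes of padding for 6500 bytes of data. With 512-byte sectors, the result is `(7168, 668)`.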

That work is featured here and uses the namespace z:4d. If you happen to want to standardize on my dictionary, the magic number to put before the compressed chunk data in this format is 0x00 0x04 0x7a 0x3a 0x34 0x64, following Mojang's suggested format for alternative compression formats under algorithm id 127. I intend to release a tool for converting region files between compression formats, so if Paper comes up with its own set of compression standards, I will adopt them.
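The six bytes quoted above consist of a 2-byte big-endian length (0x0004) followed by the ASCII bytes of `z:4d`. The following is a sketch of building and splitting that prefix. It assumes the id is encoded as UTF-8, which is byte-identical to Java's modified UTF-8 for plain ASCII ids. The function names are hypothetical. The framing of the surrounding chunk entry is not modeled.

```python
import struct
from typing import Tuple


def custom_compression_prefix(algorithm_id: str) -> bytes:
    """2-byte big-endian length followed by the id's bytes."""
    encoded = algorithm_id.encode("utf-8")
    if len(encoded) > 0xFFFF:
        raise ValueError("algorithm id too long for a 16-bit length")
    return struct.pack(">H", len(encoded)) + encoded


def split_custom_payload(blob: bytes) -> Tuple[str, bytes]:
    """Inverse of the above: return (algorithm id, remaining compressed bytes)."""
    (length,) = struct.unpack_from(">H", blob, 0)
    algorithm_id = blob[2 : 2 + length].decode("utf-8")
    return algorithm_id, blob[2 + length :]
```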

My ratio results are not particularly relevant to SectorFile, which uses 512-byte sectors and as such has potential for much larger gains. The dictionary is not specific to entity, POI, or chunk data; rather, it is built from a representative distribution of all three.
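Before Python 3.14, zstd is not part of the Python standard library. This sketch therefore swaps in zlib's preset-dictionary support (`zdict`), the DEFLATE analogue of a zstd dictionary, to show the mechanism. The same dictionary must be supplied for both compression and decompression. Small payloads that share content with the dictionary compress much better. The dictionary here is a hand-written toy; real zstd dictionaries are trained from sample data, for example with `zstd --train`. The sizes this code produces say nothing about the ratios reported above.

```python
import zlib
from typing import Optional


def compress(payload: bytes, dictionary: Optional[bytes] = None, level: int = 6) -> bytes:
    if dictionary is None:
        comp = zlib.compressobj(level)
    else:
        # The dictionary primes the compressor's window, so early matches can
        # refer back into it instead of being emitted as literals.
        comp = zlib.compressobj(level, zdict=dictionary)
    return comp.compress(payload) + comp.flush()


def decompress(blob: bytes, dictionary: Optional[bytes] = None) -> bytes:
    # The exact same dictionary must be supplied to decompress.
    if dictionary is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()
```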

lordofpipes • Feb 17 '24 05:02

Any plans to update this and Folia's sectorfile branch to 1.20.6?

RealTriassic • May 14 '24 18:05

I've updated the branch to 1.20.6

Spottedleaf • May 18 '24 01:05

I've updated SectorTool to v1.1:

Add copy option for conversion to SectorFile

The copy option does not perform decompression/recompression,
which should make it faster for systems limited by CPU resources.

Additionally, read the entire RegionFile into memory when converting
to SectorFile to avoid random read costs when converting.
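The copy option can skip recompression because a vanilla RegionFile already stores each chunk as a self-contained compressed payload. The following sketch shows the read side. It assumes the standard Anvil layout: a 4096-byte table of 1024 big-endian location entries (each a 3-byte sector offset plus a 1-byte sector count), and chunk entries that begin with a 4-byte length and a 1-byte compression type. The whole file is read in one call, in the spirit of "read the entire RegionFile into memory." `read_raw_chunks` and `load_region` are hypothetical names.

```python
import struct
from pathlib import Path
from typing import Dict, Tuple

SECTOR_BYTES = 4096  # vanilla RegionFile sector size
ENTRIES = 1024       # 32 x 32 chunks per region


def read_raw_chunks(region: bytes) -> Dict[int, Tuple[int, bytes]]:
    """Map slot index -> (compression type byte, payload still compressed)."""
    chunks: Dict[int, Tuple[int, bytes]] = {}
    for slot in range(ENTRIES):
        (location,) = struct.unpack_from(">I", region, slot * 4)
        offset, sector_count = location >> 8, location & 0xFF
        if offset == 0 or sector_count == 0:
            continue  # chunk not present
        start = offset * SECTOR_BYTES
        # The 4-byte length counts the compression-type byte plus the payload.
        (length,) = struct.unpack_from(">I", region, start)
        compression_type = region[start + 4]
        # Types with the 0x80 bit set point at an external .mcc file; this
        # sketch does not follow them and just returns the inline bytes.
        chunks[slot] = (compression_type, region[start + 5 : start + 4 + length])
    return chunks


def load_region(path: Path) -> Dict[int, Tuple[int, bytes]]:
    # One sequential read of the whole file instead of a seek per chunk.
    return read_raw_chunks(path.read_bytes())
```

The returned payloads can be written into the new format as-is, which is what saves the CPU time of a decompress/recompress round trip.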

I will be adding in code to convert RegionFiles on the fly as well.

Spottedleaf • May 20 '24 02:05

I've added conversion logic to copy data from existing RegionFiles when a SectorFile does not exist.

Spottedleaf • May 20 '24 15:05

Running with --forceUpgrade crashes the server when there are unconverted RegionFiles.

GesundesWasser • May 23 '24 09:05

> Running with --forceUpgrade crashes the server when there are unconverted RegionFiles.

forceUpgrade is not actually supported on this branch, as that code has not been modified to use SectorFile.

Spottedleaf • May 23 '24 15:05

Will this format become the standard in Paper? It would also be good if the RegionFiles were removed once converted.

ghost • May 23 '24 15:05

> It would also be good if the RegionFiles were removed once converted.

I don't think so; if the conversion failed after the region files had been deleted, that would be pretty bad. Just delete them manually if you need to.

RealTriassic • May 23 '24 15:05

Any plans to update this to the latest 1.20.6?

Steveice10 • May 31 '24 09:05

> Any plans to update this to the latest 1.20.6?

Scroll up.

Taiyou06 • Jun 01 '24 22:06

I've updated SectorTool to 1.3 to fix an issue where SectorFiles were far larger than they should be when converting RegionFiles with the raw flag, or when the RegionFiles were stored in the uncompressed format. This also affected the auto conversion previously in this branch, since it converted using the raw flag.

If the original RegionFiles were stored in a compressed format, the large size can be fixed by converting from SectorFile to RegionFile and then back again. If they were stored uncompressed, you need to convert to RegionFile, run the server with the recreateRegionFiles startup option together with forceUpgrade, and then convert back to SectorFile.

SectorTool 1.3 also includes a minor disk space improvement, which has been copied over to this branch: type headers are not stored on disk while they are unused (in practice, the type headers for POI and entity data stay unallocated until data is written for them).
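The following toy model shows what lazily allocated type headers save. It assumes each data type (chunk, entity, poi) has its own fixed-size header. `LazyTypeHeaders` and the header size passed to it are hypothetical and are not taken from the SectorFile specification.

```python
from typing import Dict, Optional


class LazyTypeHeaders:
    """Toy model: a per-type header exists only once that type has data."""

    def __init__(self, header_bytes: int) -> None:
        self.header_bytes = header_bytes  # hypothetical fixed size per header
        self._headers: Dict[str, Dict[int, int]] = {}  # type -> slot -> location

    def write(self, data_type: str, slot: int, location: int) -> None:
        # The first write of a type allocates its header; later writes reuse it.
        self._headers.setdefault(data_type, {})[slot] = location

    def read(self, data_type: str, slot: int) -> Optional[int]:
        header = self._headers.get(data_type)
        # An unallocated header simply means "no data of this type".
        return None if header is None else header.get(slot)

    def header_bytes_on_disk(self) -> int:
        return self.header_bytes * len(self._headers)
```

A section that only ever receives chunk data pays for a single header, and reads of POI or entity data in that section return nothing without touching disk.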

Note that the branches are still on 1.20.6 as of writing.

EDIT: If it wasn't clear, I've copied the changes from SectorTool 1.3 to this branch to fix the conversion creating larger SectorFiles. There is no fix for already converted SectorFiles.

Spottedleaf • Jun 22 '24 22:06