FastAsyncWorldEdit icon indicating copy to clipboard operation
FastAsyncWorldEdit copied to clipboard

Introduce basic support for Vector API

Open SirYwell opened this issue 1 year ago • 3 comments

Overview

Description

This is an experimental, internal feature to use the Vector API when enabled. The current implementation is designed to cover basic use cases (i.e. //gmask with single blocks, //set with single blocks, //replace with single blocks). This could be expanded, but as that needs to be done rather careful to not make performance worse, I left it to the very basics for now.

A quick example of running //replace dirt diamond_block on a selection of 2048x320x2048:

Before: before

after

Note that the hasSection calls take up the same time in both profiles, same for initLayer. The difference is in the filter call, which results in many additional calls in the none-specialized variant vs pretty much only memory access and vector instructions in the specialized variant. (There might be ways to optimize without the Vector API here I guess, but the existing abstraction doesn't work well in either cases)

A few more notes on the benchmark:

  • I limited FAWE to use 4 threads only, because that's what many people actually have (and FAWE is just comically fast otherwise)
  • It might make sense to set the number of threads to the half of the actual hardware threads, as some platforms are known to not perform well with vector instructions otherwise - I don't know how much that might skew the result
  • I'm running a machine with AVX2 (256 bit vectors) only, machines with AVX3/AVX512 might perform even better
  • The difference is not really relevant on small edits
### Submitter Checklist
- [x] Make sure you are opening from a topic branch (**/feature/fix/docs/ branch** (right side)) and not your main branch.
- [x] Ensure that the pull request title represents the desired changelog entry.
- [x] New public fields and methods are annotated with `@since TODO`.
- [x] I read and followed the [contribution guidelines](https://github.com/IntellectualSites/.github/blob/main/CONTRIBUTING.md).

SirYwell avatar Aug 21 '24 14:08 SirYwell

Short rather than int?

dordsor21 avatar Aug 21 '24 16:08 dordsor21

In its current state, it would be difficult to use IntVector, as we can only use ShortVector to load from char[]. It might make sense to design with other element types in mind. I think #2181 would probably make things easier there too. Alternatively using the Foreign Memory API would be less restrictive, but that's only a final feature since Java 22.

SirYwell avatar Aug 21 '24 16:08 SirYwell

(Also vector operations are probably one of the few places where element size actually has an impact on performance, so keeping it as small as possible makes sense)

SirYwell avatar Aug 21 '24 16:08 SirYwell

In its current state, it would be difficult to use IntVector, as we can only use ShortVector to load from char[]. It might make sense to design with other element types in mind. I think #2181 would probably make things easier there too. Alternatively using the Foreign Memory API would be less restrictive, but that's only a final feature since Java 22.

Yeah I suppose a lot of this would be best duplicated as the data array PR does so we can maintain the smaller size "optimisation" where possible

Also looks like some changes are needed to the CI/PR run config

dordsor21 avatar Sep 01 '24 10:09 dordsor21

Yeah I suppose a lot of this would be best duplicated as the data array PR does so we can maintain the smaller size "optimisation" where possible

Yes, but I intend to leave it like this for now, as it is experimental anyway.

Also looks like some changes are needed to the CI/PR run config

That should be fixed now, the javadoc task was failing as it requires the --add-modules flag too.

SirYwell avatar Sep 01 '24 14:09 SirYwell

Please take a moment and address the merge conflicts of your pull request. Thanks!

github-actions[bot] avatar Sep 14 '24 08:09 github-actions[bot]