Fix #10323: Inefficient Ender Dragon respawn sequence
This fixes issue https://github.com/PaperMC/Paper/issues/10323. Currently, the respawn sequence of the Ender Dragon has a noticeable lag spike, likely caused by the method findExitPortal() being poorly optimized, and calling the method find() (which is an expensive method, in terms of server execution time) multiple times, unnecessarily.
Solution: fix index incrementation and direct the search to the most likely spot first, instead of iterating over various blocks, calling find() without first checking whether it makes sense to call it for the instance of the block considered. Also, pre-fetching the relevant chunk does improve following computations.
Profiler links under stress original version: https://spark.lucko.me/4gqvxg7QBt?hl=9025 under stress with modification: https://spark.lucko.me/tyLvsstn9J?hl=4886
As you can see, the method findExitPortal() is nowhere to be seen, in the second profile report, indicating little impact on server performance. Also, the execution time of the method scanState() went from 5360ms to 60ms.
This PR doesn't make sense and assume that getChunk take a block pos instead of a chunk pos. It would have been good to link the video too: https://www.youtube.com/watch?v=tCcBzxnl5Os
Hello @Lulu13022002, could you please care to explain why the PR does not make sense? I would love to understand what about the PR seems confusing to you, in hopes I can make everything clear. Cheers, Afonso Mendonça Jacinto
Hey, @Owen1212055! Thanks a lot for your input. I will take another look at the code today, and profile it accordingly, just to have a solid basis on how to analyze the before and after.
One thing bothered me, though, but I might not be grasping the purpose behind it entirely. The first block of code gets the maxY and minY at the portal's supposed location, which is the origin, correct? That was something that was being done before, I believe it was the second block of code in the original version, before I altered it in my local fork. I found that version to be not really efficient, since callind find() multiple times, even when just doing it in a single column, proved to be less efficient than checking all blockentities in the chunk around the origin, and calling find() only if the instance of the blockentity being evaluated is really an instance of the End Podium block. Then, performance went up exponentially.
The second block would then try to prevent offsets bigger than that chunk, but smaller than the enveloping 4 chunks around the origin (I believe we don't want to go check the entire extension of the End). This is why I don't understand the reason to add back the first block, as I suppose it would yield the same effect, gameplay-wise, but hinder performance.
Is there a paramount check I am missing, here?
Thanks for your attention and for your insightful feedback.
After reading my previous message, I thought that perhaps it was badly phrased. To simplify, I profiled that version you suggested above before moving to the one in the pull request. That version is the obvious first improvement, yes. Changing the order of those 2 blocks of code.
And it does improve performance, but not to the extent where I felt satisfied. Calling find() on every block in that column at the origin was still very expensive.
That is why I chose to try something further beyond: to get every block in the chunk revolving the origin, so that we can check whether the block is an instance of the End Podium block. Then, and only then, if it is, we call find().
This reduced the time needed to "find" the portal immensely. I don't know why, but iterating over the blocks in the surrounding chunk was much faster than calling find() on every block in the central column.
I believe that your suggestion will achieve the same performance if there is some check of whether it makes sense to call find() for the block being considered in that position.
Then there are also doubts I have regarding the other block of code as it is right now (outside of my local fork) but we could discuss them after the first block is done being discussed.
Let me just make it clear that I am simply thriving to achieve a contribution of the best quality, and not just get my PR approved.
Thanks for your time, and I hope you don't interpret my interactions as any affront, but as fuel for further discussion and improvement in my PR. 🙏
The biggest thing is that for most people these portals should be at the normal position, that being zero zero so for most servers there is an immediate improvement. Otherwise being this aggressive for optimizations that are based on the fact of— “I don’t know why but it’s faster” — isn’t really a great maintenance factor.
I’d rather just have those diffs moved to get the performance improvement for people who have normal portals, where the vanilla behavior is used.
Ok, so we are on the same wavelength regarding the fact that most times the portal will be at the (0,0) location, or origin, at any given Y coordinate. I agree.
And then I need to ask:
- What is a reasonable "offset" that we should account for? If you tell me it is, for example, the chunk around the origin, then the second block of code (that one which iterates over i and i1) becomes redundant, and can be safely removed, because the first block of code already prefetches all the blockentities from the chunk surrounding the origin. In the version in my PR, that is.
This would significantly simplify the code. Please let me know what you think.
I think that sounds good, however it should be noted that we cant really change the offset based on what we think is reasonable. Our goal is to keep vanilla behavior the same here.
So, yes we first can check the 0,0 position, THEN we do that iteration logic. However we should keep the offset logic to what it was before, as we dont want to change this radius.
Sounds great to me. I am going to take a look at this PR today, and here's my idea of how to achieve the performance boost without changing too much:
-
First, we exchange the positions of the 2 blocks of code, as you have stated above (let us check the (0,0) column first);
-
Then, let us add a small check before the line that calls the find() method, in the first block of code. This "if" condition will ensure that we only call that method in a block that is an instance of the EndPortalBlockEntity ("find()" is the key reason for the performance decrease);
-
Finally, we could tweak the indexes in the second, more general check, as there are a lot of severely overlapping chunks being checked sequentially, which is slow. But, as you have stated, this is an occasion that might not even be common enough to warrant changes. Let me know if you'd prefer to keep it as is. (The radius would remain a 2x2 chunk radius, only changing the way it is "traversed". That is, instead of checking overlapping chunks repeatedly, each of the 4 chunks that make up the radius would be checked once. I hope I wasn't confusing in my phrasing.)
Once you approve this order of events, I will start modifying the code. Is this ok?
Your second condition does not make sense as there will not be an end portal block entity at the position of 0,0. That is where the pillar is.
I did not add that second condition to the overall code. That check already exists in the second block of code of the suggestion you provided, which means it is already part of the version currently being made available by PaperMC.
ChunkPos chunkPos = new ChunkPos(this.origin);
for (int i = -8 + chunkPos.x; i <= 8 + chunkPos.x; i++) {
for (int i1 = -8 + chunkPos.z; i1 <= 8 + chunkPos.z; i1++) {
LevelChunk chunk = this.level.getChunk(i, i1);
for (BlockEntity blockEntity : chunk.getBlockEntities().values()) {
if (blockEntity instanceof TheEndPortalBlockEntity) {
BlockPattern.BlockPatternMatch blockPatternMatch = this.exitPortalPattern.find(this.level, blockEntity.getBlockPos());
if (blockPatternMatch != null) {
BlockPos pos = blockPatternMatch.getBlock(3, 3, 3).getPos();
if (this.portalLocation == null) {
this.portalLocation = pos;
}
return blockPatternMatch;
}
}
}
}
}
return null;
}
I simply took advantage of it, and, instead of using it only in the section that iterates over i and i1, started using it in the section that checks the column at the origin. If not to identify an instance of the block of the Podium, then I do not know what that check is trying to achieve. Could you please explain why they check for an instance of TheEndPortalBlockEntity when they are trying to find the Podium?
Perhaps this check really does trigger upon finding a block that belongs to the podium.
Edit: I realized my mistake. My apologies. You were right. The second condition I suggested did not make sense for the search on the central column, as no EndPortal block entities (the dark tiles) will be found. My next message will explain the solution I have come to. Again, thanks for your patience.
After much profiling and careful analysis, I have obtained the following performance results:
Without background tasks running on the PC: https://spark.lucko.me/ItQ2aKC7vh?hl=527 -> Using the original version
With background tasks running on the PC, namely VS Code configured to index and build a Gradle project (Paper, in this case), so it is expected to see performance decreases (this was intentional, as to test the behaviour of the fix under stress):
https://spark.lucko.me/6rUTpkwc77?hl=1168 -> Using the original version https://spark.lucko.me/T8ItEo4IQz?hl=2035 -> Simply inverting the order of the code blocks (@Owen1212055 's suggestion) https://spark.lucko.me/MGgTApuUGQ?hl=1567 -> Original version with the index tweaks present in the current version of the PR (this version actually doesn't work, since the more general check is unable to find the portal and skipped to the more precise check, which means I will go back to using the indexing of the original version) https://spark.lucko.me/OSdWGEV0kf?hl=1560 -> Original version, but removing the check:
if (blockEntity instanceof TheEndPortalBlockEntity)
With all this data, and @Owen1212055 's great insight, I have come to believe the following version is the best compromise between all that has been considered thus far, since it isn't "aggressive" at all, remaining very much "vanilla-like", preserving the offset of 2x2 chunks (if the more precise check fails), and showing better performance under stress than the original version under no stress. Here it is:
@Nullable
public BlockPattern.BlockPatternMatch findExitPortal() {
BlockPos location = EndPodiumFeature.getLocation(this.origin);
int i1 = this.level.getHeightmapPos(Heightmap.Types.MOTION_BLOCKING, location).getY();
BlockPos current_position;
BlockState state;
for (int i2 = i1; i2 >= this.level.getMinY(); i2--) {
current_position = new BlockPos(location.getX(), i2, location.getZ());
state = level.getBlockState(current_position);
if (state.getBlock() == Blocks.BEDROCK) {
BlockPattern.BlockPatternMatch blockPatternMatch1 = this.exitPortalPattern.find(this.level, current_position);
if (blockPatternMatch1 != null) {
if (this.portalLocation == null) {
this.portalLocation = blockPatternMatch1.getBlock(3, 3, 3).getPos();
}
return blockPatternMatch1;
}
}
}
ChunkPos chunkPos = new ChunkPos(this.origin);
for (int i = -8 + chunkPos.x; i <= 8 + chunkPos.x; i++) {
for (i1 = -8 + chunkPos.z; i1 <= 8 + chunkPos.z; i1++) {
LevelChunk chunk = this.level.getChunk(i, i1);
for (BlockEntity blockEntity : chunk.getBlockEntities().values()) {
if (blockEntity instanceof TheEndPortalBlockEntity) {
BlockPattern.BlockPatternMatch blockPatternMatch = this.exitPortalPattern.find(this.level, blockEntity.getBlockPos());
if (blockPatternMatch != null) {
if (this.portalLocation == null) {
this.portalLocation = blockPatternMatch.getBlock(3, 3, 3).getPos();
}
return blockPatternMatch;
}
}
}
}
}
return null;
}
Result after profiling: https://spark.lucko.me/62CAl5HGvt?hl=361
Does this seem alright?
Yeah let’s see it in the PR if you don’t mind so other people could review.
@Owen1212055 Just wanted to let you know that it's done. I apologise if the notifications are too frequent — please don’t hesitate to let me know if I should ease up on them. Whenever you (or any reviewer) have the time to review, I’d appreciate any feedback on improvements. Thank you again for your time and guidance. Best regards!
We are very close! Now the last group of changes I will be doing here is going to be for "diff reduction". Basically when we make patches we want to make sure the diff is as small as possible. Here for example is what I have rewritten. Basically, we extract the bottom part of the method into a separate method and simply call it above. This allows us to keep those same changes, but avoids the large diff noise that it caused.
Feel free to copy and paste this in, squash and rebuild and see the new patch diff. (Notice it also gets rid of that name collision from moving the loop up!)
Does that all make sense?
@Nullable
public BlockPattern.BlockPatternMatch findExitPortal() {
ChunkPos chunkPos = new ChunkPos(this.origin);
// Paper start
BlockPattern.BlockPatternMatch originSearch = findExitPortalExitAroundOrigin();
if (originSearch != null) {
return originSearch;
}
// Paper end
for (int i = -8 + chunkPos.x; i <= 8 + chunkPos.x; i++) {
for (int i1 = -8 + chunkPos.z; i1 <= 8 + chunkPos.z; i1++) {
LevelChunk chunk = this.level.getChunk(i, i1);
for (BlockEntity blockEntity : chunk.getBlockEntities().values()) {
if (blockEntity instanceof TheEndPortalBlockEntity) {
BlockPattern.BlockPatternMatch blockPatternMatch = this.exitPortalPattern.find(this.level, blockEntity.getBlockPos());
if (blockPatternMatch != null) {
// BlockPos pos = blockPatternMatch.getBlock(3, 3, 3).getPos(); // Paper - move down
if (this.portalLocation == null) {
this.portalLocation = blockPatternMatch.getBlock(3, 3, 3).getPos(); // Paper
}
return blockPatternMatch;
}
}
}
}
}
// Paper start - exit portal optimization look around the origin first
return null;
}
public BlockPattern.@org.checkerframework.checker.nullness.qual.Nullable BlockPatternMatch findExitPortalExitAroundOrigin() {
BlockPos targetPos;
// Paper end - exit portal optimization look around the origin first
BlockPos location = EndPodiumFeature.getLocation(this.origin);
int i1 = this.level.getHeightmapPos(Heightmap.Types.MOTION_BLOCKING, location).getY();
for (int i2 = i1; i2 >= this.level.getMinY(); i2--) {
// Paper start
targetPos = new BlockPos(location.getX(), i2, location.getZ());
// Early exit if we encroached a non bedrock block because the middle column MUST be bedrock
if (!this.level.getBlockState(targetPos).is(net.minecraft.world.level.block.Blocks.BEDROCK)) {
continue;
}
BlockPattern.BlockPatternMatch blockPatternMatch1 = this.exitPortalPattern.find(this.level, targetPos);
// Paper end
if (blockPatternMatch1 != null) {
if (this.portalLocation == null) {
this.portalLocation = blockPatternMatch1.getBlock(3, 3, 3).getPos();
}
return blockPatternMatch1;
}
}
return null;
}
We are very close! Now the last group of changes I will be doing here is going to be for "diff reduction". Basically when we make patches we want to make sure the diff is as small as possible. Here for example is what I have rewritten. Basically, we extract the bottom part of the method into a separate method and simply call it above. This allows us to keep those same changes, but avoids the large diff noise that it caused.
Feel free to copy and paste this in, squash and rebuild and see the new patch diff. (Notice it also gets rid of that name collision from moving the loop up!)
Does that all make sense?
@Nullable public BlockPattern.BlockPatternMatch findExitPortal() { ChunkPos chunkPos = new ChunkPos(this.origin); // Paper start BlockPattern.BlockPatternMatch originSearch = findExitPortalExitAroundOrigin(); if (originSearch != null) { return originSearch; } // Paper end for (int i = -8 + chunkPos.x; i <= 8 + chunkPos.x; i++) { for (int i1 = -8 + chunkPos.z; i1 <= 8 + chunkPos.z; i1++) { LevelChunk chunk = this.level.getChunk(i, i1); for (BlockEntity blockEntity : chunk.getBlockEntities().values()) { if (blockEntity instanceof TheEndPortalBlockEntity) { BlockPattern.BlockPatternMatch blockPatternMatch = this.exitPortalPattern.find(this.level, blockEntity.getBlockPos()); if (blockPatternMatch != null) { // BlockPos pos = blockPatternMatch.getBlock(3, 3, 3).getPos(); // Paper - move down if (this.portalLocation == null) { this.portalLocation = blockPatternMatch.getBlock(3, 3, 3).getPos(); // Paper } return blockPatternMatch; } } } } } // Paper start - exit portal optimization look around the origin first return null; } public BlockPattern.@org.checkerframework.checker.nullness.qual.Nullable BlockPatternMatch findExitPortalExitAroundOrigin() { BlockPos targetPos; // Paper end - exit portal optimization look around the origin first BlockPos location = EndPodiumFeature.getLocation(this.origin); int i1 = this.level.getHeightmapPos(Heightmap.Types.MOTION_BLOCKING, location).getY(); for (int i2 = i1; i2 >= this.level.getMinY(); i2--) { // Paper start targetPos = new BlockPos(location.getX(), i2, location.getZ()); // Early exit if we encroached a non bedrock block because the middle column MUST be bedrock if (!this.level.getBlockState(targetPos).is(net.minecraft.world.level.block.Blocks.BEDROCK)) { continue; } BlockPattern.BlockPatternMatch blockPatternMatch1 = this.exitPortalPattern.find(this.level, targetPos); // Paper end if (blockPatternMatch1 != null) { if (this.portalLocation == null) { this.portalLocation = blockPatternMatch1.getBlock(3, 3, 3).getPos(); } return blockPatternMatch1; } } return null; }
It makes a lot of sense to me! I will copy and paste and squash comitts, then force-push again.
@Owen1212055 Done. All seems alright. Awaiting further actions.
There seems to be alot of extra line changes, try to make sure ur copying my original comment.
There seems to be alot of extra line changes, try to make sure ur copying my original comment.
I did not alter anything. I copied it as it is.
Notice the extra spaces, if we could get those removed would be awesome.
This is weird. When I replied to your original comment, it suddenly changed formatting. My apologies. I will fix it right now.
No need to apologize! :)
Hey, @Owen1212055. Again, sorry for pinging you. Just wanted to know whether there is anything else I should do at the moment, or should I wait, since the checks passed and the extra spaces should be gone, now.
Ur all good! It's all good on my end, just wait for some other people to chime in. 🙂
Ok! Thanks for your guidance! Much appreciated. And good luck with all the rest👍🏼
Good afternoon. Because it's been around 3 weeks since any developments occurred, I'm just popping up to ask whether anything's wrong with this PR. Since the pipeline didn't run again, I admit I became slightly concerned. Can I do something differently?
Stay safe.
My final comment for this PR is that this should be made configurable, otherwise, LGTM. As this will most likely break niche setups