game-compatibility icon indicating copy to clipboard operation
game-compatibility copied to clipboard

4B4F07FE - Fist of the North Star: Ken's Rage

Open cesys opened this issue 7 years ago • 55 comments

Marketplace

Tested on (https://github.com/benvanik/xenia/commit/92a4b90ed0fceeb9240a1436d3ffe1d9164c358a)

Issues:

Intro and Menus are ok. On Vulkan, the game crashes as soon as the 3D kicks in because of some missing shader translations. On OpenGL, the game goes into the 3D playable part, but there are a lot of missing shader translations. Same graphic issue as Dragon Ball Z Ultimate Tenkaichi and Dragon Ball: Raging Blast 2. Reds and Blues are swapped when using Vulkan during the OPENING.wmv video. Due to missing connection with the blitter that swap the Red and Blue channels through the shader.

Log:

OpenGL Vulkan

Screenshot(s):

OpenGL Video taken from YouTube. Didn't post it personally. Vulkan vulkan

Labels:

state-menus gpu-corrupt-drawing (on Vulkan) gpu-shader-errors

cesys avatar Mar 21 '17 09:03 cesys

Debugging Vulkan case (OpenGL is negligible since the support will be probably dropped). Instrumented command_processor.cc#bool CommandProcessor::ExecutePacketType0(RingBuffer* reader, uint32_t packet) with these logs:

!> 00000004 XXX: packet -> FFFF -> 42100
!> 00000004 XXX: count -> FFFF -> 5
!> 00000004 XXX: reader->read_count() -> FFFF -> 83D4
!> 00000004 XXX: count * sizeof(uint32_t) -> FFFF -> 14
!> 00000004 XXX: base_index -> FFFF -> 2100
!> 00000004 XXX: write_one_reg -> FFFF -> 0
!> 00000004 XXX: reg_data -> FFFF -> FFFFFF
!> 00000004 XXX: target_index -> FFFF -> 2100
!> 00000004 XXX: m -> FFFF -> 0
!> 00000004 XXX: reg_data -> FFFF -> 0
!> 00000004 XXX: target_index -> FFFF -> 2101
!> 00000004 XXX: m -> FFFF -> 1
!> 00000004 XXX: reg_data -> FFFF -> 0
!> 00000004 XXX: target_index -> FFFF -> 2102
!> 00000004 XXX: m -> FFFF -> 2
!> 00000004 XXX: reg_data -> FFFF -> FFFFFF
!> 00000004 XXX: target_index -> FFFF -> 2103
!> 00000004 XXX: m -> FFFFFF -> 3
!> 00000004 XXX: reg_data -> FFFFFF -> F
!> 00000004 XXX: target_index -> FFFFFF -> 2104
!> 00000004 XXX: m -> FFFFFF -> 4

Seems like we're overwriting the register at 0x2103 (XE_GPU_REG_VGT_MULTI_PRIM_IB_RESET_INDX) with 0xFFFFFF. This is triggering the assertion in pipeline_cache.cc: assert_true(regs.multi_prim_ib_reset_index == 0xFFFF || regs.multi_prim_ib_reset_index == 0xFFFFFFFF);

Digging deeper. If somebody has a clue on why this is happening, please let me know.

cesys avatar Mar 22 '17 07:03 cesys

Fixed issue for Vulkan. We should allow 0xFFFFFF as it's a valid index. Vulkan doesn't allow directly to specify the index, it just allows to enable primitive reset determining automatically the index depending on the type of VkIndexType. 0xFFFFFF is allowed by OpenGL as index, that's were the inconsistency happens. After bypassing the assert, I'm hitting the same bug I hit for OpenGL (in OpenGL I don't hit the assert because glPrimitiveRestartIndex allows 0xFFFFFF). The next bug seems to be and issue with type 0 packets. For some reasons, the ring buffer get corrupted and instead of getting a type 3 packet I get a weird type 0 packet which doesn't parse. Investigating why this is happening.

cesys avatar Mar 24 '17 03:03 cesys

We have that assert installed in pipeline_cache.cc because we haven't yet setup primitive reset in the Vulkan graphics system.

As for CP corruption - that sounds interesting, please do keep us posted.

DrChat avatar Mar 24 '17 20:03 DrChat

What's I'm saying is that primitive reset is actually there. Vulkan doesn't allow to specify the index because it doesn't need it. It just allows to set the enable/disable flag, which is already properly set in the current implementation of pipeline_cache.cc. Based on the value of VkIndexType Vulkan automatically sets the right value of the index which is either 0xFFFF in case of 16 bit or 0xFFFFFFFF in case of 32 bits. However, sometimes we get some PM4 packets that try to set the index to 0xFFFFFF which is allowed as an index in OpenGL (that's why OpenGL version doesn't hit this first bug (there's no assert there anyway)). My point is that we should add the condition: || regs.multi_prim_ib_reset_index == 0xFFFFFF to the assert and rely on: if (regs.pa_su_sc_mode_cntl & (1 << 21)) { state_info.primitiveRestartEnable = VK_TRUE; } else { state_info.primitiveRestartEnable = VK_FALSE; } for the primitive reset, which should be enough providing VkIndexType is set accordingly (not 100% sure about this last bit). In the meanwhile, I'm working on the CP corruption. As soon as I have a patch (or a full root cause analysis) for both issues I'm gonna share it for review.

cesys avatar Mar 25 '17 01:03 cesys

With primitive restart enabled, Vulkan treats a special index value (2^(n bits index) - 1) as a primitive restart rather than an ordinary index. We don't insert any indices that will trigger this functionality when submitting a drawcall, so primitive restart does not work at the moment.

DrChat avatar Mar 25 '17 02:03 DrChat

As far as I know that the index is VkIndexType (the one I was talking about earlier). We do the following in vulkan_command_processor.cc VkIndexType index_type = info.format == IndexFormat::kInt32 ? VK_INDEX_TYPE_UINT32 : VK_INDEX_TYPE_UINT16; Not sure that's enough though. I'm not comfortable with the code yet. Of course, I trust your judgement on this. BTW, I'm not able to post any message on xenia-dev. Do you know why?

cesys avatar Mar 25 '17 02:03 cesys

In WorkerThreadMain, at the line: uint32_t write_ptr_index = write_ptr_index_.load(); the write_ptr_index ends up being broken and that broken value is passed to ExecutePrimaryBuffer. I logged the values of write_ptr_index: ... write_index : 00000968 write_index : 0000096B write_index : 0000097D write_index : 00000016 <- corrupted value!

After that corruption I get this official log: !> 00000004 ExecutePacketType0 overflow (read count 00005A60, packet count 00007F10) and consequently: !> 00000004 **** PRIMARY RINGBUFFER: Failed to execute packet.

Problem is I don't have any idea on how to understand why write_index get corrupted. After some debugging I realized that value is populated by a a callback registered at: // Let the processor know we want register access callbacks. memory_->AddVirtualMappedRange( 0x7FC80000, 0xFFFF0000, 0x0000FFFF, this, reinterpret_castcpu::MMIOReadCallback(ReadRegisterThunk), reinterpret_castcpu::MMIOWriteCallback(WriteRegisterThunk));

but I'm missing the mechanism that triggers the callback.

cesys avatar Mar 25 '17 05:03 cesys

write_ptr_index_ represents the GPU register that holds the write pointer. The game directly writes the register, which triggers an exception that eventually makes its way to GraphicsSystem::WriteRegister.

Although unlikely, are you sure the write pointer didn't just loop around because the ringbuffer is 0x1000 bytes?

DrChat avatar Mar 25 '17 14:03 DrChat

Pretty sure it's not the case. Here you have another sequence: !> 00000004 XXX: ExecutePrimaryBuffer -> read_index : 00000BC3 , write_index : 00000BC9 !> 00000004 XXX: ExecutePrimaryBuffer -> read_index : 00000BC9 , write_index : 00000BD5 !> 00000004 XXX: ExecutePrimaryBuffer -> read_index : 00000BD5 , write_index : 00000BDB !> 00000004 XXX: ExecutePrimaryBuffer -> read_index : 00000BDB , write_index : 00000BE1 !> 00000004 XXX: ExecutePrimaryBuffer -> read_index : 00000BE1 , write_index : 00000BF3 !> 00000004 XXX: ExecutePrimaryBuffer -> read_index : 00000BF3 , write_index : 00000016 hardly doubt you're gonna hit a big chunk after 00000BF3 that will force the ring buffer to wrap around. BTW, it always ends up being 16. Do you know which OPCODES are involved? After my analysis, it seems that I should start from OPCODE_LOAD in constant_propagation_pass.cc, but I'm not sure that's the right track.

cesys avatar Mar 25 '17 20:03 cesys

Don't worry about the JIT opcodes, the ones dealing with integer arithmetic and similar have been verified.

I've had a similar issue before where the ring buffer wasn't the correct size and it wrapped around incorrectly. That might be worth looking into, though it may be unlikely in this case.

DrChat avatar Mar 25 '17 21:03 DrChat

Seems like in: auto graphics_system = kernel_state()->emulator()->graphics_system(); graphics_system->InitializeRingBuffer(ptr, log2_size); the ring buffer is initialized to be 0x8000 bytes meaning 32k so we're definitely far from wrapping around... Unless the dimension value get corrupted.

cesys avatar Mar 25 '17 21:03 cesys

Found the issue. In this specific title the ring buffer got reinitialized at least twice with: InitializeRingBuffer Immediately after the second initialization, the game crashes because read_ptr_index_ is not reset to 0. Adding: void CommandProcessor::InitializeRingBuffer(uint32_t ptr, uint32_t log2_size) { + read_ptr_index_ = 0; primary_buffer_ptr_ = ptr; primary_buffer_size_ = uint32_t(std::pow(2u, log2_size)); } makes the emulator work for some more cycles, until I hit another issue: Expression: furthest_usage->use->instr->block==block Going ahead with the analysis.

cesys avatar Mar 25 '17 22:03 cesys

Commenting the assert makes the game go a little further (it plays another short WMV). After that I get a shader translation exception. Which means we're approaching the 3D part. With OpenGL the emulator actually goes into the game. Vertex shader doesn't seem to do much but the pixel shader is okkey (2D images are there).

cesys avatar Mar 26 '17 01:03 cesys

Submit a PR with that read_ptr_index_ fix and I'll pull it

As for that assert - that looks rather serious as I haven't encountered that before.

DrChat avatar Mar 26 '17 01:03 DrChat

Ok, will do.

cesys avatar Mar 26 '17 01:03 cesys

PR at: https://github.com/benvanik/xenia/pull/684

cesys avatar Mar 26 '17 04:03 cesys

Investigating the next assertion (not sure if it's correct to keep doing that in the context of this issue). If I bypass that assert, the next issue I get is related to the missing implementation of: kGetTextureGradients It would be good to get some hints on how to cope with shader translation (unless somebody is already developing that specific translation).

cesys avatar Mar 26 '17 21:03 cesys

In my opinion, you're doing the right thing. To focus on one major title and to solve the issues step by step is a efficient way to make a emulator work. really good job, cesys.

jackchentwkh avatar Mar 26 '17 22:03 jackchentwkh

Thanks you, jackchentwkh. Definitely agree with you on that. The motivation of making a title you like work is strong. However, I'm facing two major issues on which you may be able to cast some light:

  • it's very hard to get info/help on specific topics (like shader translation for instance)
  • it's not clear to me out to understand who's doing what (in order to avoid overlapping)

cesys avatar Mar 26 '17 23:03 cesys

well, I am not in the xenia-dev group, so your questions are my questions, too. But you did a better job then I did. I am still trying to understand the code of xenia.

jackchentwkh avatar Mar 27 '17 00:03 jackchentwkh

Since we have: assert_true(furthest_usage->value->def->block == block); assert_true(furthest_usage->use->instr->block == block); that means that every time we spill a register we should have the following condition:

furthest_usage->value->def->block == furthest_usage->use->instr->block == block (1)

I added this check (1) in DumpUsage and I noticed that the condition (1) is violated multiple times during the register_alloc_pass workflows. However, the assertion happens only when, after the violation, the TryAllocateRegister fails and then the Run procedure tries to call SpillOneRegister (this sequence occurs in this title). In the regular workflows, instead of calling SpillOneRegister the Run procedure calls PrepareBlockState (it restarts) and the violation disappear as everything is cleaned up.

In bool RegisterAllocationPass::Run(HIRBuilder* builder) if you retrieve the list of instr starting from the builder and you perform the check (1) on each instruction of the list you may hit the violation. Therefore, the violation is already present in the list of instr associated to the blocks of the builder.

Not sure if the violation is supposed to happen sometimes or it should never happen. In the first case the assertion are not justified. Also, commenting the assertions, everything goes ahead smoothly till the 3D part of the game (which lacks of shader translation and crashes after a while, but for internal reason -> in game crash with the black/green window).

Does somebody know what's going on?

cesys avatar Mar 27 '17 07:03 cesys

Found the issue. Altivec instruction InstrEmit_stvrx seems to have an issue when calling f.BranchFalse(eb, skip_label);. This will force and EndBlock which will set current_block_ = NULL; When the altivec instruction goes ahead calling: ea = f.And(ea, a); the call: Instr* i = AppendInstr(OPCODE_AND_info, 0, AllocValue(value1->type)); will assess that current_block_ is NULL and will call AppendBlock(); creating a new block that will create the inconsistency between the correct "value->def->block" and the wrong "use->instr->block" as the call: i->set_src1(value1); will set this new block for the variable "use". Hard to build a fix, until I really go deep to the root cause. This code doesn't seem to be straightforward.

cesys avatar Mar 30 '17 06:03 cesys

Okay, I can see what you mean here: Value ea is calculated at the beginning of the instruction, and we create a branch if (ea & 0xF == 0).

Problem is ea is used in a different block than where it was created.

Very detailed analysis, awesome work!

DrChat avatar Mar 30 '17 14:03 DrChat

Thanks, DrChat. Do you have a fix in mind? I tried to remove the EndBlock from the BranchFalse (only for that call) but the system crashes. BTW, that's the only case were we call BranchFalse, apart from InstrEmit_branch (were it's supposed to be called) so I guess the BranchFalse pattern is not yet suited for this case.

cesys avatar Mar 31 '17 02:03 cesys

PR for the fix on Altivec instructions at https://github.com/benvanik/xenia/pull/687

cesys avatar Mar 31 '17 06:03 cesys

Working on the Vulkan shader issue related to reds and blues swapping (the issue is present in multiple titles, as I could read in the different compatibility issue descriptions).

cesys avatar Mar 31 '17 18:03 cesys

There's the colour swap, as well as the "half screen" issue a lot of games suffer from. Not sure if they're related. Out of curiosity, are you working on an AMD or nVidia card?

Edit: This fix fixes the Sonic 06 demo crash on Vulkan now. Only thing working still is just the HUD elements.

Irixion avatar Apr 01 '17 15:04 Irixion

I'm with an nVidia GTX 980. With this title and with my system, I only get the color swap issue. I tried with Minecraft Story Mode and in that case I don't get any of these issues (except that the game freezes when entering the 3D part). Happy to know this fix is useful for other titles, though.

cesys avatar Apr 01 '17 18:04 cesys

@cesys If you're working on the color swap issue, I can fill you in on some details:

The Xenos renders to the framebuffer in a BGR format, whereas Vulkan renders RGB. So, when the game resolves the framebuffer to a texture, the colors are stored in the opposite format. We can fix this by manually blitting any resolves and swapping the color channels during that blit.

Basically, I need a blitter setup here for Vulkan. If you want to work on that, it's all yours!

DrChat avatar Apr 01 '17 18:04 DrChat

I suppose the colour issues on AMD cards is unrelated. Though from the AMD Vulkan issue, I can guess that most people are with team green and not team red.

Irixion avatar Apr 01 '17 19:04 Irixion