re4_tweaks

`D3DCREATE_MULTITHREADED` flag removal

Open emoose opened this issue 2 years ago • 14 comments

E: here's a build if anyone wants to try this out. It should give some performance improvement and might help if you struggle to reach 60. The game might crash on launch sometimes (it's usually been fine for me lately though), but if you manage to load in it should hopefully be stable: re4_tweaks1.7.4-RemoveD3DCREATE_MULTITHREADED.zip


As mentioned at https://github.com/nipkownix/re4_tweaks/issues/5#issuecomment-954563654, the game uses the D3DCREATE_MULTITHREADED flag when creating the D3D9 device, which adds extra thread safety to the D3D funcs at the cost of performance.
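
For reference, the flag is just one bit in the `BehaviorFlags` argument of `CreateDevice`, so removing it is a bit-clear on that value. A minimal sketch, with the two flag values copied from d3d9.h so it builds without the SDK headers. `ClearBehaviorFlag` is a made-up helper name, and 0x44 is the immediate the game pushes (the `6A 44` in the pattern used in the code further down):

```cpp
#include <cassert>
#include <cstdint>

// Values as defined in d3d9.h, repeated here so the sketch builds without the SDK.
constexpr uint32_t kD3DCREATE_MULTITHREADED = 0x00000004;
constexpr uint32_t kD3DCREATE_HARDWARE_VERTEXPROCESSING = 0x00000040;

// Hypothetical helper (not part of re4_tweaks): returns behaviorFlags with `flag` cleared.
constexpr uint32_t ClearBehaviorFlag(uint32_t behaviorFlags, uint32_t flag)
{
    return behaviorFlags & ~flag;
}

// 0x44 = HARDWARE_VERTEXPROCESSING | MULTITHREADED, so clearing MULTITHREADED leaves 0x40.
static_assert(ClearBehaviorFlag(0x44, kD3DCREATE_MULTITHREADED)
                  == kD3DCREATE_HARDWARE_VERTEXPROCESSING,
              "only the hardware vertex processing bit should remain");
```

Patching the single immediate byte in the game's `push` does the same thing in place.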

With the game's frame cap removed, dropping this flag took one area from 220FPS to 270FPS, so it would probably be good for people who struggle to run the game at 60 and end up with game slowdown because of it.

Unfortunately the game seems pretty unstable with this flag removed, obviously they didn't bother making it thread safe on Windows - however, it seems the game does have a pair of nullsubs around certain graphics-threading related things, kinda seems like they were meant to be a pair of funcs for locking/unlocking a mutex, but that's just a guess.

(On X360 the D3DCREATE_MULTITHREADED flag is apparently non-functional, not sure if that means X360 always had thread-safety stuff added, or maybe X360 devs had to be more careful with threading, which would explain the nullsub pair - but doesn't explain why they removed the code inside ;_;)


So I did another one of my experiments and hooked the two nullsubs to call lock()/unlock() on a std::recursive_mutex. Sadly this resulted in a crash on boot: it seems the UnlockMutex hook was being called before LockMutex for some reason, maybe the code for locking it got removed or something. I just got around that by adding a hack to skip the first call to it. (E: no longer needed)
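
The crash fits how the standard library behaves: calling unlock() on a std::recursive_mutex that the calling thread doesn't own is undefined behaviour. A rough sketch of a more general guard than skipping the first call, tracking how many times each thread holds the lock. This is not what the build does, and `GuardedLock`, `GuardedUnlock`, `g_sketchMutex` and `t_lockDepth` are names invented for the example:

```cpp
#include <cassert>
#include <mutex>

std::recursive_mutex g_sketchMutex;   // stand-in for the real D3D mutex
thread_local int t_lockDepth = 0;     // how many times the current thread holds it

void GuardedLock()
{
    g_sketchMutex.lock();
    ++t_lockDepth;
}

// Returns false and does nothing if the calling thread doesn't hold the mutex,
// so a stray unlock (like the one seen on boot) is ignored instead of crashing.
bool GuardedUnlock()
{
    if (t_lockDepth == 0)
        return false;
    --t_lockDepth;
    g_sketchMutex.unlock();
    return true;
}
```

The downside is that a guard like this hides unbalanced calls instead of fixing them, so it's more of a debugging aid than a real fix.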

Then the game just started hanging before the intro movie: one thread was waiting for another to release the mutex first. For some reason there are a few LockMutex calls in the game without an accompanying UnlockMutex after them. That's likely an accident: since the func for locking the mutex also returns the D3D pointer, they probably added some Win32-specific code, needed a way to get the D3D device, and didn't bother adding UnlockMutex afterward since Win32 didn't actually need it. (It seems this missing UnlockMutex only happens around calls to D3DXCreateTextureFromFileInMemoryEx, so I'm guessing that func was probably added for Windows.)
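
To show why an unbalanced lock turns into a hang, here's a small standalone sketch (nothing from the game, all names are made up): while one thread holds the mutex, a second thread can't take it. It uses try_lock() on the second thread so the demo reports the problem instead of actually hanging:

```cpp
#include <cassert>
#include <future>
#include <mutex>

std::recursive_mutex g_hangDemoMutex; // stand-in for the real D3D mutex

// Asks a second thread whether it could take the mutex right now.
bool OtherThreadCanLock()
{
    auto attempt = std::async(std::launch::async, [] {
        if (!g_hangDemoMutex.try_lock())
            return false;
        g_hangDemoMutex.unlock();
        return true;
    });
    return attempt.get();
}

// Returns true if holding the lock without releasing it keeps other threads out.
bool UnbalancedLockBlocksOthers()
{
    g_hangDemoMutex.lock();                      // like the LockMutex before the D3DX call
    const bool blocked = !OtherThreadCanLock();  // a real lock() here would wait forever
    g_hangDemoMutex.unlock();                    // the UnlockMutex the game never calls
    return blocked;
}
```

In the game the second thread does a blocking lock rather than a try_lock, which would explain why it just sits there before the intro movie.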

Hooking the calls to that D3DX func & making it use UnlockMutex afterwards seems to let it continue though, managed to load into a game fine with it, + haven't had any crashes yet (besides a crash on exit, probably not too hard to fix though)

(not sure how proper that fix is though, game seems to be doing something with a ptr returned from that D3DX func, so maybe it should only be unlocking after it's done with the ptr, instead of right after calling D3DX, not sure...)
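
If the unlock really should happen only once the caller is done with the returned ptr, the usual C++ way to express that is a scope guard that adopts the already-held lock via `std::adopt_lock`. A sketch of the idea only, assuming you control the calling code (which isn't the case when patching a binary). `FakeLockAndGetDevice` and `LoadTextureSketch` are stand-ins invented for the example:

```cpp
#include <cassert>
#include <mutex>

std::recursive_mutex g_adoptDemoMutex; // stand-in for the real D3D mutex

// Stand-in for the game's lock func, which locks and also hands back the device pointer.
void* FakeLockAndGetDevice()
{
    g_adoptDemoMutex.lock();
    static int fakeDevice = 0;
    return &fakeDevice;
}

// Hypothetical caller: the lock stays held until every use of the result is finished.
int LoadTextureSketch()
{
    void* device = FakeLockAndGetDevice();
    // Adopt the lock that is already held; it is released when `guard` goes out of scope.
    std::lock_guard<std::recursive_mutex> guard(g_adoptDemoMutex, std::adopt_lock);

    int textureHandle = (device != nullptr) ? 1 : 0; // pretend this is the D3DX call
    textureHandle += 41;                             // pretend work on the returned ptr
    return textureHandle;                            // guard unlocks after this
}
```

Doing the equivalent in the game would mean finding where each caller stops using the ptr and patching an UnlockMutex call in there, which is the harder-to-patch option.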

Need to do more testing with it (only tried like 5 minutes in game so far), might need to make ImGui use those mutex funcs too if that also uses the D3D device. (Also need to check performance with it now, it could be that adding this mutex stuff slows it down the same as the thread flag.)
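
If ImGui does turn out to need it, one way would be to run its device-touching calls under the same mutex. A minimal sketch, assuming a helper like this gets access to the mutex the hooks use. `WithD3DLock` and `g_renderDemoMutex` are invented names:

```cpp
#include <cassert>
#include <mutex>
#include <utility>

std::recursive_mutex g_renderDemoMutex; // stand-in for the real D3D mutex

// Hypothetical helper: runs `fn` while holding the mutex and passes its result through.
template <typename Fn>
auto WithD3DLock(Fn&& fn) -> decltype(std::forward<Fn>(fn)())
{
    std::lock_guard<std::recursive_mutex> guard(g_renderDemoMutex);
    return std::forward<Fn>(fn)();
}
```

Usage would be something like `WithD3DLock([] { /* ImGui DX9 render call here */ });`. Since the mutex is recursive, it wouldn't matter if the surrounding game code already holds the lock on the same thread.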

Code:

#include <mutex>
[...]

std::recursive_mutex g_D3DMutex;

void __cdecl D3D_LockMutex_Hook() // hooks 0x9391C0
{
	g_D3DMutex.lock();
}

void __cdecl D3D_UnlockMutex_Hook() // hooks 0x9391D0
{
	g_D3DMutex.unlock();
}

int(__stdcall* D3DXCreateTextureFromFileInMemoryEx_Orig)(
	int a1,
	int a2,
	int a3,
	int a4,
	int a5,
	int a6,
	int a7,
	int a8,
	int a9,
	int a10,
	int a11,
	int a12,
	int a13,
	int a14,
	int a15);

int __stdcall D3DXCreateTextureFromFileInMemoryEx_Hook(
	int a1,
	int a2,
	int a3,
	int a4,
	int a5,
	int a6,
	int a7,
	int a8,
	int a9,
	int a10,
	int a11,
	int a12,
	int a13,
	int a14,
	int a15)
{
	auto ret = D3DXCreateTextureFromFileInMemoryEx_Orig(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15);
	D3D_UnlockMutex_Hook();
	return ret;
}

void ThreadFix_Hook()
{
	const int D3DCREATE_MULTITHREADED = 4;
	// Clear D3DCREATE_MULTITHREADED flag from D3D CreateDevice call
	auto pattern = hook::pattern("68 ? ? ? ? 68 ? ? ? ? 6A 44 56 8B 35");
	auto ptr_CreateDevice_BehaviorFlags = pattern.count(1).get(0).get<uint8_t>(0xB);
	Patch(ptr_CreateDevice_BehaviorFlags, uint8_t(*ptr_CreateDevice_BehaviorFlags & ~D3DCREATE_MULTITHREADED));

	// Game has a pair of nullsubs that are always called just before graphics-threading related code is used
	// Kinda seems like they were meant to be a pair of funcs for locking/unlocking a mutex, but that's just a guess.
	// Restore these so that flag removal above can be made more stable
	pattern = hook::pattern("E8 ? ? ? ? A1 ? ? ? ? A3 ? ? ? ? 89 1D ? ? ? ? A3 ? ? ? ? E8 ? ? ? ? 8B 35");
	auto ptr_D3D_LockMutex = injector::GetBranchDestination(pattern.count(1).get(0).get<uint32_t>(0)).as_int();
	InjectHook(ptr_D3D_LockMutex, D3D_LockMutex_Hook);

	pattern = hook::pattern("E8 ? ? ? ? 68 ? ? ? ? FF 15 ? ? ? ? A1 ? ? ? ? 50 FF 15");
	auto ptr_D3D_UnlockMutex = injector::GetBranchDestination(pattern.count(1).get(0).get<uint32_t>(0)).as_int();
	InjectHook(ptr_D3D_UnlockMutex, D3D_UnlockMutex_Hook);

	// Game calls D3D_LockMutex before D3DXCreateTextureFromFileInMemoryEx
	// LockMutex returns pointer to D3D device, so they probably used that as a quick way to retrieve it, but forgot/didn't care about using UnlockMutex afterward
	// Hook the misbehaving calls so we can add UnlockMutex calls to them
	// (not sure if this is safest way to do it though - game seems to do something with the ptr returned by D3DXCreate...
	// maybe UnlockMutex should be after it's finished with that ptr, would be harder to patch in tho...)
	pattern = hook::pattern("53 E8 ? ? ? ? 50 E8 ? ? ? ? 8B 36 8B 0E 8D 55 ?");
	auto ptr_caller1 = pattern.count(1).get(0).get<uint32_t>(7); // 0x98009D
	ReadCall(ptr_caller1, D3DXCreateTextureFromFileInMemoryEx_Orig);
	InjectHook(ptr_caller1, D3DXCreateTextureFromFileInMemoryEx_Hook);

	pattern = hook::pattern("57 E8 ? ? ? ? 50 E8 ? ? ? ? 8B 06 8B 10 8B 52 ? 8D 4D ?"); // 0x980234 & 0x981049
	InjectHook(pattern.count(2).get(0).get<uint32_t>(7), D3DXCreateTextureFromFileInMemoryEx_Hook);
	InjectHook(pattern.count(2).get(1).get<uint32_t>(7), D3DXCreateTextureFromFileInMemoryEx_Hook);

	pattern = hook::pattern("53 E8 ? ? ? ? 50 E8 ? ? ? ? 8B 36 8B 0E 8D 55 ? 52"); // 0x9E4B82, unused?
	InjectHook(pattern.count(1).get(0).get<uint32_t>(7), D3DXCreateTextureFromFileInMemoryEx_Hook);

	pattern = hook::pattern("57 E8 ? ? ? ? 50 E8 ? ? ? ? 8D 4D ? 8B F8 8B 06 8B 10 8B 52 ?"); // 0x9E6619
	InjectHook(pattern.count(1).get(0).get<uint32_t>(7), D3DXCreateTextureFromFileInMemoryEx_Hook);

	// Lone UnlockMutex call at 0x955792 - doesn't have a LockMutex call before it for some reason
	// this would cause game crash on startup, and skipping it caused game crash on exit - nopping it instead seems to fix both
	pattern = hook::pattern("89 0D ? ? ? ? A3 ? ? ? ? 89 99 ? ? ? ? 89 99 ? ? ? ? E8 ? ? ? ?");
	Nop(pattern.count(1).get(0).get<uint8_t>(0x17), 5);
}
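
Most of the patterns above anchor on `E8 ? ? ? ?`, which is an x86 near call with a 32-bit relative displacement (offset 7 in the D3DX patterns is where the second such call starts). A small sketch of the address math involved in resolving one of those calls to its target, which is presumably what `GetBranchDestination` / `ReadCall` do for this case. `CallDestination` is a made-up name, not part of the injector library:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: given the address a `call rel32` sits at and its 5 bytes,
// return the address being called (32-bit address space, so wrap-around is intended).
uint32_t CallDestination(uint32_t callAddress, const uint8_t* callBytes)
{
    assert(callBytes[0] == 0xE8); // opcode of a near relative call

    // The displacement is stored little-endian right after the opcode.
    const uint32_t rel32 = static_cast<uint32_t>(callBytes[1])
                         | (static_cast<uint32_t>(callBytes[2]) << 8)
                         | (static_cast<uint32_t>(callBytes[3]) << 16)
                         | (static_cast<uint32_t>(callBytes[4]) << 24);

    // The displacement is relative to the end of the 5-byte instruction.
    return callAddress + 5 + rel32;
}
```

For example, `E8 0B 00 00 00` located at 0x1000 calls 0x1010, and a negative displacement like `E8 FB FF FF FF` at the same address calls back to 0x1000.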

emoose, Jan 24 '22 04:01