xrdp
xrdp copied to clipboard
[H.264] make threads configurable
At the moment, the number of threads is hardcoded to 1. I should have made this configurable via gfx.toml. In a use case of an enterprise user, the server host lacks GPU but much more CPU cores are available than an ordinary desktop. This is very informative to me.
In most cases, 1-core is enough for the encoding but it's a good thing to make it configurable for flexibility.
https://github.com/neutrinolabs/xrdp/blob/2a17023a46f06599cc959ecd1150cc3be9bb4543/xrdp/xrdp_encoder_x264.c#L153-L160
I'm not sure if the number of OpenH264's threads is configurable but I would like to make it configurable as well if possible.
LGTM.
I've done some poking around in OpenH264, and there's a parameter iMultipleThreadIdc in SEncParamExt which allows the threads in the thread pool to be specified. It's not clear to me however whether there would be any benefit in overriding the automatic setting of 0.
The docs for OpenH264 MultipleThreadIdc aren't great; the best I can find is this inline comment:
...
} else if (strTag[0].compare ("MultipleThreadIdc") == 0) {
// # 0: auto(dynamic imp. internal encoder); 1: multiple threads imp. disabled; > 1: count number of threads;
pSvcParam.iMultipleThreadIdc = atoi (strTag[1].c_str());
...
The setting is referenced in some debug output but looks like that's preprocessed away, depending how the library is built.
I agree that 0 sounds best for the general case, but what is the default? (and is it 1?)
I applied the following small patch and confirmed the default is 1. However, I'm still unsure how much increasing the value contributes to the encoding. I can't see any obvious differences between 1 and 4 in CPU usage of xrdp process.
diff --git a/xrdp/xrdp_encoder_openh264.c b/xrdp/xrdp_encoder_openh264.c
index 4bae6b97..75313f7e 100644
--- a/xrdp/xrdp_encoder_openh264.c
+++ b/xrdp/xrdp_encoder_openh264.c
@@ -209,6 +209,8 @@ xrdp_encoder_openh264_encode(void *handle, int session, int left, int top,
oe->openh264_enc_han, &encParamExt);
LOG(LOG_LEVEL_INFO, "xrdp_encoder_openh264_encode: "
"InitializeExt rv %d", status);
+ LOG(LOG_LEVEL_INFO, "xrdp_encoder_openh264_encode: "
+ "threads=%d", encParamExt.iMultipleThreadIdc);
}
oe->yuvdata = g_new(char, (width + 16) * (height + 16) * 2);
if (oe->yuvdata == NULL)
[2024-12-30T22:46:35.545+0900] [INFO ] xrdp_encoder_openh264_encode: GetDefaultParams rv 0
[2024-12-30T22:46:35.548+0900] [INFO ] xrdp_encoder_openh264_encode: InitializeExt rv 0
[2024-12-30T22:46:35.549+0900] [INFO ] xrdp_encoder_openh264_encode: threads=1
The bottleneck would be at the point that threads=1 drives a single core to 100%, even though multiple cores are available. This may be difficult to reproduce without stress-test scenarios, high fps, resolution, and busy screen updates.
There's a risk of being the outlier. Should xrdp default to threads=0 when most OpenH264 users aren't exercising that code path? Will xrdp receive difficult bug reports due to the library's lack of wide MT testing?? Hard to say.
I would appreciate the flexibility to change the setting via xrdp config file, but can't justify an xrdp-specific default which differs to the upstream library.
It makes sense that OpenH264 consumes less CPU given its background. OpenH264 was developed by Cisco for web conferencing, so it is more geared towards real-time encoding use cases. OpenH264 achieves real-time performance at the cost of less configuration flexibility, such as supporting only baseline profile, up to level 5.2, no configuration like x264's presets. So I guess that's the reason why there's no significant difference between threads=1 vs threads=4, it didn't hit the ceiling of 1-thread as you say.
I personally stick to threads=1 by default, as well as with x264, when making it configurable.
Keep this open for discussion about OpenH264 threads.
It seems OpenH264 only supports slice-based multi-threading. Probably it requires more than 1 slices to encode with 2 threads? https://github.com/cisco/openh264/issues/2651