toxav.c causing low FPS due to bad memory allocation
Android tox client (Antox) is suffering from low FPS. Investigating this issue, i came to the native implementation of tox, which could faster up much the FPS.
toxav_video_send_frame method in toxav.c has quite heavy and unnecessary memory allocation for each frame to encode.
vpx_img_alloc(&img, VPX_IMG_FMT_I420, width, height, 0);
- this one could be allocated only once, when the video capture is started.
memcpy(img.planes[VPX_PLANE_Y], y, width * height);
memcpy(img.planes[VPX_PLANE_U], u, (width / 2) * (height / 2));
memcpy(img.planes[VPX_PLANE_V], v, (width / 2) * (height / 2));
- these could use the links to y,u,v arrays, instead of copying the data each frame. Or am i wrong, and vpx codec needs the y,u,v data to be side-by-side organized in memory, and we cannot avoid copying the data?
So:
- We can create the static field for the
imgeach time the video capture begins, and free it when video capture stops. This could lead to the proplems when several threads will want to access this object, so i want a discussion to decide the best way to solve the issue. - We can pass the link to y,u,v arrays instead of copying the data - or no?
@mannol wrote the ToxAV code, so if it's an issue in toxcore, they're aware.
But, IIRC the issue of low FPS on video was something subliun was trying to work through. And the end result of that conversation was that Java is just too inefficient when it comes to converting YUV data on arm devices. So my guess at this point is the issue is more likely Antox's rather than inside toxcore.
Hey @GrayHatter, The java side is also written with bad memory allocation, which i managed to solve - I made the java side to reduce to ~20ms average per frame from 1900ms, but the native side is still ~600ms per frame. And it is almost the only problem which lowered the FPS dramatically - array allocations and copying.
I took a look at VideoSendThread thread implementation in Call.scala
https://github.com/Antox/Antox/blob/master/app/src/main/scala/chat/tox/antox/av/Call.scala
I made tests to check the FPS. Tested on LG L65 D285, sending video from Antox to qTox (Windows) with 640x480 resolution. Each frame was sent during 2500ms average, so the FPS was 0.4. The toxcore native side of code ToxSingleton.toxAv.videoSendFrame took about 500-600ms average from these 2500, the rest 1900ms took the scala methodsFormatConversions.nv21toYuv420. Concerning the java/scala side, I managed to save about 1800ms only by reducing the memory allocation for big arrays in FormatConversions.scala (allocating an array of 300k is quite a heavy operation, especially when working with camera preview, and especially on low-performance devices - at least it is always a bad practice to allocate any data all the time, when could be allocated once).
So the heaviest and longest operations here are in FormatConversions.scala: https://github.com/Antox/Antox/blob/master/app/src/main/scala/chat/tox/antox/av/FormatConversions.scala
val data = new Array[Byte](nv21.data.length)
- allocating over 400kBytes of
dataper frame only once instead of each frame saves ~500ms
Convert.rotateNV21(nv21.data, data, nv21.width, nv21.height, nv21.rotation)~700ms
- this one is really java inefficient, but the rotation could be performed on the client, not the sender
val y = data.slice(0, yLength)
val u = new Array[Byte](data.length / 6)
val v = new Array[Byte](data.length / 6)
-data.slice takes ~1200ms, but the use of System.arrayCopy saves almost 1195ms of 1200ms :)
- allocating
uandvonly once saves almost the rest ~100-200ms
while (i < data.length / 3) {
if (i % 2 == 0) {
v(i / 2) = data(yLength + i)
} else {
u(i / 2) = data(yLength + i)
}
- here replacing
i/2toi >> 1also saves a bit (~50ms, which is also essential for big FPS)
//////////////////////////////////////////////////// So, concluding, I managed to optimize a bit the code for the Java side of Antox which increases the FPS dramatically, but the native toxcore side still takes about 500ms per frame for 640x480 to encode and send. Considering the effect of heavy operations during big memory allocations per each frame, it is really necessary to handle it carefully also in the native side, despite the fact that the high-performance devices handle this faster. and despite the nature of C code comparing to Java - working with memory allocations of such a size needs to be as rare, as possible (especially when the size is big - so anytime working with video broadcasting, and at least - allocating memory every time, instead of doing it on video broadcasting start - is useless) ;)
@vassad There is a one big allocation (that I can think of) that can be removed (the one you mentioned), I'll try to do it today.
@mannol Thanks a lot! Will wait for your progress :) It is great to know that the project is such alive)
Here is my fix of this code: https://github.com/isotoxin/toxcore/blob/master/toxav/toxav.c#L786 Works fine for a long time
To tox devs: if you guys thing your code ok, tox will die.
Ah yes, that is the fix I'm talking about. @isotoxin Would you mind making a PR instead of me?
Don't you mind do it C99 style (with designators) initialization? It would be easier to read and maintain.
@isotoxin wonderful fix :) but why didn't you PR it for the the last 9 months since the fix? testing the solution perhaps)
why didn't you PR it for the the last 9 months since the fix
Cuz PR on this repo' are never merged ? You'd be better to propose the PR on TokTok/toxcore ;)
Why cant you fix it without PR? Somebody told me my fix is not compatible with old version of libvpx. Im not a linux programmer and I dont know how to resolve this issue. (Ok, I do not even want to think about this problem, because I hate linux way of library usage) You see my code, why you do not want to make toxcore a little better without any PR's? For the sake of the good in the world.
Fine.
Wow, chill out) Let's just do everything to make it finally be fixed :)
hey @mannol , the project failed to build due to the old VPX version installed on the Travis machine (Ubuntu 14.04, vpx 1.3.2) - that's why the .cs, .range, .bit_depth, .d_w, .r_w, .r_h, .x_chroma_shift, .y_chroma_shift and .fb_priv are stated as UNDEFINED FIELDS.
So, there are main four ways here:
- apply only fields, which are acceptable by 1.3.2 vpx version
- setup Travis to update the vpx version (and probably other libs versions) to the trunk
- move the allocation of the img to encode, using vpx api allocate method - from toxav.c to video.c (VCSession struct, where vpx encoder and decoder already leave), to the vc_reconfigure_encoder, vc_new, vc_kill methods.
- build.tox.chat has all the up-to-date libs. Is it possible to somehow create a job there, using your commit? And how to do it?:)
Do you mind image data copy?
A pull request will be soon, for the 3) way from @Mikhael-Danilov ...
I meant to use vpx_img_alloc - the way it is implemented right now in toxav.c - but not to allocate it all the time - allocate it (using this api method, which Travis won't blame) when the videocall starts, deallocate when it finishes. It is good practice the same time for the future also, and should work to avoid the BUILD FAILED on Travis and to successfully merge :)
And whant about the build.tox.chat? (Jenkins builder) How we can use it for solving and testing the issues?
Great! I'll close my PR then :^)
#1606
Make was successfull, tests success is no worse than the master toxcore results:
PASS: encryptsave_test PASS: messenger_autotest PASS: crypto_test PASS: network_test PASS: assoc_test FAIL: onion_test FAIL: TCP_test FAIL: tox_test FAIL: dht_autotest PASS: toxav_basic_test
Perhaps it should be merged?)
Actually it passes all tests (as master do) on my machine. But for some reason lots of tests fail on Travis...
Thank you all for your work!
@Mikhael-Danilov that's Travis for you
@GrayHatter Travis-ci
this one for master: https://travis-ci.org/irungentoo/toxcore/jobs/147799403
this one for my PR: https://travis-ci.org/irungentoo/toxcore/jobs/148003401
Or... This travis-ci builds&tests are deprecated and still mentioned in README.md by accident?