toxcore toxav.c causing low FPS due to bad memory allocation

Android tox client (Antox) is suffering from low FPS. Investigating this issue, i came to the native implementation of tox, which could faster up much the FPS.

toxav_video_send_frame method in toxav.c has quite heavy and unnecessary memory allocation for each frame to encode.

vpx_img_alloc(&img, VPX_IMG_FMT_I420, width, height, 0);

this one could be allocated only once, when the video capture is started.

memcpy(img.planes[VPX_PLANE_Y], y, width * height);
memcpy(img.planes[VPX_PLANE_U], u, (width / 2) * (height / 2));
memcpy(img.planes[VPX_PLANE_V], v, (width / 2) * (height / 2));

these could use the links to y,u,v arrays, instead of copying the data each frame. Or am i wrong, and vpx codec needs the y,u,v data to be side-by-side organized in memory, and we cannot avoid copying the data?

So:

We can create the static field for the img each time the video capture begins, and free it when video capture stops. This could lead to the proplems when several threads will want to access this object, so i want a discussion to decide the best way to solve the issue.
We can pass the link to y,u,v arrays instead of copying the data - or no?

Jul 14 '16 19:07 vassad

@mannol wrote the ToxAV code, so if it's an issue in toxcore, they're aware.

But, IIRC the issue of low FPS on video was something subliun was trying to work through. And the end result of that conversation was that Java is just too inefficient when it comes to converting YUV data on arm devices. So my guess at this point is the issue is more likely Antox's rather than inside toxcore.

Jul 14 '16 21:07 GrayHatter

Hey @GrayHatter, The java side is also written with bad memory allocation, which i managed to solve - I made the java side to reduce to ~20ms average per frame from 1900ms, but the native side is still ~600ms per frame. And it is almost the only problem which lowered the FPS dramatically - array allocations and copying.

I took a look at VideoSendThread thread implementation in Call.scala https://github.com/Antox/Antox/blob/master/app/src/main/scala/chat/tox/antox/av/Call.scala I made tests to check the FPS. Tested on LG L65 D285, sending video from Antox to qTox (Windows) with 640x480 resolution. Each frame was sent during 2500ms average, so the FPS was 0.4. The toxcore native side of code ToxSingleton.toxAv.videoSendFrame took about 500-600ms average from these 2500, the rest 1900ms took the scala methodsFormatConversions.nv21toYuv420. Concerning the java/scala side, I managed to save about 1800ms only by reducing the memory allocation for big arrays in FormatConversions.scala (allocating an array of 300k is quite a heavy operation, especially when working with camera preview, and especially on low-performance devices - at least it is always a bad practice to allocate any data all the time, when could be allocated once).

So the heaviest and longest operations here are in FormatConversions.scala: https://github.com/Antox/Antox/blob/master/app/src/main/scala/chat/tox/antox/av/FormatConversions.scala

val data = new Array[Byte](nv21.data.length)

allocating over 400kBytes of dataper frame only once instead of each frame saves ~500ms

Convert.rotateNV21(nv21.data, data, nv21.width, nv21.height, nv21.rotation) ~700ms

this one is really java inefficient, but the rotation could be performed on the client, not the sender

val y = data.slice(0, yLength)
val u = new Array[Byte](data.length / 6)
val v = new Array[Byte](data.length / 6)

-data.slice takes ~1200ms, but the use of System.arrayCopy saves almost 1195ms of 1200ms :)

allocating uand v only once saves almost the rest ~100-200ms

while (i < data.length / 3) {
if (i % 2 == 0) {
  v(i / 2) = data(yLength + i)
} else {
  u(i / 2) = data(yLength + i)
}

here replacing i/2 toi >> 1 also saves a bit (~50ms, which is also essential for big FPS)

//////////////////////////////////////////////////// So, concluding, I managed to optimize a bit the code for the Java side of Antox which increases the FPS dramatically, but the native toxcore side still takes about 500ms per frame for 640x480 to encode and send. Considering the effect of heavy operations during big memory allocations per each frame, it is really necessary to handle it carefully also in the native side, despite the fact that the high-performance devices handle this faster. and despite the nature of C code comparing to Java - working with memory allocations of such a size needs to be as rare, as possible (especially when the size is big - so anytime working with video broadcasting, and at least - allocating memory every time, instead of doing it on video broadcasting start - is useless) ;)

Jul 15 '16 08:07 vassad

@vassad There is a one big allocation (that I can think of) that can be removed (the one you mentioned), I'll try to do it today.

Jul 15 '16 10:07 mannol

@mannol Thanks a lot! Will wait for your progress :) It is great to know that the project is such alive)

Jul 15 '16 13:07 vassad

Here is my fix of this code: https://github.com/isotoxin/toxcore/blob/master/toxav/toxav.c#L786 Works fine for a long time

To tox devs: if you guys thing your code ok, tox will die.

Jul 15 '16 14:07 isotoxin

Ah yes, that is the fix I'm talking about. @isotoxin Would you mind making a PR instead of me?

Jul 15 '16 15:07 mannol

Don't you mind do it C99 style (with designators) initialization? It would be easier to read and maintain.

Jul 15 '16 15:07 Mikhael-Danilov

@isotoxin wonderful fix :) but why didn't you PR it for the the last 9 months since the fix? testing the solution perhaps)

Jul 15 '16 16:07 vassad

why didn't you PR it for the the last 9 months since the fix

Cuz PR on this repo' are never merged ? You'd be better to propose the PR on TokTok/toxcore ;)

Jul 15 '16 18:07 SkyzohKey

Why cant you fix it without PR? Somebody told me my fix is not compatible with old version of libvpx. Im not a linux programmer and I dont know how to resolve this issue. (Ok, I do not even want to think about this problem, because I hate linux way of library usage) You see my code, why you do not want to make toxcore a little better without any PR's? For the sake of the good in the world.

Jul 15 '16 22:07 isotoxin

Fine.

Jul 15 '16 22:07 mannol

Wow, chill out) Let's just do everything to make it finally be fixed :)

Jul 15 '16 22:07 vassad

hey @mannol , the project failed to build due to the old VPX version installed on the Travis machine (Ubuntu 14.04, vpx 1.3.2) - that's why the .cs, .range, .bit_depth, .d_w, .r_w, .r_h, .x_chroma_shift, .y_chroma_shift and .fb_priv are stated as UNDEFINED FIELDS.

So, there are main four ways here:

apply only fields, which are acceptable by 1.3.2 vpx version
setup Travis to update the vpx version (and probably other libs versions) to the trunk
move the allocation of the img to encode, using vpx api allocate method - from toxav.c to video.c (VCSession struct, where vpx encoder and decoder already leave), to the vc_reconfigure_encoder, vc_new, vc_kill methods.
build.tox.chat has all the up-to-date libs. Is it possible to somehow create a job there, using your commit? And how to do it?:)

Jul 18 '16 17:07 vassad

Do you mind image data copy?

Jul 18 '16 17:07 mannol

A pull request will be soon, for the 3) way from @Mikhael-Danilov ... I meant to use vpx_img_alloc - the way it is implemented right now in toxav.c - but not to allocate it all the time - allocate it (using this api method, which Travis won't blame) when the videocall starts, deallocate when it finishes. It is good practice the same time for the future also, and should work to avoid the BUILD FAILED on Travis and to successfully merge :)

And whant about the build.tox.chat? (Jenkins builder) How we can use it for solving and testing the issues?

Jul 18 '16 18:07 vassad

Great! I'll close my PR then :^)

Jul 18 '16 18:07 mannol

#1606

Jul 18 '16 18:07 Mikhael-Danilov

Make was successfull, tests success is no worse than the master toxcore results:

PASS: encryptsave_test PASS: messenger_autotest PASS: crypto_test PASS: network_test PASS: assoc_test FAIL: onion_test FAIL: TCP_test FAIL: tox_test FAIL: dht_autotest PASS: toxav_basic_test

Perhaps it should be merged?)

Jul 18 '16 19:07 vassad

Actually it passes all tests (as master do) on my machine. But for some reason lots of tests fail on Travis...

Aug 03 '16 14:08 Mikhael-Danilov

Thank you all for your work!

Aug 03 '16 18:08 im-grey

@Mikhael-Danilov that's Travis for you

Aug 03 '16 18:08 GrayHatter

@GrayHatter Travis-ci

this one for master: https://travis-ci.org/irungentoo/toxcore/jobs/147799403

this one for my PR: https://travis-ci.org/irungentoo/toxcore/jobs/148003401

Or... This travis-ci builds&tests are deprecated and still mentioned in README.md by accident?

Aug 04 '16 11:08 Mikhael-Danilov