performance tips?

Open ahmadnassri opened this issue 4 years ago • 4 comments

I'm working on creating a customized management configuration and button mapping for my personal use. One of the challenges I've noticed is around performance: the speed of rendering / drawing buttons compared to other libraries / tools (e.g. streamdeck_ui in Python).

The workflow I'm following is fairly straightforward:

  1. set button images using sharp + deck.renderButton as per examples
  2. listen to button events and execute actions with deck.on('down') events
  3. allow for "multi page" views where the entire panel gets re-rendered with new buttons on certain actions (e.g. previous, next page)

It's in (3) that I'm noticing the performance difference: having to re-render the entire panel (usually by looping through the button list and rendering each) is noticeably slow, even when attempting to run those commands asynchronously ...

Any advice on how to render an entire screen of buttons more rapidly and achieve similar results to the Python library?

ahmadnassri avatar Dec 14 '19 21:12 ahmadnassri

Performance isn't something I have thought too much about, as it is generally good enough, but I agree it does feel a bit sluggish when rapidly filling the entire panel of the XL.

I will need to do some tests, but I suspect it could be caused by some of:

  • node-hid is blocking, which could mean usb transfer is suboptimal.
  • jpeg encoding for gen2 devices
  • inefficient buffer operations to transform the image to what each device needs

I don't know when I will have time to look into this. If you are able to do some profiling to figure out which step is slow, that would be really helpful.
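
One way to get per-step numbers is to wrap each suspected stage in a timer. The following is only a minimal sketch: the stage names (`transform`, `encode`, `write`) and the stand-in workloads are hypothetical placeholders, not the library's actual internals, and each callback would need to be replaced with the real work being measured.

```javascript
const { performance } = require('perf_hooks');

// Runs each stage in order, passing the previous result along,
// and records how long each stage took in milliseconds.
function timeStages(stages) {
  const timings = {};
  let value;
  for (const [name, fn] of Object.entries(stages)) {
    const start = performance.now();
    value = fn(value);
    timings[name] = performance.now() - start;
  }
  return timings;
}

// Hypothetical stand-ins for the three suspected bottlenecks.
const timings = timeStages({
  transform: () => Buffer.alloc(96 * 96 * 3, 0x7f), // pretend pixel shuffle
  encode: (pixels) => Buffer.from(pixels),          // pretend jpeg encode
  write: (encoded) => encoded.length,               // pretend hid write
});

console.log(timings);
```

Comparing the three numbers across a full panel fill should show which of the suspected causes dominates on a given machine and device.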

Julusian avatar Dec 14 '19 22:12 Julusian

For one of our projects we stumbled over the same issue and did some testing. The performance is lost during buffer manipulation and JPEG encoding. With a kinda hacky approach we managed to circumvent that issue by abusing internal methods to cache the HID packets and send them to the device directly. That way we are able to update all keys near-instantly:

const sharp = require('sharp');
const path = require('path');
const { openStreamDeck } = require('elgato-stream-deck');

(async function() {
   const streamdeck = openStreamDeck();
   streamdeck.clearAllKeys();

   const imageBuf = await sharp(path.join(__dirname, 'img.png'))
      .resize(streamdeck.ICON_SIZE, streamdeck.ICON_SIZE)
      .flip()
      .flop()
      .jpeg()
      .toBuffer();

   /* cache packets for each image you want to rerender */
   const packetsCache = {};
   packetsCache['img.png'] = streamdeck.generateFillImageWrites(0, imageBuf);

   for (let keyIndex = 0; keyIndex < streamdeck.NUM_KEYS; keyIndex++) {
      packetsCache['img.png'].forEach(packet => {
         /* keyIndex is 3rd byte of each packet */
         packet[2] = keyIndex;
         streamdeck.device.write(packet);
      });
   }

   streamdeck.close();
})();

We load every icon in advance, transform them to HID packets and cache those. To render an image, we just need to find the appropriate packet list, update the keyIndex byte and send it to the device directly.

We have only tested this with XL and OriginalV2 devices so far. I expect the original device to fail due to its keyIndex transformation and its use of raw pixels instead of JPEG, but that should be easy to adapt.
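
The caching idea can be sketched without hardware, roughly as below. This is an illustration under assumptions: `PacketCache` and `fakeDevice` are hypothetical names, the packet contents are made up, and the only detail taken from the snippet above is that the key index sits in the third byte of each packet. Unlike the snippet above, it copies each packet before patching it, so the cached originals stay untouched.

```javascript
// Assumption from the snippet above: byte 2 of every packet is the key index.
const KEY_INDEX_OFFSET = 2;

class PacketCache {
  constructor() {
    this.cache = new Map();
  }

  // Store a private copy of the packet list under an image name.
  set(name, packets) {
    this.cache.set(name, packets.map((packet) => Array.from(packet)));
  }

  // Return fresh copies of the cached packets, retargeted at keyIndex.
  packetsFor(name, keyIndex) {
    const packets = this.cache.get(name);
    if (!packets) throw new Error(`no cached packets for ${name}`);
    return packets.map((packet) => {
      const copy = Array.from(packet);
      copy[KEY_INDEX_OFFSET] = keyIndex;
      return copy;
    });
  }
}

// Fake device that records writes instead of talking to USB.
const written = [];
const fakeDevice = { write: (packet) => written.push(packet) };

const cache = new PacketCache();
cache.set('img.png', [[2, 7, 0, 0, 10], [2, 7, 0, 1, 20]]); // made-up packets

for (let keyIndex = 0; keyIndex < 3; keyIndex++) {
  for (const packet of cache.packetsFor('img.png', keyIndex)) {
    fakeDevice.write(packet);
  }
}
```

Copying costs a little per write, but it avoids surprises if the same cached image is queued for two keys before the first write completes.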

Maybe it is possible to integrate a similar, cross-device approach directly into the api?

tricora avatar Dec 15 '19 02:12 tricora

@ahmadnassri You got me curious with this, so I've had a bit of a look. This is on an i7 laptop with an xl, so your experience may differ.

Profiling with examples/fill-panel-when-pressed.js shows that the node-hid writes are by far the slowest part. Sending the grass image on my XL, it spends 130ms of 220ms in the calls to hid write, but this varies massively based on content for gen2 devices.

I did find an optimisation of the image conversion loop that brings it down to ~10ms for the full panel; before, I was seeing about 90ms. This leaves an overhead of 20ms per full panel fill. As this is CPU bound, perhaps it could be done in parallel in worker threads.

A little further digging shows that node-hid doesn't accept buffers, so we have to convert buffers to number arrays, and node-hid then converts them back to bytes while verifying that every item in the array is a number. If I modify node-hid to skip that conversion and accept buffers, that takes it from 120-150ms down to 70-100ms.

So with all this, this test case is reduced from ~220ms down to ~100ms. There is still a visible rolling effect to it, but as most of that time remains in node-hid blocking calls, it's going to be hard to go much further.
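
To get a feel for the round trip described above, the conversion can be timed in isolation. This is a rough sketch, not a measurement of node-hid itself: the packet size and iteration count are arbitrary assumptions, and `toNumberArray` and `measure` are hypothetical helpers that only mimic the Buffer-to-array step on the JavaScript side.

```javascript
const { performance } = require('perf_hooks');

// Arbitrary packet size and fill value, chosen only for illustration.
const packet = Buffer.alloc(1024, 0xab);

// The extra step needed when the write call only accepts number arrays.
function toNumberArray(buf) {
  return Array.from(buf);
}

// Times `iterations` calls of fn and returns the elapsed milliseconds.
function measure(fn, iterations) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return performance.now() - start;
}

const iterations = 10000;
const withConversion = measure(() => toNumberArray(packet), iterations);
const withoutConversion = measure(() => packet, iterations);

console.log({ withConversion, withoutConversion });
```

The absolute numbers depend heavily on the machine and Node.js version, so they are only useful relative to each other.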

Julusian avatar Dec 15 '19 13:12 Julusian

@tricora That is interesting. My testing was showing JPEG encoding taking a few ms (when using jpeg-turbo instead of jpeg-js), so not a noticeable amount of time.

I am hesitant to add a cache of any kind, as managing that cache will be hard to get right and everyone will want something different. I would be open to adding some hooks that let you implement your own caching without needing to abuse internal methods.

Julusian avatar Dec 15 '19 13:12 Julusian