Alternative serial send encoding using a smaller send buffer

Open Makuna opened this issue 3 years ago • 1 comments

Today, several methods rely on serial peripheral hardware such as UART, I2S, or SPI. These encode the pulse for a single data bit as four serial bits spanning the bit period, which requires a send buffer 4x the size of the pixel data.

0b1000 => a 1 bit pulse
0b1110 => a 0 bit pulse

With this encoding, the timing defines the total pulse period, and three relative positions within that period define the pulse width: 25%, 50%, and 75% of the period. We ignore 0% and 100% because they won't generate pulses.
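As a rough illustration of those positions, the sketch below computes how much of the period the line stays high for a given serial pattern. `HighPercent` is a hypothetical helper, not part of NeoPixelBus, and it assumes the pattern is a single pulse that starts at the beginning of the period. It needs a C++14 compiler for the `constexpr` loop and binary literals.

```cpp
#include <cstdint>

// Hypothetical helper: percentage of the pulse period spent high for a
// serial pattern that is `width` bits wide. It simply counts the set bits,
// so it assumes the pattern is one pulse starting at the first serial bit.
constexpr uint32_t HighPercent(uint32_t pattern, uint8_t width)
{
    uint32_t highBits = 0;
    for (uint8_t bit = 0; bit < width; bit++) {
        if (pattern & (1u << bit)) {
            highBits++;
        }
    }
    return (highBits * 100u) / width;
}

// The three usable 4-bit pulse widths.
static_assert(HighPercent(0b1000, 4) == 25, "one high step of four");
static_assert(HighPercent(0b1100, 4) == 50, "two high steps of four");
static_assert(HighPercent(0b1110, 4) == 75, "three high steps of four");
```

Only the 25% and 75% patterns are used by the encoding described here; the 50% entry is included to show the third available position.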

This fits the timing of many NeoPixel pulse data protocols well.
But some newer LEDs require more precise timing that is based on 1/3 of the cycle rather than 1/4.

An alternative, to supplement but not replace the above, is to use 3 serial bits per data bit. This would also require a send buffer of only 3x the size of the pixel data.

0b100 => a 1 bit pulse
0b110 => a 0 bit pulse

Here too the timing defines the total pulse period, but only two relative positions within it define the pulse width: about 33% and 66% of the period. We ignore 0% and 100% because they won't generate pulses.
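To make the difference between quarter and third steps concrete, here is a small worked calculation. The 1250 ns period is an assumption chosen for illustration (it corresponds to an 800 kbit/s data rate); the issue itself does not name a period, and `HighTimeNs` is a hypothetical helper.

```cpp
#include <cstdint>

// Assumption for illustration only: a 1250 ns bit period (800 kbit/s).
constexpr uint32_t PeriodNs = 1250;

// High time in nanoseconds when `highSteps` of the `totalSteps` serial bits
// in one period are set. Integer division truncates the result.
constexpr uint32_t HighTimeNs(uint32_t highSteps, uint32_t totalSteps)
{
    return (PeriodNs * highSteps) / totalSteps;
}

// 3 serial bits per data bit: pulse widths of about 1/3 and 2/3 of the period.
static_assert(HighTimeNs(1, 3) == 416, "0b100");
static_assert(HighTimeNs(2, 3) == 833, "0b110");

// 4 serial bits per data bit: pulse widths of 1/4 and 3/4 of the period.
static_assert(HighTimeNs(1, 4) == 312, "0b1000");
static_assert(HighTimeNs(3, 4) == 937, "0b1110");
```

Under that assumed period, the two encodings differ by roughly 100 ns in each pulse width, which is the kind of difference that matters for parts with tighter timing.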

Encoding requires either a larger lookup table of 256 entries of 24 bits each, or a function that runs for each byte to encode it into 3 send buffer bytes. That function would look like the following, where the lower 24 bits of the returned value would then be copied to the send buffer:

uint32_t Encode3Bit(uint8_t data) { 
    uint32_t value = 0;
    for (uint8_t bitMask = 0x80; bitMask != 0; bitMask >>= 1)     {
        value <<= 3;
        if (data & bitMask) {
            value |= 0b100;
        }
        else {
            value |= 0b110;
        }
    }
    return value;
}
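A minimal sketch of the copy step, assuming the three bytes are written most significant byte first. `Append3Bit` is a hypothetical helper name; the encoder is repeated (as `constexpr`, which needs C++14) only so that the block compiles on its own.

```cpp
#include <cstdint>

// Same logic as Encode3Bit above, repeated so this block is self-contained.
constexpr uint32_t Encode3Bit(uint8_t data)
{
    uint32_t value = 0;
    for (uint8_t bitMask = 0x80; bitMask != 0; bitMask >>= 1) {
        value <<= 3;
        value |= (data & bitMask) ? 0b100 : 0b110;
    }
    return value;
}

// Hypothetical helper: store the lower 24 bits of the encoded value,
// most significant byte first, and return the next free buffer position.
inline uint8_t* Append3Bit(uint8_t* pOut, uint8_t data)
{
    const uint32_t value = Encode3Bit(data);
    *pOut++ = (value >> 16) & 0xff;
    *pOut++ = (value >> 8) & 0xff;
    *pOut++ = value & 0xff;
    return pOut;
}

// Worked values: eight 0b110 groups and eight 0b100 groups.
static_assert(Encode3Bit(0x00) == 0xDB6DB6, "all data bits clear");
static_assert(Encode3Bit(0xFF) == 0x924924, "all data bits set");
```

Calling `Append3Bit` once per data byte advances the output pointer by three bytes each time, which is where the 3x send buffer size comes from.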

Makuna • Feb 18 '21 21:02

Test sketch for timing, and some results:
Original model using 4 bits per bit took around 200us to convert.
Using the function to convert a bit to 3 bits took around 1000us to convert (5x!).
Using the big lookup table to convert a bit to 3 bits took around 190us to convert.

A strip of 256 RGB pixels will break even on memory: at that length the large lookup table costs as much as the 3-bit encoding saves in the output buffer. Below are the times it takes to convert using the function versus the large lookup table over a range of pixel counts; the cost looks roughly linear.

128 pixels = 533us versus 97us lookup
64 pixels = 260us versus 50us lookup
32 pixels = 130us versus 25us lookup
16 pixels = 66us versus 13us lookup
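The 256 pixel break-even figure mentioned above can be checked with a little arithmetic. This is a sketch under the assumption of 3 bytes per pixel (RGB); `SendBufferBytes` and `ThreeBitTotalBytes` are hypothetical names used only here.

```cpp
#include <cstddef>

// 256 entries of 24 bits, the same size as lookup3Bit in the test sketch.
constexpr std::size_t LookupTableBytes = 256 * 3;

// Hypothetical helper: send buffer size for RGB pixels (3 bytes per pixel)
// when each data bit is sent as `serialBitsPerBit` serial bits.
constexpr std::size_t SendBufferBytes(std::size_t pixels, std::size_t serialBitsPerBit)
{
    return pixels * 3 * serialBitsPerBit;
}

// Total memory for the 3-bit encoding once the lookup table is counted.
constexpr std::size_t ThreeBitTotalBytes(std::size_t pixels)
{
    return SendBufferBytes(pixels, 3) + LookupTableBytes;
}

// At 256 pixels both approaches need the same amount of memory.
static_assert(SendBufferBytes(256, 4) == 3072, "4-bit send buffer");
static_assert(ThreeBitTotalBytes(256) == 3072, "3-bit send buffer plus table");

// At the 248 pixels used in the test sketch, the table still costs slightly more.
static_assert(SendBufferBytes(248, 4) == 2976, "4-bit send buffer");
static_assert(ThreeBitTotalBytes(248) == 3000, "3-bit send buffer plus table");
```

Each RGB pixel saves 3 bytes of send buffer with the 3-bit encoding, so the 768 byte table pays for itself only on strips longer than 256 pixels.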

const uint8_t DebugPin = 4; 

uint8_t data[248*3]; // simulate a 248 long pixel strip of data
uint8_t dataOut[248*3*4]; // 4 bits per data bit output send buffer 

uint8_t lookup3Bit[256*3]; // 256 entries of 24 bits

#define _countof(a) (sizeof(a) / sizeof(a[0]))


void buildLookup() 
{
  uint8_t* lookup = lookup3Bit;
  uint8_t* lookupEnd = lookup + _countof(lookup3Bit);
  uint8_t source = 0;
  
  while (lookup < lookupEnd) {
      uint32_t value = 0;
      for (uint8_t bitMask = 0x80; bitMask != 0; bitMask >>= 1)     {
          value <<= 3;
          if (source & bitMask) {
              value |= 0b100;
          }
          else {
              value |= 0b110;
          }
      }

      *lookup++ = (value >> 16) & 0xff;
      *lookup++ = (value >> 8) & 0xff;
      *lookup++ = (value & 0xff);
      source++;
  }
}

void setup() {
  // put your setup code here, to run once:
    pinMode(DebugPin, OUTPUT);
    digitalWrite(DebugPin, LOW);

    // build the 3-bit lookup table once, before Fill3BitBufferLookUp() reads it
    buildLookup();

    // fill data with known info
    for (size_t dataIndex = 0; dataIndex < _countof(data); dataIndex++) {
      data[dataIndex] = (uint8_t)(0x01 << (dataIndex % 8));
    }
}

    void Fill4BitBuffer()
    {
        const uint16_t bitpatterns[16] =
        {
            0b1000100010001000, 0b1000100010001110, 0b1000100011101000, 0b1000100011101110,
            0b1000111010001000, 0b1000111010001110, 0b1000111011101000, 0b1000111011101110,
            0b1110100010001000, 0b1110100010001110, 0b1110100011101000, 0b1110100011101110,
            0b1110111010001000, 0b1110111010001110, 0b1110111011101000, 0b1110111011101110,
        };

        uint16_t* pDma = reinterpret_cast<uint16_t*>(dataOut);
        uint8_t* pEnd = data + _countof(data);
        for (uint8_t* pPixel = data; pPixel < pEnd; pPixel++)
        {
            *(pDma++) = bitpatterns[((*pPixel) & 0x0f)];
            *(pDma++) = bitpatterns[((*pPixel) >> 4) & 0x0f];
        }
    }

    void Fill3BitBuffer()
    {
        uint8_t* pDma = dataOut;
        uint8_t* pEnd = data + _countof(data);
        for (uint8_t* pPixel = data; pPixel < pEnd; pPixel++) {
          uint32_t value = 0;
          for (uint8_t bitMask = 0x80; bitMask != 0; bitMask >>= 1)     {
              value <<= 3;
              if (*pPixel & bitMask) {
                  value |= 0b100;
              }
              else {
                  value |= 0b110;
              }
          }
          *pDma++ = (value >> 16) & 0xff;
          *pDma++ = (value >> 8) & 0xff;
          *pDma++ = (value & 0xff);
        }
    }

    void Fill3BitBufferLookUp()
    {
        uint8_t* pDma = dataOut;
        uint8_t* pEnd = data + _countof(data);
        for (uint8_t* pPixel = data; pPixel < pEnd; pPixel++) {
          
          uint16_t index = *pPixel * 3;
          *pDma++ = lookup3Bit[index++];
          *pDma++ = lookup3Bit[index++];
          *pDma++ = lookup3Bit[index++];
        }
    }
    
void loop() {
  // put your main code here, to run repeatedly:
    digitalWrite(DebugPin, HIGH);
    Fill4BitBuffer();
    digitalWrite(DebugPin, LOW);

    digitalWrite(DebugPin, HIGH);
    Fill3BitBuffer();
    digitalWrite(DebugPin, LOW);

    digitalWrite(DebugPin, HIGH);
    Fill3BitBufferLookUp();
    digitalWrite(DebugPin, LOW);
    
    delay(66);
}
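For reference, the 16 literal entries in `bitpatterns` follow a simple rule that can be written as a function. The sketch below is a hypothetical helper, not code from NeoPixelBus, and the clear and set patterns are taken from the table as it appears in `Fill4BitBuffer`. It needs C++14 for the `constexpr` loop.

```cpp
#include <cstdint>

// Hypothetical helper: expand the low 4 bits of `nibble` into 16 serial bits,
// most significant data bit first, using the patterns found in the
// bitpatterns table (clear bit -> 0b1000, set bit -> 0b1110).
constexpr uint16_t Encode4BitNibble(uint8_t nibble)
{
    uint16_t value = 0;
    for (uint8_t bitMask = 0x08; bitMask != 0; bitMask >>= 1) {
        value = static_cast<uint16_t>(value << 4);
        value |= (nibble & bitMask) ? 0b1110 : 0b1000;
    }
    return value;
}

// Spot checks against literal entries of the table in Fill4BitBuffer.
static_assert(Encode4BitNibble(0x0) == 0b1000100010001000, "entry 0");
static_assert(Encode4BitNibble(0x1) == 0b1000100010001110, "entry 1");
static_assert(Encode4BitNibble(0x5) == 0b1000111010001110, "entry 5");
static_assert(Encode4BitNibble(0xF) == 0b1110111011101110, "entry 15");
```

A 16-entry table of 16-bit values stays small, which is why the 4-bit path can use a lookup without the memory cost that the 256-entry 3-bit table carries.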

Makuna • Feb 18 '21 23:02