bitstring.Array('uintle12')?
TLDR
😎 Nice library.💡 Supporting bitstring.Array("uintle12") for my data array would be useful. 🙏
Version: bitstring==4.3.0
Urgency: non-blocking; I figured out an inefficient workaround to swap endianness BE<->LE manually, and although it has some limitations and is not fully general, my data happens to align nicely to those constraints.
Problem
I have some 12-bit graphics data stored in little-endian layout (along with other occurrences found in the wild, like FAT-12 tables stored on floppy disk images). I tried this library for it yesterday, but alas, bitstring.Array appears to only support big-endian layout for 12-bit data 🤔. The raw byte data from bitstring.Array's tobytes for "uint12" on my x86 machine is clearly big endian (like TCP/IP field layout): the first element's 8 MSBs are stored in byte[0], its 4 LSBs in the high nibble of byte[1], then the second element's 4 MSBs in the low nibble of byte[1], and its 8 LSBs in byte[2]:
Desired little-endian element layout 🙂:
Absolute bit index: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ...
Dword: [---------------------------------------------00----------------------------------------------] ...
Byte: [---------00----------] [---------01----------] [---------02----------] [---------03----------] ...
Bit in byte: 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 ...
Element index: [---------------00----------------] [---------------01----------------] [---------------02----- ...
Bit of element: 00 01 02 03 04 05 06 07 08 09 10 11 00 01 02 03 04 05 06 07 08 09 10 11 00 01 02 03 04 05 06 07 ...
Actual element layout of "uint12", which is big-endian 🙃:
Absolute bit index: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ...
Dword: [---------------------------------------------00----------------------------------------------] ...
Byte: [---------00----------] [---------01----------] [---------02----------] [---------03----------] ...
Bit in byte: 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 ...
Element index: (---------00----------] (---01----] [---00----) [---------01----------) (---------02----------] ...
Bit of element: 04 05 06 07 08 09 10 11 08 09 10 11 00 01 02 03 00 01 02 03 04 05 06 07 04 05 06 07 08 09 10 11 ...
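For concreteness, here is the difference between the two layouts in plain Python ints (no bitstring involved), packing two illustrative 12-bit values both ways:

```python
values = [0x321, 0x654]  # illustrative 12-bit elements

# Big-endian fill (what the current 'uint12' dtype produces): each element's
# MSb comes first in the bit stream, so element 0 lands in the high bits.
be_word = 0
for v in values:
    be_word = (be_word << 12) | v
be_bytes = be_word.to_bytes(3, 'big')       # b'\x32\x16\x54'

# Little-endian fill (the requested 'uintle12'): element 0 occupies the low
# 12 bits, so its LSb lands in bit 0 of byte 0.
le_word = 0
for i, v in enumerate(values):
    le_word |= v << (i * 12)
le_bytes = le_word.to_bytes(3, 'little')    # b'\x21\x43\x65'
```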
Tried
- bitstring.Array("uintle12") yields ValueError: Inappropriate Dtype for Array: 'uintle12'.
- bitstring.Array("<uint12") yields ValueError: Inappropriate Dtype for Array: '<uint12'.
- bitstring.options.lsb0 = True just seems to reverse the element direction while still keeping the actual layout across bytes in BE.
Feature request
Please add dtypes for uintle12 (plus uintbe12 for symmetry, as an alias of the current uint12) when/if you have time.
Additionally I have image data in LE 2-bits-per-pixel and 4-bits-per-pixel that would be nice to work with, but my 4bpp image array...
...instead looks like:
Supporting "uintle4" and "uintle2" would remedy that:
->
Mathematically for 2bpp:
pixel[0] = (byte[0] >> 0) & 0x03  // bits 0..2
pixel[1] = (byte[0] >> 2) & 0x03  // bits 2..4
pixel[2] = (byte[0] >> 4) & 0x03  // bits 4..6
pixel[3] = (byte[0] >> 6) & 0x03  // bits 6..8
pixel[4] = (byte[1] >> 0) & 0x03  // bits 8..10
pixel[5] = (byte[1] >> 2) & 0x03  // bits 10..12
pixel[6] = (byte[1] >> 4) & 0x03  // bits 12..14
pixel[7] = (byte[1] >> 6) & 0x03  // bits 14..16
...
pixel[i] = (byte[i >> 2] >> ((i * 2) & 0x07)) & 0x03
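The closed-form rule above can be sanity-checked with a tiny Python sketch (pure stdlib; the function name is mine):

```python
def decode_2bpp_le(data: bytes) -> list[int]:
    # Little-endian 2bpp: pixel i lives in bits (2*i)..(2*i+1) of the byte
    # stream, counting from each byte's LSb - i.e. the formula above.
    return [(data[i >> 2] >> ((i * 2) & 0x07)) & 0x03 for i in range(len(data) * 4)]

decode_2bpp_le(bytes([0b11100100]))  # [0, 1, 2, 3]
```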
Absolute bit index: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ...
Dword: [---------------------------------------------00----------------------------------------------] ...
Byte: [---------00----------] [---------01----------] [---------02----------] [---------03----------] ...
Bit in byte: 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 ...
Element index: [00 ] [01 ] [02 ] [03 ] [04 ] [05 ] [06 ] [07 ] [08 ] [09 ] [10 ] [11 ] [12 ] [13 ] [14 ] [15 ]
Bit of element: 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 ...
[!NOTE]
While older EGA/VGA-based images used BE layout (where the low-index pixel was in the high-index bits, and the high-index pixel in the low-index bits), newer formats on video game consoles and machine-learning tensors follow the convention that higher-index elements are stored in higher bits.
This actually applies to any element bit size that isn't a multiple of 8: BE (TCP/IP) fills from MSB to LSB, and LE (x86) fills bits from LSB to MSB. That means even oddities like uintle3 or uintle5 should work consistently too, but for my needs, 2-bit, 4-bit, and 12-bit are the most important.
Workarounds
Since endianness can be thought of more generically as a mapping between logical bit indices and actual bit indices (not simply "how bytes are arranged in a word"), it's possible to transform between endiannesses (LE <-> BE) by reversing the direction of all the elements and then reversing the direction of all the bytes. So currently I call swapBE8toLE8 before serializing back out to the file, but it would be nicer to handle this directly in place via "uintle#" support: no extra memory rewrites/copies, no potential forgetfulness as you pass the array around to other parts of the program, and no boundary-condition issues when the total bit count is not a multiple of 8.
swapBE8toLE8(outputArray)
...

def swapBE8toLE8(array: bitstring.Array):
    if (array.itemsize % 8) == 0:
        # Faster shortcut for byte-sized elements, which work directly
        # (but the "else" branch below would work too).
        array.byteswap()
    else:
        # Slower workaround to swap endianness layout for non-byte multiples.
        # This still has the constraint that the total bit count must be a
        # multiple of 8, because otherwise the byte reversal fails on the
        # fractional trailer, whereas a direct implementation would not have
        # that issue.
        originalDtype = array.dtype
        array.reverse()
        array.dtype = "uint8"
        array.reverse()
        array.dtype = originalDtype
    #endif

def swapLE8toBE8(array: bitstring.Array):
    if (array.itemsize % 8) == 0:
        # Faster shortcut for byte-sized elements, which work directly
        # (but the "else" branch below would work too).
        array.byteswap()
    else:
        # Slower workaround to swap endianness layout for non-byte multiples.
        # This still has the constraint that the total bit count must be a
        # multiple of 8, because otherwise the byte reversal fails on the
        # fractional trailer, whereas a direct implementation would not have
        # that issue.
        originalDtype = array.dtype
        array.dtype = "uint8"
        array.reverse()
        array.dtype = originalDtype
        array.reverse()
    #endif
[!NOTE]
These two functions are distinct; you can't just call swapBE8toLE8 a second time on the same data to reverse it, because permuting LE to either BE8 or BE16 (so-called "middle endian") isn't always the same as unpermuting back to little endianness. They are symmetric, notably, when the element bit size is a multiple of the minimal addressable unit size (which is 8 bits on most architectures, or 16 bits on a few oddities like the PDP-11 of NUXI fame), and in that case calling swapBE8toLE8 twice on the same array does restore the original data.
[!IMPORTANT]
I've often seen the belief that endianness is purely an architectural hardware trait of how bytes are arranged within a given word unit, but that isn't the complete picture. When you think about units that straddle across bytes (and read architectural diagrams for TCP/IP, or documents like the GenICam Pixel Format Naming Convention), you realize that endianness indirectly also implies the direction bit fields flow within and across each byte. It simply makes the most sense, if you want any reasonable efficiency without a bunch of bit slicing and masking/OR'ing (especially when reading word units larger than bytes and progressively shifting), for BE architectures to store fields MSB->LSB and for LE to store them LSB->MSB.
Related
- https://github.com/scott-griffiths/bitstring/issues/156 feels distinct, as this one is about adding a dtype to bitstring.Array, and bitstring.options.lsb0 = True doesn't solve the issue anyway.
- https://github.com/scott-griffiths/bitstring/issues/41 is about BitString adding intle and is marked completed, but this one is about bitstring.Array.
- https://github.com/scott-griffiths/bitstring/issues/210 might be the same issue, but I've supplied much more information here, hopefully enough that it's clear.
🫡 Thanks from Redmond, Washington.
Hi. Thanks for the very thorough feature request :)
Fundamentally the big/little endian features only work on whole-byte data. That's because all they do is determine which order to parse the bytes in. For a 12 bit data type it's not well determined what should happen.
From your description I think that there is an intermediate packing / unpacking stage here. So the 12-bit numbers have been concatenated, forming 3 bytes for every pair of numbers. These bytes have then been (byte) stored in 32-bit little endian words, which has scrambled the numbers a bit.
I had a quick go at trying to reproduce the order, but didn't quite get it...
>>> v = [250, 251, 252, 253, 254, 255]
>>> b = pack('6*uint12', *v)
>>> b
BitStream('0x0fa0fb0fc0fd0fe0ff')
>>> b.byteswap(4)
2
>>> b
BitStream('0x0ffba00fe00ffdc0ff') # Not quite the order, but it's the same sort of thing?
For your 2 bit example, it looks to me like you could just use an Array and reverse the result.
>>> a = Array('u2', Bits('0b00011011'))
>>> a
Array('uint2', [0, 1, 2, 3])
>>> a.reverse()
>>> a
Array('uint2', [3, 2, 1, 0])
>>> a.data.bin
'11100100'
Possibly I've misunderstood, but things like uintle12 just don't make sense as far as I can see. I just wouldn't know how to decode one!
Cheers.
Possibly I've misunderstood
🤔 Or it's my failure to elucidate. I'm confident (having parsed multiple binary formats over the decades and being familiar with low-level architectural details of various hardware) that LE non-byte-aligned array layouts are very well-determined and pretty common. I added doc references and examples below, and I'll also try to code up a commit in a fork to demonstrate LE support, but it will have to be next week... ⌛ If you want to skip reading the rest of this until that's ready, feel free to 😅.
Fundamentally the big/little endian features only work on whole-byte data.
bitstring.Array already supports non-multiple-of-8-bit field sizes correctly for big-endian layout (dtype uint12 effectively behaves as a uintbe12 would), but only big endian; it's just missing its logical counterpart. 😉
Memory layout documentation
[!NOTE]
All bit indices below follow the predominant convention where setting bit 0 of byte x is equivalent to x |= (1<<0), and setting bit 5 is equivalent to x |= (1<<5). So bit 0 is always the LSb, regardless of endianness or how bits may be physically stored/transmitted over the wire. Diagrams of TCP/IP packet headers, though, sometimes use retrograde numbering, which is royally confusing math-wise (because setting bit 7 via 1<<7 actually sets bit "0" 🙃).
Non-multiple-of-byte-size elements are found in...
| Element bit size | LE | BE |
|---|---|---|
| 1-bit | CAIRO_FORMAT_A1_LE, SDL SDL_PIXELFORMAT_INDEX1LSB, Qt QImage_Format_MonoLSB, DirectWrite bilevel glyphs | CAIRO_FORMAT_A1_BE, SDL SDL_PIXELFORMAT_INDEX1MSB, Qt QImage_Format_Mono, Windows DIBs |
| 2-bit | SDL SDL_PIXELFORMAT_INDEX2LSB, Nintendo Virtual Boy graphics | SDL SDL_PIXELFORMAT_INDEX2MSB, Windows CE DIBs |
| 4-bit | Nintendo Game Boy Advance graphics, DirectML DML_TENSOR_DATA_TYPE_INT4 | Sega Genesis graphics, Atari Lynx graphics |
| 6-bit | DirectWrite ClearType on x86 | DirectWrite ClearType on Xbox 360 |
| 12-bit | FAT-12, Mono12Packed LE | Mono12Packed BE |
- 12-bit LE is used in the FAT-12 allocation table:
"The FAT12 file system uses 12 bits per FAT entry, thus two entries span 3 bytes. It is consistently little-endian: if those three bytes are considered as one little-endian 24-bit number, the 12 least significant bits represent the first entry (e.g. cluster 0) and the 12 most significant bits the second (e.g. cluster 1)."
- 12-bit LE is also used in camera image data, where the elements (when LE Mono12Packed) start from the low index byte and lowest index bit:
"... the data is filled lsb first in the lowest address byte (byte 0) starting with the first component and continue in the lsb of byte 1 (and so on). ... byte 0 contains the least significant bits of the first color component. We start filling data with the lsb of byte 0 and continue with the lsb of byte 1 (and so on). ... Notice that bits are put successively for each component with no spacing in-between"
See this code for monochrome FLIR cameras:
It is a 12-bit format with its bit-stream following the bit packing method illustrated in Figure 3. The first byte of the packed stream contains the eight least significant bits (lsb) of the first pixel. The third byte contains the eight most significant bits (msb) of the second pixel. The four lsb of the second byte contains four msb of the first pixel, and the rest of the second byte is packed with the four lsb of the second pixel.
for (i = 0; i < numPixels/2; i++) {
    *output++ = (*input << 4) | ((*(input+1) & 0x0f) << 12);
    *output++ = (*(input+1) & 0xf0) | (*(input+2) << 8);
    input += 3;
}
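A Python transliteration of the same unpacking may help (my own sketch; it returns right-justified 12-bit values rather than the left-justified 16-bit output of the C code above):

```python
def decode_mono12packed(data: bytes) -> list[int]:
    # Two 12-bit pixels per 3 bytes, LSb-first fill:
    #   byte 0             = 8 lsb of pixel 0
    #   byte 1 low nibble  = 4 msb of pixel 0
    #   byte 1 high nibble = 4 lsb of pixel 1
    #   byte 2             = 8 msb of pixel 1
    pixels = []
    for i in range(0, len(data) - 2, 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        pixels.append(b0 | ((b1 & 0x0F) << 8))
        pixels.append((b1 >> 4) | (b2 << 4))
    return pixels

decode_mono12packed(b'\x21\x43\x65')  # [0x321, 0x654]
```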
- The GBA (LE) and Sega Genesis (BE) had similar 8x8 tiles, but their hardware had different endiannesses, and their images were correspondingly nibble swapped.
- SDL supports pixel formats for both LE and BE for 2bpp and 1bpp.
- DirectWrite (a Windows text rendering API) had a 6bpp format for ClearType, which stored (for efficiency's sake) the arrays in LE layout on LE machines like x86, and BE layout on those old BE machines like the Xbox 360 - in both cases, 6-bit units straddled across bytes.
Examples
Given an array of 12-bit values [0x321, 0x654, 0x987, 0xCBA], the four elements would yield these bytes in memory:
| Element index | Value | Logical bit range | LE layout bytes | BE layout bytes |
|---|---|---|---|---|
| element[0] | 0x321 | 0..12 | 21 _3 __ __ __ __ | 32 1_ __ __ __ __ |
| element[1] | 0x654 | 12..24 | __ 4_ 65 __ __ __ | __ _6 54 __ __ __ |
| element[2] | 0x987 | 24..36 | __ __ __ 87 _9 __ | __ __ __ 98 7_ __ |
| element[3] | 0xCBA | 36..48 | __ __ __ __ A_ CB | __ __ __ __ _C BA |
| As 48-bit word | | 0..48 | 0xCBA987654321 | 0x321654987CBA |
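The bottom row can be verified directly with Python's int.from_bytes (a quick sketch; the byte values come from the rows above):

```python
le_layout = bytes([0x21, 0x43, 0x65, 0x87, 0xA9, 0xCB])
be_layout = bytes([0x32, 0x16, 0x54, 0x98, 0x7C, 0xBA])

# LE layout read as one little-endian 48-bit word puts element 0 in the low bits.
assert int.from_bytes(le_layout, 'little') == 0xCBA987654321
# BE layout read as one big-endian 48-bit word puts element 0 in the high bits.
assert int.from_bytes(be_layout, 'big') == 0x321654987CBA
```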
These bytes have then been (byte) stored in 32-bit little endian words, which has scrambled the numbers a bit.
I just used the 32-bit dword for comparison; no additional unpacking occurs (at least, no more or less than with BE layout). Using uint32 reads is also useful for efficiency, especially when reading multiple elements in sequence so you can barrel-shift each unit, but it works with individual byte reads and shifts/ORs too.
For your 2 bit example, it looks to me like you could just use an Array and reverse the result.
Reversing the whole array would fix each chunk of 4-pixel columns, but it would also horizontally and vertically flip the image data.
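A two-byte example (values mine) shows why: reversing the whole array also reverses the byte order, which is the flip:

```python
def read_2bpp(data: bytes, lsb_first: bool) -> list[int]:
    # Read 2-bit pixels from each byte, either LSb-first (LE) or MSb-first (BE).
    shifts = (0, 2, 4, 6) if lsb_first else (6, 4, 2, 0)
    return [(b >> s) & 0x3 for b in data for s in shifts]

data = bytes([0x1B, 0x4E])
be = read_2bpp(data, lsb_first=False)  # [0, 1, 2, 3, 1, 0, 3, 2]
le = read_2bpp(data, lsb_first=True)   # [3, 2, 1, 0, 2, 3, 0, 1]
# Reversing the BE reading gives [2, 3, 0, 1, 3, 2, 1, 0]: the right pixels
# within each byte, but with the two bytes (columns of the image) swapped.
```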
I had a quick go at trying to reproduce the order, but didn't quite get it...
Yeah, byteswap only works when the element bit size is a multiple of 8; the swapBE8toLE8 workaround above would achieve it.
C++ code to print LE or BE arrays
I wrote a little C++ program (it's my most familiar language) that prints out logical values from two arrays stored in memory as LE and BE layout.
// Prints:
// LE data: 321,654,987,CBA,
// BE data: 321,654,987,CBA,

#include <stdint.h>
#include <limits.h>
#include <print>
#include <bit>   // Uses C++23 for std::byteswap.
#include <span>
#include <assert.h>

constexpr size_t elementBitSize = 12;
constexpr size_t elementCount = 4;
constexpr size_t byteCount = elementCount * elementBitSize / CHAR_BIT;
const std::uint8_t elementsLittleEndian[byteCount + 3] = {0x21, 0x43, 0x65, 0x87, 0xA9, 0xCB};
const std::uint8_t elementsBigEndian[byteCount + 3]    = {0x32, 0x16, 0x54, 0x98, 0x7C, 0xBA};

void PrintElementsOfGivenBitsize(
    std::span<uint8_t const> data,
    size_t elementCount,
    uint32_t elementBitSize,
    std::endian endianness
)
{
    // Limitations of this simple function:
    // - Supports 1-24 bit reads (larger sizes would need more shifting/or'ing logic).
    // - Data must have 3 padding zero bytes at the end to read 32-bit chunks (a more
    //   complete function should handle trailing bytes, but this is a simple example).
    assert(elementBitSize <= 24); // Function supports 1-24 bit reads.
    assert(data.size() >= ((elementCount * elementBitSize + CHAR_BIT - 1) / CHAR_BIT) + 3);

    const uint32_t elementBitMask = (1 << elementBitSize) - 1;
    bool isReversedEndian = (endianness == std::endian::big);

    for (size_t i = 0; i < elementCount; ++i)
    {
        // Both little-endian and big-endian share these lines:
        uint32_t elementValue = *reinterpret_cast<uint32_t const*>(&data[i * elementBitSize / CHAR_BIT]);
        uint32_t bitOffsetModulus = (i * elementBitSize) % CHAR_BIT;
        uint32_t rightShift = bitOffsetModulus;

        // Big endian requires some massaging on an x86.
        if (isReversedEndian)
        {
            rightShift = sizeof(elementValue) * CHAR_BIT - elementBitSize - bitOffsetModulus;
            elementValue = std::byteswap(elementValue);
        }
        elementValue = (elementValue >> rightShift) & elementBitMask;
        std::print("{:X},", elementValue);
    }
}

int main()
{
    std::print("LE data: ");
    PrintElementsOfGivenBitsize(elementsLittleEndian, elementCount, elementBitSize, std::endian::little);
    std::println();
    std::print("BE data: ");
    PrintElementsOfGivenBitsize(elementsBigEndian, elementCount, elementBitSize, std::endian::big);
    std::println();
}
Python code
I just wouldn't know how to decode one!
I'll help!
TODO: ⌛
Cheers back 👋
The thing to consider is that a u12, as well as all the other dtypes in bitstring, is referring to a contiguous series of bits, and can exist on its own. The problem with the little endian u12 is that it can only exist inside an array and needs extra information to decode.
The link to FAT12 was interesting - it basically says to concatenate two u12 and then byte reverse the three bytes. So each 3 bytes are stored little endian and there are two stages to unpack the data. While it would be possible to have a type that did all this automatically when inside an array, it's not simply a uintle12 type, as it wouldn't have the extra information it would need about swapping every three bytes.
With bitstring right now, to encode a series of ints in something like the FAT12 format you could encode each pair as uint12, use byteswap on the 3 byte result and concatenate each set of 3 bytes. To decode just reverse the process.
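That two-step process in plain Python (entry values are illustrative); note the pair has to be concatenated with the second entry in the high bits, and the result matches a straight little-endian fill:

```python
entry0, entry1 = 0x321, 0x654  # two 12-bit FAT entries (illustrative values)

# Two-step: concatenate big-endian with the SECOND entry in the high bits,
# then byte-reverse the 3-byte group...
word = (entry1 << 12) | entry0        # 0x654321
disk = word.to_bytes(3, 'big')[::-1]  # b'\x21\x43\x65'

# ...which is identical to just serializing the 24-bit word little-endian.
assert disk == word.to_bytes(3, 'little')
```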
If you had a different format that, for example, stored each set of 16 uint12 inside 24 bytes, then it would be a slightly different process, but you definitely couldn't use the same uintle12 type from FAT12. Note that the wiki page for FAT12 didn't just say they were 12-bit little-endian numbers; it also had to be very explicit about how each pair was packed into 3 bytes.
So when I said "I just wouldn't know how to decode one!", I think that should be taken literally: consider what a single little-endian uint12 would be. Basically it can't exist, as it isn't stored contiguously, and while yes, inside an array it can make sense, I think any type specification would be more complex than just doing the byteswap and decode in two steps.
I am taking a look at this in the design of data types for another library I'm working on (bitformat, which could be considered to be bitstring version 5 but it's divergent enough that it will probably stay separate).
For the FAT12 example the data type would be something like '(u12, u12)_le' when formatted as a string, which I think looks reasonably good. You could then pack some data into an array like
a = bitformat.Array('(u12, u12)_le', [[1, 2], [3, 4], [5, 6]])
and it should all work. This probably won't get back ported into bitstring as most of my efforts are on bitformat right now, but it does look like a good feature.
Yo Scott. I haven't had much time yet to prototype a change locally, beyond downloading PyCharm last night and doing some debugging; alas, I got lost in the inheritance hierarchy trying to figure out where the actual backing store is 😅. Presumably, though, the underlying representation is ultimately just a growable buffer of bytes somewhere, and the dtypes and [] operator map a view over that?
I did create several illustrations on the train ride that might help visualize more, along with a tiny C++ library with 2 little functions to read/write a contiguous string of LE or BE bits 1-32 long (see the ReadBitString function with the ASCII art). Hopefully the diagrams demonstrate that LE fields straddling across bytes are definitely bitwise contiguous and exist on their own independently outside arrays; it's just a matter of how you interpret the bit sequencing.
I am taking a look at this in the design of data types for another library I'm working on
Cool, a native foundation will likely improve efficiency, and if it supports more features, that's goodness too.
Visualizing the order
FAT12 ... had to be very explicit about how each pair were packed into 3 bytes
I wouldn't fixate on that red herring; it's just coincidental synchronization from the least common multiple of 8 and 12 being 24. There's nothing specific about 3-byte tuples: you could pluck an isolated uintle12 out of memory at an arbitrary bit offset (0, 5, 67, 12, 24), in many cases with just a 2-byte read. The FAT-12 spec is precise for clarity's sake, not because pair packing is necessary; a generic LE bit fill naturally gives you that layout anyway. So rather than focus on a bit size of 12 (or other convenient ones like 2 or 4), it's probably more enlightening to solve it for odd bit sizes like 3, 7, or 13 (where 13 and 8 have a much higher LCM of 104 bits). Once the epiphany is realized for, say, 7 bits, all the other bit sizes fall into place too (including the multiples of 8 bits that already work in bitstring, which are a subset of the more general case). E.g., given an array of alternating 7-bit elements of all 0's (0x00) and all 1's (0x7F)...
LE 7-bit data @0: 0,7F,0,7F,0,7F,0,7F,0,7F,0,7F,0,7F,0,7F,0,7F,0,7F,0,...
as binary: 0000000,1111111,0000000,1111111,0000000,1111111,0000000,...
as bytes: 80,3F,E0,0F,F8,03,FE,80,3F,E0,0F,F8,03,FE,80,3F,E0,0F,F8,...
BE 7-bit data @0: 0,7F,0,7F,0,7F,0,7F,0,7F,0,7F,0,7F,0,7F,0,7F,0,7F,0,...
as binary: 0000000,1111111,0000000,1111111,0000000,1111111,0000000...
as bytes: 01,FC,07,F0,1F,C0,7F,01,FC,07,F0,1F,C0,7F,01,FC,07,F0,1F,...
...for little endian, the first element (0x00) is assigned into the first byte's first 7 bits (bits 0-6). Then element 1 (0x7F) assigns its first bit (bit 0, the LSb) to byte 0's remaining unused bit (bit 7, the MSb, forming 0x80), and its next 6 bits into byte 1's bits 0-5 (forming 0x3F). Next, element 2 (0x00) stuffs 2 bits into byte 1's remaining upper bits 6-7 and 5 bits into byte 2's bits 0-4...
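That fill rule, for any element bit size, is just shifting each element to bit offset i*bitsize and serializing little-endian. A sketch (function name mine) that reproduces the byte listing above:

```python
def pack_le(values: list[int], bitsize: int) -> bytes:
    # Little-endian fill: element i occupies bits i*bitsize..(i+1)*bitsize - 1
    # of one big integer, which is then serialized LSB-first.
    mask = (1 << bitsize) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * bitsize)
    nbytes = (len(values) * bitsize + 7) // 8
    return word.to_bytes(nbytes, 'little')

pack_le([0x00, 0x7F] * 4, 7).hex()  # '803fe00ff803fe'
```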
consider what a single little endian uint12 would be
First, what would a single uintle7 be? Presuming there's a growable buffer of bytes underlying bitstring, a single 7-bit LE unit would be a single byte with the low 7 bits filled in and an ignorable MSb, whereas a single uintbe7 would be a single byte with the high 7 bits filled in and an ignorable LSb. A single uintle12 would then be byte 0 filled with bits 0-7 and byte 1 filled with bits 8-11, with 4 ignorable MSbs. If bitstring's internal representation is instead a series of 1's and 0's (a real string of bits), then the representation for both 7-bit LE and BE could be stored identically, or LE could be stored with the LSb first (bit index 0); either way, the difference only matters when reading in/writing out octets. So long as the bytes inbound (fromfile or bytes to the constructor) and bytes outbound (tobytes or tofile) work as expected, I'm happy :b.
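To make the "it can exist on its own" point concrete: reading one LE field needs only a bit offset and a bit size, no tuple grouping (sketch with my own naming):

```python
def read_le_field(data: bytes, bit_offset: int, bitsize: int) -> int:
    # A single LE field at any bit offset is contiguous: just shift and mask
    # the buffer viewed as one little-endian integer.
    word = int.from_bytes(data, 'little')
    return (word >> bit_offset) & ((1 << bitsize) - 1)

read_le_field(b'\x21\x43\x65', 0, 12)   # 0x321
read_le_field(b'\x21\x43\x65', 12, 12)  # 0x654
```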
needs extra information to decode
bitstring.Array has a dtype property - isn't the LE-vs-BE endianness of the dtype the only extra information needed to interpret the bytes in the buffer?
Fun facts
Certain hardware had variable-length field reading, like the Super Nintendo SA-1 chip's barrel-shifter circuit, which could sequentially read a string of bits (1-16 bits long) in little-endian order, saving the main CPU from needing to shift and OR across byte boundaries. That was really useful for bit-rate reduction, like reading pixels that only needed 3 bits per pixel (saving 25% space over nibbles) or tilemaps that only needed 12 bits (saving 25% over 16 bits).
Test Data
I attached some test data (see Image-l7-le.bin) if it helps, and show some sample code below to read 7-bit LE data:
(pictured is an old tiny tool I wrote to display data arrays, bit sizes 1-32, LE or BE)
import bitstring
#bitstring.options.lsb0 = True # Doesn't appear to make a difference here? 🤷♂️

dataLe = bitstring.Bits('0x80,0x3F,0xE0,0x0F,0xF8,0x03,0xFE,0x80,0x3F,0xE0,0x0F,0xF8,0x03,0xFE')
dataBe = bitstring.Bits('0x01,0xFC,0x07,0xF0,0x1F,0xC0,0x7F,0x01,0xFC,0x07,0xF0,0x1F,0xC0,0x7F')

########## Big endian constant data ##########
print("uint7 be constant data:")
arrayBe = bitstring.Array('uint7', dataBe)
print("raw data:    ", dataBe)
print("array:       ", arrayBe)
print("array bin:   ", arrayBe.data.bin)
print("array bytes: ", arrayBe.tobytes())
print()

########## Little endian constant data ##########
print("uint7 le constant data:")
# bitstring's Array class can't accept LE byte data directly (there is no uintle7 dtype).
# So work around it by explicitly swapping the byte data.
# arrayLe = bitstring.Array('uintle7', dataLe)
arrayLe = bitstring.Array('uint8', dataLe)
arrayLe.reverse()
arrayLe.dtype = "uint7"
arrayLe.reverse()
print("raw data:    ", dataLe)
print("array:       ", arrayLe)
print("array bin:   ", arrayLe.data.bin)
# Then before reserializing it, ensure it's restored to little endian.
arrayLe.reverse()
arrayLe.dtype = "uint8"
arrayLe.reverse()
print("array bytes: ", arrayLe.tobytes())
print()

########## Big endian file ##########
print("uint7 be file data:")
with open("Image-l7-be.bin", 'rb') as f:
    inputImage = bitstring.Array('uint7')
    inputImage.fromfile(f, 16*16*7//7)
    print(inputImage)
#endwith
print()

########## Little endian file ##########
print("uint7 le file data:")
with open("Image-l7-le.bin", 'rb') as f:
    inputImage = bitstring.Array('uint8') # Set to 7-bit later.
    inputImage.fromfile(f, 16*16*7//8)
    inputImage.reverse() # Workaround for no uintle7.
    inputImage.dtype = "uint7"
    inputImage.reverse()
    print(inputImage)
#endwith
print()
Prints:
uint7 be constant data:
raw data: 0x01fc07f01fc07f01fc07f01fc07f
array: Array('uint7', [0, 127, 0, 127, 0, 127, 0, 127, 0, 127, 0, 127, 0, 127, 0, 127])
array bin: 0000000111111100000001111111000000011111110000000111111100000001111111000000011111110000000111111100000001111111
array bytes: b'\x01\xfc\x07\xf0\x1f\xc0\x7f\x01\xfc\x07\xf0\x1f\xc0\x7f'
uint7 le constant data:
raw data: 0x803fe00ff803fe803fe00ff803fe
array: Array('uint7', [0, 127, 0, 127, 0, 127, 0, 127, 0, 127, 0, 127, 0, 127, 0, 127])
array bin: 0000000111111100000001111111000000011111110000000111111100000001111111000000011111110000000111111100000001111111
array bytes: b'\x80?\xe0\x0f\xf8\x03\xfe\x80?\xe0\x0f\xf8\x03\xfe'
uint7 be file data:
Array('uint7', [0, 2, 5, 8, 10, 13, 16, 19, 22, 25, 27, 30, 33, 36, 39, 42, 0, 2, 5, 8, 10, 13, 16, 19, 22, 25, 27, 30, 33, 36, 39, 42, 0, 2, 5, 8, 10, 13, 16, 19, 22, 25, 27, 30, 33, 36, 39, 42, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 42, 42, 0, 0, 0, 42, 42, 0, 0, 42, 42, 42, 0, 0, 0, 42, 0, 0, 42, 0, 42, 0, 0, 0, 0, 42, 0, 0, 42, 0, 0, 42, 42, 42, 0, 0, 42, 0, 42, 42, 0, 42, 42, 42, 0, 0, 0, 42, 0, 0, 42, 0, 42, 0, 0, 42, 0, 42, 0, 0, 42, 0, 0, 42, 42, 42, 0, 0, 0, 42, 42, 0, 0, 42, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 127, 0, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 127, 0, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 127, 0, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 127, 0, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 127, 0, 0, 0, 0, 0, 0, 0])
uint7 le file data:
Array('uint7', [0, 2, 5, 8, 10, 13, 16, 19, 22, 25, 27, 30, 33, 36, 39, 42, 0, 2, 5, 8, 10, 13, 16, 19, 22, 25, 27, 30, 33, 36, 39, 42, 0, 2, 5, 8, 10, 13, 16, 19, 22, 25, 27, 30, 33, 36, 39, 42, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 42, 42, 0, 0, 0, 42, 42, 0, 0, 42, 42, 42, 0, 0, 0, 42, 0, 0, 42, 0, 42, 0, 0, 0, 0, 42, 0, 0, 42, 0, 0, 42, 42, 42, 0, 0, 42, 0, 42, 42, 0, 42, 42, 42, 0, 0, 0, 42, 0, 0, 42, 0, 42, 0, 0, 42, 0, 42, 0, 0, 42, 0, 0, 42, 42, 42, 0, 0, 0, 42, 42, 0, 0, 42, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 127, 0, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 127, 0, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 127, 0, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 127, 0, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 127, 0, 0, 0, 0, 0, 0, 0])
Summary
chances are that whatever you want to do there's a simple and elegant way of doing it. ... most of my efforts are on bitformat right now
Well, an explicit dtype would be cleaner, but I have the means to achieve all I need via dtype hacking and two reverse calls. So if you choose to invest all future endianness effort into bitformat instead and close this issue, that's fine with me; other people searching the issues can find this approach too. I'll still try prototyping in a local fork... Thanks Scott.
Hi, sorry I haven't managed to reply to this properly yet. I do see what you're saying on the little-endian types now. I think there is confusion (probably all me) on the relationship between the little-endianness and the LSB0 mode. For example the little-endian 12 bit packing looks fine in LSB0 mode, but I don't think the reading/writing is in the expected direction.
I need to take an hour or two and write it all down to work it out. I suspect I messed up the usefulness of the LSB0 mode, and that would be what you'd need if it worked better!
+1, this feature would be interesting to us as well. Very nice write-up @fdwr