
How do you create a Vec<u8> or &[u8]?

Open JonathanWilbur opened this issue 3 years ago • 9 comments

I am just trying to create a big-endian array of bytes from bits. The similar crate, bit-vec has .to_be_bytes(), but I see no convenient way of converting a BitVec into bytes (without inefficiency, for-loops, etc.). To be even more concrete about my use case, I am actually just trying to write these bytes to a Write type, so even some kind of piping method would work, I think. I feel like I have to be missing something.

JonathanWilbur avatar Apr 25 '22 02:04 JonathanWilbur

I should add that this differs from #66 because I do not want to write empty trailing bytes. I only want to write bytes that have bits used in them.
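For concreteness, the output being asked for can be sketched without bitvec at all. This is only an illustration using the standard library: the function name `write_bits_msb0` and the `&[bool]` input are hypothetical stand-ins, not part of any crate. It packs bits most-significant-bit first and emits a byte only when at least one bit falls in it, so no empty trailing bytes are written.

```rust
use std::io::{self, Write};

/// Hypothetical helper: packs `bits` MSB-first into bytes and writes them to `sink`.
/// Only bytes that contain at least one used bit are written.
/// Returns the number of bytes written.
fn write_bits_msb0<W: Write>(bits: &[bool], sink: &mut W) -> io::Result<usize> {
    let mut written = 0;
    for chunk in bits.chunks(8) {
        let mut byte = 0u8;
        for (i, &bit) in chunk.iter().enumerate() {
            if bit {
                // Bit 0 of the chunk lands in the most significant position.
                byte |= 0x80 >> i;
            }
        }
        sink.write_all(&[byte])?;
        written += 1;
    }
    Ok(written)
}
```

Ten set bits, for example, would produce two bytes: one full byte and one byte whose top two bits are set.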

JonathanWilbur avatar Apr 25 '22 12:04 JonathanWilbur

Why not use Bitvec::to_vec?

kraktus avatar May 22 '22 19:05 kraktus

@kraktus That documentation you linked to is for into_vec, not to_vec. into_vec just gives you the underlying Vec<usize> in which the bits are stored, so this is unsuitable for my purposes. to_vec is deprecated, and it just returns another BitVec.

JonathanWilbur avatar May 23 '22 00:05 JonathanWilbur

The changelog says "BitSlice::as_raw_slice is removed. Use .domain() to access the underlying memory.". But I can't figure out how to use that to get to &[u8]. If anyone else finds a way to be able to get &[u8] from BitSlice (without allocating) I would also like to know, as this prevents me from writing a zero-copy parser with nom.

XAMPPRocky avatar May 25 '22 07:05 XAMPPRocky

@kraktus That documentation you linked to is for into_vec, not to_vec. into_vec just gives you the underlying Vec<usize> in which the bits are stored, so this is unsuitable for my purposes. to_vec is deprecated, and it just returns another BitVec.

I see. The Read implementation of BitVec seems to be what you're looking for then. The documentation does not say if it needs allocation or not though, you'll have to check the source code.

kraktus avatar May 25 '22 07:05 kraktus

Alternatively you can just set the backend type of the Bitvec to be u8, then use Bitvec::to_vec which will definitely not allocate.

kraktus avatar May 25 '22 08:05 kraktus

Alternatively you can just set the backend type of the Bitvec to be u8, then use Bitvec::to_vec which will definitely not allocate.

Yes but converting from BitSlice to BitVec will allocate, which is not what I want.

XAMPPRocky avatar May 25 '22 09:05 XAMPPRocky

Yes but converting from BitSlice to BitVec will allocate, which is not what I want.

BitSlice implements Read too, which could fit your needs; you'll need to check the code to see if it allocates.

Note that your issue is different from the OP, so it's expected you don't get answers related to your problem in their issue.

kraktus avatar May 25 '22 09:05 kraktus

a big-endian array of bytes

I don't know what this means. I am going to assume that you mean "break a multi-byte integer into bytes, then copy those bytes into a sink in order of decreasing numerical significance".

I am actually just trying to write these bytes to a Write type

BitSlice<T, {Lsb0, Msb0}> is Read for all T integers. It won't perform the decreasing-numerical-significance ordering that I think you're asking for, but you absolutely can just do io::copy(&mut &bits, &mut sink) and get consistent results. You can even do io::copy(&mut &sink, &mut bitvec![0; a_whole_lot]) and get the exact same bit sequence back out (assuming matching type parameters between &bits and &mut bitvec![]).


"Use .domain() to access the underlying memory." But I can't figure out how to use that to get to &[u8].

The BitDomain::Region type has a field body: &/mut [T]. If T is u8, then you have a slice of bytes. If T is not u8 (the default is usize), then I cannot give you a slice of bytes. You can make your own by using slice::from_raw_parts(body.as_ptr().cast::<u8>(), body.len() * mem::size_of::<T>()). But since Rust does not provide a fn raw_bytes(&[T]) -> &[u8]; function, neither may bitvec.
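A minimal sketch of that reinterpretation, using only the standard library and assuming usize elements. The function name `as_raw_bytes` is hypothetical. The byte length is the element count multiplied by the size of the element type, and the resulting byte order follows the target's native endianness.

```rust
use std::{mem, slice};

/// Hypothetical helper: views a slice of usize elements as its raw bytes.
fn as_raw_bytes(body: &[usize]) -> &[u8] {
    // SAFETY: u8 has alignment 1, every byte of an initialized usize is a
    // valid u8, and the returned slice borrows `body` for the same lifetime.
    unsafe {
        slice::from_raw_parts(
            body.as_ptr().cast::<u8>(),
            body.len() * mem::size_of::<usize>(),
        )
    }
}
```

Because the bytes come out in native order, the result differs between little-endian and big-endian targets, which is one reason to start from u8 storage when the byte layout matters.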


The documentation does not say if it needs allocation or not though, you'll have to check the source code.

bitvec does not ever unexpectedly allocate. My implementations of Read and Write do not touch the allocator, and stay strictly within the confines of existing buffers provided to them. This is in keeping with the behavior of the Read and Write implementations on [u8] and Vec<u8>.
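The standard-library behavior referred to here can be sketched as follows. This uses only std types, not bitvec, and the function name `copy_into_buffer` is hypothetical. Reading from a &[u8] advances the slice, writing into a &mut [u8] fills the caller's buffer in place, and neither side needs to grow a heap buffer.

```rust
use std::io;

/// Hypothetical helper: copies `src` into the front of `dst` via io::copy.
/// Returns the number of bytes copied and the space left in `dst`.
fn copy_into_buffer(mut src: &[u8], dst: &mut [u8]) -> io::Result<(u64, usize)> {
    // `&mut [u8]` implements Write by overwriting the buffer and shrinking itself.
    let mut sink: &mut [u8] = dst;
    // `&[u8]` implements Read by handing out its front bytes and advancing.
    let copied = io::copy(&mut src, &mut sink)?;
    Ok((copied, sink.len()))
}
```

If `src` is longer than `dst`, the copy returns an error rather than growing the destination.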


bitvec is a lens over ordinary Rust memory buffers. Its only new functionality is:

  • the ability to address individual bits
  • the ability to select an addressing scheme of bits within containing elements
  • the natural consequences of this addressing system

In particular, it flatly refuses to produce views of memory that may violate the permission model of the Rust language. While it so happens that the Lsb0 ordering on little-endian hardware can survive integer-width transmutation just fine (see here),

If you want to get actual u8s out of bitvec directly, without using indirect walkers such as the Read trait, then you'll need to start with u8s before bitvec is even involved. The default type parameters are <usize, Lsb0>; chances are, if you are looking at the raw underlying memory, you want <u8, Msb0>.

Example code:

use bitvec::prelude::*;
use std::{mem, slice};

let data = bits![u8, Msb0; 1; 100];
let (_, body, tail) = data.domain().region().unwrap();
// body is &[u8]
// tail is Some(PartialElement(&u8))

let data = bits![usize, Lsb0; 1; 100];
let (_, body, tail) = data.domain().region().unwrap();
// body is &[usize]
// tail is Some(PartialElement(&usize))

// reinterpret the usize elements as bytes, in the target's native byte order
let body = unsafe {
  slice::from_raw_parts(
    body.as_ptr().cast::<u8>(),
    body.len() * mem::size_of::<usize>(),
  )
};
// tail is an Option; 100 bits always leave a partial final usize, so it is Some here
let tail = tail.unwrap();
// you cannot convert a PartialElement directly
let tail_elem = tail.load_value();
let tail_mask = tail.mask();
// but you can get whichever of its bytes contain live bits
let tail_bytes = tail_elem.to_be_bytes();
let tail_mask = tail_mask.into_inner().to_be_bytes();
let live_bytes = tail_bytes.into_iter()
  .zip(tail_mask.into_iter())
  .filter(|&(_, m)| m != 0);
// or you can ignore the mask. all dead bits in the PartialElement are zeroed on load.
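The last step, keeping only the bytes of a partial element that hold live bits, can be isolated with plain integers. This is a sketch under assumptions: the function name `live_bytes` is hypothetical, and the u32 value and mask stand in for what load_value() and mask() would supply.

```rust
/// Hypothetical helper: returns the big-endian bytes of `value` whose
/// corresponding byte in `mask` has at least one bit set.
fn live_bytes(value: u32, mask: u32) -> Vec<u8> {
    let value_bytes = value.to_be_bytes();
    let mask_bytes = mask.to_be_bytes();
    value_bytes
        .iter()
        .zip(mask_bytes.iter())
        // Keep a byte only if its mask byte marks any bit as live.
        .filter(|pair| *pair.1 != 0)
        .map(|(byte, _)| *byte)
        .collect()
}
```

A live byte whose value happens to be zero is still kept, because the decision depends on the mask rather than on the data.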

I hope these satisfactorily answer your questions?

myrrlyn avatar Jul 04 '22 23:07 myrrlyn