rfcs
rfcs copied to clipboard
RFC: Subslice-offset - Get the offset of references into a slice.
(This idea was mentioned in the 'Future possibilities' section of https://github.com/rust-lang/rfcs/pull/2791)
I feel these would be useful. I've implemented the following less-generic analog before and was surprised it didn't exist.
fn offset_within(parent: &str, child: &str) -> Option<usize> { ... }
There’s also UB in the proposed implementation, if it is called with slices like this (playground).
Note: The UB comes from (quoting the docs)
If any of the following conditions are violated, the result is Undefined Behavior:
Both the starting and other pointer must be either in bounds or one byte past the end of the same allocated object. Note that in Rust, every (stack-allocated) variable is considered a separate allocated object.
Both pointers must be derived from a pointer to the same object. (See below for an example.)
The distance between the pointers, in bytes, cannot overflow an isize.
The distance between the pointers, in bytes, must be an exact multiple of the size of T.
The distance being in bounds cannot rely on "wrapping around" the address space.
Emphasis mine. The first point fundamentally can't be checked, the second point could be checked, but isn't right now.
This UB can be alleviated by using wrapping_offset_from instead, if that existed! (I'm surprised it doesn't exist). You'll have to convert the pointers to integers and do the arithmetic yourself to be safe.
Below is another proposed implementation based on the feedback here. The main changes are explicit handling of ZSTs, and "manually" doing arithmetic on pointer addresses instead of using <*const T>::offset_of to avoid its potential UB.
In order to handle the case mentioned https://github.com/rust-lang/rfcs/pull/2796#issuecomment-772914452 (reproduced in code comments) we need to compute the distance in bytes and check its modulo. At that point we’ve already done half of the job of offset_of so we might as well do the division on that integer.
To be added to the doc-comment of index_of:
Zero-sized types
When
size_of::<T>() == 0the memory address of&Tis not meaningful, soindex_ofcan only returnNoneorSome(0). It may spuriously returnSomewith anelementcreated from outside theselfslice.
To be added to the doc-comment of range_of:
If a range is returned, its length is always
subslice.len().Zero-sized types
When
size_of::<T>() == 0the memory address of&Tor&[T]is not meaningful, sorange_ofcan only returnNoneorSome(0..subslice.len()).Zero-sized subslice
When
size_of::<T>() == 0orsubslice.is_empty(),range_ofmay spuriously returnSomefor asubslicecreated from outside theselfslice.
Proposed implementation
use std::mem;
use std::ops::Range;
trait SliceExt<T> {
fn index_of(&self, element: &T) -> Option<usize>;
fn range_of(&self, subslice: &[T]) -> Option<Range<usize>>;
}
impl<T> SliceExt<T> for [T] {
fn index_of(&self, element: &T) -> Option<usize> {
let size_of = mem::size_of::<T>();
if size_of > 0 {
let range = as_address_range(self);
let element_address = element as *const T as usize;
if range.contains(&element_address) {
let bytes_offset = element_address - range.start;
if bytes_offset % size_of == 0 {
Some(bytes_offset / size_of)
} else {
// This could happen with arguments like this:
//
// ```
// use bytemuck::cast_slice;
// let x: &[[i32; 2]] = &[[1,2],[3,4]];
// let y: &[[i32; 2]] = cast_slice(&cast_slice::<_, i32>(x)[1..3]);
// let index = x.index_of(&y[0]);
// dbg!(x, y, index);
// ```
//
// … which prints:
//
// ```
// x: [[1,2],[3,4]]
// y: [[2,3]]
// index: None
// ```
None
}
} else {
None
}
} else {
// All items of a slice of ZST share the same address
if element as *const T == self.as_ptr() {
Some(0)
} else {
None
}
}
}
fn range_of(&self, subslice: &[T]) -> Option<Range<usize>> {
let size_of = mem::size_of::<T>();
if size_of > 0 {
let range = as_address_range(self);
let subrange = as_address_range(subslice);
if subrange.start >= range.start && subrange.end <= range.end {
let bytes_offset = subrange.start - range.start;
if bytes_offset % size_of == 0 {
let start = bytes_offset / size_of;
let end = start + subslice.len();
Some(start..end)
} else {
// This could happen with arguments like this:
//
// ```
// use bytemuck::cast_slice;
// let x: &[[i32; 2]] = &[[1,2],[3,4]];
// let y: &[[i32; 2]] = cast_slice(&cast_slice::<_, i32>(x)[1..3]);
// let range = x.range_of(y);
// dbg!(x, y, range);
// ```
//
// … which prints:
//
// ```
// x: [[1,2],[3,4]]
// y: [[2,3]]
// range: None
// ```
None
}
} else {
None
}
} else {
// All items of a slice of ZST share the same address
if subslice.as_ptr() == self.as_ptr() {
Some(0..subslice.len())
} else {
None
}
}
}
}
fn as_address_range<T>(slice: &[T]) -> Range<usize> {
let ptr_range = slice.as_ptr_range();
let start = ptr_range.start as usize;
let end = ptr_range.end as usize;
start..end
}
I’ve edited my proposed doc-comment additions to cover ZSTs and empty subslices.
I was going to propose index_of myself, so I'm happy to see this is already proposed and being worked on.
I think this RFC is a good idea, but the specific choice of name index_of() concerns me, because it sounds too much like an innocuous value-comparison operation, not one which rightfully has the warning “Note that this does not look at the value, but only at the reference” in its documentation. Generally Rust keeps pointer-equality in items with ptr in the name, like Arc::ptr_eq(), and everything else is by value.
Unfortunately, these particular methods don't take pointers but putting ptr in the name would sound like they do. Maybe index_of_ref() and range_of_ref() would be adequate to flag “something unusual is here”? Or more verbosely, index_of_contained_ref()?
Unfortuantely, I won't have the time to update this RFC any time soon. If someone wants to push this feature forward without waiting for me to update it, please feel free to open a new RFC taking whatever you need from this one. :)
This came up in a recent discussion.
Potential alternative: a function on a slice that takes another slice and computes the offset from one to the other. So, for instance, a function on a &str that accepts another &str and returns a usize. This would be useful, for instance, after splitting a string, to get the offset of the split-out string in the original string.