[RFC] Rename `StaticArray#to_slice` to `StaticArray#to_unsafe_slice`
Take a look at this (found through gitter):
require "openssl/sha1"
def f(a)
OpenSSL::SHA1.hash(a).to_slice
end
b = f("some string")
puts b # => Bytes[20, 0, 0, 0, 132, 52, 159, 225, 254, 127, 0, 0, 209, 98, 5, 245, 103, 230, 42, 0]
rand # any code here
puts b # => Bytes[20, 0, 0, 0, 132, 52, 159, 225, 254, 127, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0]
The problem is that StaticArray#to_slice is implemented like this:
Slice.new(to_unsafe, size)
So that's creating a slice from the static array's pointer. If that escapes the stack, oops.
I propose we rename it to to_unsafe_slice. Then it's clear that you are doing something unsafe and you should read the docs, and we should include an example where using to_unsafe_slice is problematic.
We could then make StaticArray#to_slice allocate a new slice so it's okay if it escapes the stack.
From docs,
"A slice is a Pointer with an associated size"
I think the problem is more in the wrong return of the static pointer in the function than with the naming of to_slice. I think the use of slice should be categorized in the same frame as Pointer, maybe delete from docs the word "safe" and just describe the advantages over Pointer. Maybe even deleted? if need it, you can create the Slice from the pointer and it is more obvious. Also Array doesnot have to_slice method
If StaticArray#to_slice allocates a new slice maybe just use to_a ?
I think the use of slice should be categorized in the same frame as Pointer
No, Slice is totally safe. The problem is returning a slice that points to data in the stack. And that's a problem in StaticArray#to_slice. No other to_slice method has this problem, they always return data that points to the heap.
I like this. Safe by default. Could there be a compiler warning for a version or two when using the old #to_slice, so that code doesn't silently start taking a performance hit where it might not be needed? Just having each case of StaticArray#to_slice highlighted during the transition would be super helpful.
@dscottboggs Na, I don't think that's worth it. Performance is important but we don't need a regression path for that.
IO::Memory#to_slice should also be considered unsafe because it points to the buffer and the buffer may be relocated.
While working on an isolated, up-to-date refactor for StaticArray in #14613, I've come to understand that this problem needs a different scope.
I had briefly mentioned this before in https://github.com/crystal-lang/crystal/pull/11031#issuecomment-915110653:
I think there should be a clear distinction between a method that returns a slice that remains attached to the original data like
String#to_sliceand one that's only a temporary snapshot that may not be valid anymore after mutating the original data or leaving scope.Perhaps we can find a different name than
#to_unsafe_slice?
@ysbaddaden came up with a great insight. Most implementations of #to_slice are not actually turning the value into a slice. They're just providing a view of the original value in the form of a Slice (this is a much better description for what I meant with "snapshot"). That's somewhat similar to #unsafe_as(Slice). The only difference is that the reinterpretation does not involve the entire type, but is usually based on an instance variable.
This suggest #unsafe_as_slice (or #as_unsafe_slice?) as name for methods that return a different representation of a value as a slice.
-
#to_*means a transformation into the target type -
#as_*means a different representation of the original value (compare toJSON::Any#as_*for prior work)
#to_slice should always be safe. It can be implemented as #unsafe_as_slice.dup, meening memory needs to be copied. This is of course inefficient an unnecessary for many use cases. That's where using #unsafe_as_slice directly can provide a more performant alternative, under the condition that the representation stays valid.
Implementations based on immutable data (such as String#to_slice) don't need to copy memory, though. So maybe that should be called String#as_slice?
Another relevant property may be if the resulting Slice is synchronized with the original value, i.e. changes to the original value reflect to the slice and vice versa. For immutable data, this is of courseirrelevant (the slice is read-only).
Alternative representations always refer to the same underlying data, so they are naturally synchronized. Other ransformations would usually be unsychronized becase they need to copy the data. The only exception is the idempotent Slice#to_slice (which actually seems a bit odd if you think about it this way...).
I'd prefer #as_slice over #as_unsafe_slice but I guess the latter is more explicit and consistent with #to_unsafe that returns the underlying pointer: the operation itself is safe (unlike #unsafe_as that casts to another type without any validation) but the value may become unsafe when the original object is mutated.
That being said, introducing #as_ is a mere extension of the public API, but changing the behavior of #to_slice to return a copy is a breaking change.
Taking a slice representation of Array or StaticArray is always unsafe, so unsafe should be part of the method name.
String#as_slice is fine though because it's entirely safe (unless the string itself is used in an unsafe manner, of course).
Following up on that, I'd suggest the following changes:
- Rename
StaticArray#to_sliceto#as_unsafe_slice - Add
Array#as_unsafe_slice - Add
String#as_slice - Rename
BitArray#to_sliceto#as_slice
BitArray#as_slice is debatable. It could also be #as_unsafe_slice. There's no realloc, so this should be technically safe. But the resulting slice is writable and flipping any of the unused bits is undefined behaviour.
Maybe for simplicity reasons, it would be better to avoid #as_slice and always call it #as_unsafe_slice, even though in the case of String it should be entirely safe. I don't expect there to be many more types with similar characteristics.
Any type that implements #as_slice or #as_unsafe_slice can implement #to_slice as as_slice.dup or as_unsafe_slice.dup, respectively.
An alternative is to have non-reallocatable objects (String, BitArray) to have #as_slice creating the slice read-only (this is what String#to_slice does), and #as_unsafe_slice to create it r/w (although I'm not sure how useful this method can be).
Any type that implements
#as_sliceor#as_unsafe_slicecan implement#to_sliceasas_slice.duporas_unsafe_slice.dup, respectively.
I don't think there is ever a need for a generic method that does as_slice.dup or as_unsafe_slice.dup, because the whole purpose of returning a Slice is precisely to avoid this duplication. Otherwise the appropriate type ought to be Array.
and
#as_unsafe_sliceto create it r/w (although I'm not sure how useful this method can be).
We should never expose that on String because it is supposed to be immutable.
If you really, really want to mess with a String instance, you can force your way to a writable Slice. But the API should not support that.
Yeah, that's why I mentioned:
although I'm not sure how useful this method can be
My main point was to consider it safe by making it safe (in particular, the BitArray one)
I believe String#to_slice already returns a read-only slice, so #as_slice returning a read-only slice for String and BitArray is :+1: