Arrays have a 'silent' length limit due to Int32 type on size.
arr = Array(UInt8).new(2147483647, 1_u8)
arr << 2_u8
arr.size #=> -2147483648
This could be very surprising behaviour to run across. Ideally an error would be raised, or (hopefully) the limit removed entirely.
A similar bug exists in Indexable: each_index also doesn't work if the size is greater than 2**31 - 1, because the index variable (i) defaults to the Int32 type. https://github.com/crystal-lang/crystal/blob/master/src/indexable.cr#L181
Version: Crystal 0.20.5+37 [2b57a98a9] (2017-02-08)
OS: Arch Linux x86_64
Removing the limit would mean that you could store an arbitrary number of elements in an Array, and still be oblivious to the implications of that. I think if you're writing code that needs to hold more than 2147483647 elements in memory:
- You should have a deep understanding of the hardware you're running on.
- You are likely to need very particular algorithms to deal with them.
Maybe raising a compilation error would be a more acceptable solution, but then again, if implementing it means penalizing the compiler's performance, I wouldn't do it. Storing more than 2147483647 elements in a normal Array is a 1-in-10000 case; it doesn't make sense to penalize the other 9999.
I've written programs where 2147483647 was an incredibly huge limit (in the context of the program) when it was written, only to have the program still be in use when that limit became a serious issue. Programs which dealt with disk-sizes, for instance. Once you do hit that limit, maybe years after writing a program, it is very helpful if the program fails in a spectacular way instead of silently doing the wrong thing.
It would be nice if there was a way to ask the compiler to add those checks wherever they were needed (which is to say "for all arithmetic which might cause an overflow"). Then programmers who were paranoid could turn on that checking, and those who needed top performance would leave it off.
I'll also note that many years ago I worked with a group who wrote a systems-programming compiler where there was an option like this. And it turned out that the performance penalty of checking for overflows (*1) was not significant, and we compiled almost everything with the checks turned on. Even in production code which was being used by thousands of users on a single machine.
(*1 - this was true for the kinds of programs we were writing. It might not be true for everyone!)
This is one of the exact problems Rust (https://www.rust-lang.org) was created to prevent. I understand Crystal builds upon libc and other C libraries. Is this correct? Have you considered using the Rust equivalent libraries?
Who, me? I have considered Rust; at this point I have no interest in it. I'd also say that the main problem they want to solve is ownership of memory, which did interest me. But the more I looked into the language, the less I liked what they were doing.
There are some aspects of Rust which are interesting, but for me personally there is no desire to choose Rust over Crystal.
:+1: for raising if an array tries to expand through Int32::MAX elements (sketched below)
:-1: for increasing the max size of array. If arrays being too small ever emerges as a big problem, we can revisit this.
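A minimal sketch of what that raise-on-overflow guard could look like (a hypothetical wrapper function, not the actual stdlib implementation):
# Hypothetical helper: refuse to grow an Array past Int32::MAX elements.
def checked_push(arr : Array(T), value : T) : Array(T) forall T
  raise ArgumentError.new("Array size would exceed Int32::MAX") if arr.size == Int32::MAX
  arr << value
end

small = [1, 2, 3]
checked_push(small, 4) # => [1, 2, 3, 4]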
@jzakiya @drosehn I don't think Rust has anything to do with this. Crystal can implement overflow checking, and there is a separate issue for that where you can make your opinion known.
Is there really any significant cost to increasing the max size of the array? As I understand it, it would result in the array type being 8 bytes larger (Int32 -> UInt64 for both @size and @capacity). Would there be any performance detriment?
@shelvacu It's not about the performance, it's about consistency and expected behaviour. We don't want to make the size be UInt64: unsigned integers are very unintuitive and bug-prone. Int64 could work, but the default numeric type for literals is Int32, so for example this will stop working:
a = [] of Int32
a << 1
a << a.size # Error: you'll have to do a.size.to_i
Also, right now creating the array from the first snippet in this thread, the one with 2147483647 elements, takes 14 seconds. I don't think such big arrays are common. So making the language work for a very rare use case at the cost of making everything else worse (like the example above) is not good. If you need a big array you could make one yourself: with Pointer you can allocate more memory than the limit of Int32.
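For illustration, a minimal sketch of that Pointer workaround, assuming the machine has enough free memory:
# Allocate 3 GiB of bytes directly, bypassing Array's Int32 size field.
# Pointer(T).malloc accepts any Int, so a 64-bit count works.
count = 3_i64 * 1024 * 1024 * 1024
buf = Pointer(UInt8).malloc(count)
buf[count - 1] = 42_u8 # an Int64 offset, well past Int32::MAX
puts buf[count - 1]    # => 42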
That said, we do need these checks in Array, Hash and other containers, instead of silently failing.
I think that, in addition to what @asterite said, once your array gets to Int32::MAX members, it's often better to fail hard than to continue eating memory. Consider an array of objects (pointers, assume 64-bit): it will eat 16GiB of memory before it fails. In fact, for that case the limit is way too high.
Suggestion:
- return nil if not defined
- add a limit check
irb(main):001:0> a=[]
=> []
irb(main):002:0> a[999]
=> nil
irb(main):003:0> a[999999999]
=> nil
irb(main):004:0> a[9999999999999999]
RangeError: bignum too big to convert into `long'
return nil if not defined
Not type safe...
Why do we need type safety? Duck typing is useful:
class Object
  def add(y)
    t = choose_type(self.type, y.type)
    self.to_type(t) + y.to_type(t)
  end
end
# Number -> String
Because type safety is useful too, and Crystal already checks for nil, so that's not even possible...
As a note, this issue changes with Crystal 1.0.0 (and several versions before) due to arithmetic overflow checking. The length limit is no longer silent, though the error could be wrapped to be more informative.
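For example, on a Crystal version where overflow checking is on by default (as in 1.0.0):
x = Int32::MAX
begin
  x + 1 # checked arithmetic raises instead of silently wrapping
rescue ex : OverflowError
  puts ex.message
end
puts x &+ 1 # wrapping is an explicit opt-in => -2147483648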
This issue has been mentioned on Crystal Forum. There might be relevant details there:
https://forum.crystal-lang.org/t/is-increasing-array-index-size-in-the-future/6610/2