crystal icon indicating copy to clipboard operation
crystal copied to clipboard

Arrays have a 'silent' length limit due to Int32 type on size.

Open shelvacu opened this issue 8 years ago • 15 comments

arr = Array(UInt8).new(2147483647,1u8)
arr << 2_u8
arr.size #=> -2147483648

This could be a very surprising behaviour to run across. Ideally an error should be raised or the limit removed entirely (hopefully).

A similar bug exists in Indexable, trying to use each_index also doesn't work if size is greater than 2**31-1 because the index variable (i) defaults to the Int32 type. https://github.com/crystal-lang/crystal/blob/master/src/indexable.cr#L181

Version: Crystal 0.20.5+37 [2b57a98a9] (2017-02-08) OS: Arch Linux x86_64

shelvacu avatar Feb 08 '17 13:02 shelvacu

Removing the limit would mean that you could store an arbitrary number of elements in an Array, and still be oblivious to the implications of that. I think if you're writing code that needs to hold more than 2147483647 elements in memory:

  1. You should have a deep understanding of the hardware you're running on.
  2. You are likely to need very particular algorithms to deal with them.

Maybe raising a compilation error would be a more acceptable solution, but then again, if it means that to implement it we need to penalize the compiler's performance, I wouldn't do it. Storing more than 2147483647 elements in a normal Array is a 1 in 10000 case, it doesn't make sense to penalize the other 9999.

mverzilli avatar Feb 08 '17 15:02 mverzilli

I've written programs where 2147483647 was an incredibly huge limit (in the context of the program) when it was written, only to have the program still be in use when that limit became a serious issue. Programs which dealt with disk-sizes, for instance. Once you do hit that limit, maybe years after writing a program, it is very helpful if the program fails in a spectacular way instead of silently doing the wrong thing.

It would be nice if there was a way to ask the compiler to add those checks wherever they were needed (which is to say "for all arithmetic which might cause an overflow"). Then programmers who were paranoid could turn on that checking, and those who needed top performance would leave it off.

I'll also note that many years ago I worked with a group who wrote a systems-programming compiler where there was an option like this. And it turned out that the performance penalty of checking for overflows (*1) was not significant, and we compiled almost everything with the checks turned on. Even in production code which was being used by thousands of users on a single machine.

(*1 - this was true for the kinds of programs we were writing. It might not be true for everyone!)

drosehn avatar Feb 08 '17 18:02 drosehn

This is one of the exact problems Rust (https://www.rust-lang.org) was created to prevent. I understand Crystal uses the libc, et al, C libraries to build upon. Is this correct? Have you considered using Rust equivalent libraries?

jzakiya avatar Feb 08 '17 18:02 jzakiya

Who, me? I have considered rust. At this point I have no interest in it. I'd also think that the main problem they want to solve is ownership of memory, which did interest me. But the more I looked into the language, the less I liked what they were doing.

There are some aspects of rust which are interesting, but for me personally there is no desire to choose rust over crystal.

drosehn avatar Feb 08 '17 19:02 drosehn

:+1: for raising if an array tries to expand through Int32::MAX elements

:-1: for increasing the max size of array. If arrays being too small ever emerges as a big problem, we can revisit this.

RX14 avatar Feb 08 '17 19:02 RX14

@jzakiya @drosehn I don't think rust has anything to do with this. Crystal can implement overflow checking, and there is a separate issue for that you can make your opinion known at.

RX14 avatar Feb 08 '17 19:02 RX14

Is there really any significant cost to increasing the max size of the array? As I understand it, it would result in the array type being 8 bytes larger (Int32 -> UInt64 for both @size and @capacity). Would there be any performance detriment?

shelvacu avatar Feb 09 '17 01:02 shelvacu

@shelvacu It's not about the performance, it's about consistency and expected behaviour. We don't want to make the size be UInt64: unsigned integers are very unintuitive and bug-prone. Int64 could work, but the default numeric type for literals is Int32, so for example this will stop working:

a = [] of Int32
a << 1
a << a.size # Error: you'll have to do a.size.to_i

Also, right now creating the array from the first snippet in this thread, the one with 2147483647 elements, takes 14 seconds. I don't think such big arrays are common. So making the language work for a very rare use case at the cost of making everything else worse (like the example above) is not good. If you need a big array you could make one yourself, with Pointer you can allocate more memory than the limit of Int32.

That said, we do need these checks in Array, Hash and other containers, instead of silently failing.

asterite avatar Feb 09 '17 10:02 asterite

I think that, in addition to what @asterite said, once your array gets to Int32::MAX members, it's often better to fail hard than continue to eat memory. Consider an array of objects (pointers, assume 64bit), that will itself eat 16GiB of memory before it fails. In fact for that case it's way too high.

RX14 avatar Feb 09 '17 15:02 RX14

suggest:

  • return nil if not defined
  • add limit check

irb(main):001:0> a=[]
=> []
irb(main):002:0> a[999]
=> nil
irb(main):003:0> a[999999999]
=> nil
irb(main):004:0> a[9999999999999999]
RangeError: bignum too big to convert into `long' 


sevk avatar Mar 09 '17 07:03 sevk

return nil if not defined

Not type safe...

refi64 avatar Mar 09 '17 16:03 refi64

why we need type safe , duck typing is useful :

class Object
  def add(y)
    t = choose_type(self.type,y.type)
    self.to_type(t) + y.to_type(t)
  end
end
# Number -> String

sevk avatar Mar 10 '17 05:03 sevk

Because type safety is useful too, and Crystal already checks for nil, so that's not even possible...

refi64 avatar Mar 10 '17 15:03 refi64

As a note, this issue changes with Crystal 1.0.0 (and several versions before) due to arithmetic overflow checking. The length limit is no longer silent, though the error could be wrapped to be more informative.

RespiteSage avatar Apr 21 '21 14:04 RespiteSage

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/is-increasing-array-index-size-in-the-future/6610/2

crysbot avatar Feb 18 '24 16:02 crysbot