reverse method on Range of Chars gives unexpected results
reproduction steps
using Scala 3,
scala> (1 to 26).reverse
val res0: Range = Range 26 to 1 by -1
scala> ('a' to 'z').reverse
val res1: scala.collection.immutable.NumericRange[Char] = empty NumericRange z to a by
problem
Seems like all other Ranges give you a collection back of the same size when you reverse them, but a Range of Chars returns an empty Range. I understand you can get around this by turning it into a Seq (with memory usage implications) or using reverseIterator, but the above behaviour was surprising and caused a bug in my code.
Similarly it would be nice if you could make a reversed range with ('z' to 'a' by -1) or something to that effect. (Interestingly, this won't compile with a negative number, saying it needs to be a Char, but does with a positive number... :shrug:)
this is the Scala 2 bug tracker, but the same behavior does occur in Scala 2
@scala/collections ?
Just to add: toSeq doesn't actually fix it; you have to use toList etc
scala> ('a' to 'z').toSeq.reverse
val res2: scala.collection.immutable.NumericRange[Char] = empty NumericRange z to a by
scala> ('a' to 'z').toList.reverse
val res3: List[Char] = List(z, y, x, w, v, u, t, s, r, q, p, o, n, m, l, k, j, i, h, g, f, e, d, c, b, a)
note that 2.12.15 doesn't have this problem
I am torn, because on the one hand, Chars are not numbers (I don't care what they're backed by), and on the other hand, it's quite convenient to be able to have a range of them.
anyway, I believe the problem is that Chars are, to the extent that they're numbers at all, unsigned. thus,
scala> ('a' to 'z').step.toInt
val res4: Int = 1
scala> ('a' to 'z').reverse.step.toInt
val res5: Int = 65535
Interestingly, this won't compile with a negative number, saying it needs to be a Char, but does with a positive number... :shrug:
see above about being unsigned
@NthPortal char is a numeric value type https://scala-lang.org/files/archive/spec/2.13/12-the-scala-standard-library.html#numeric-value-types
just replying to whether it's a number. Signedness is a nice catch.
this is actually a fundamental problem with NumericRange; it doesn't work for unsigned numeric types. one could theoretically write their own u32 type, and it would not be possible to create a reverse range for that type (which is very much numeric) either. I believe it would require a fundamental change such as having a boolean to track if the range is reversed.
we might be able to fix the problem for Char to Char by adding a custom (and private) subtype of NumericRange specifically for Char, and have .reverse do some magic. I don't know if that would be feasible while maintaining bincompat, but my instinct says it is.
char is a numeric value type
Char is numeric in that it represents a UTF-16 code point (which is a number), but numeric operations don't actually make any sense at all for Char. 'a' + 'z' is nonsense. 'a' * 'z' is nonsense. 'a' to 'z' by '<START OF HEADING>' is nonsense, but is what 'a' to 'z' actually means (but you'd never know if I swapped '<START OF HEADING>' for the wrong character). 'a' to 'z' happens to get you the result you want, but adding code points is a nonsensical operation. fundamentally, I think supporting 'a' to 'z' is wrong; it works for that alphabetical range, but anything outside the English alphabet (or even mixing uppercase and lowercase) is meaningless and arbitrary. For example, what does '0' to 'Q' mean?
if you want a range of codepoints, (Int to Int).reverse.view.map(_.toChar) will be clearer, and is also incidentally a workaround to the problem (e.g. ('a'.toInt to 'z'.toInt).reverse.view.map(_.toChar))
note that 2.12.15 doesn't have this problem
to clarify, in 2.12 ('a' to 'z').reverse returns a Vector. I've not yet gone digging to find out why.
Edit: in 2.13, NumericRange#reverse returns a NumericRange, so we cannot have it do what 2.12 does by returning a Vector regardless.
I kind of want to override .reverse on NumericRange[Char] to throw an exception with the message "Sadness. You have chosen to treat a character as a number, and now look what has happened."
Just from a user's POV, I'm not aware - and I see no suggestion in the Scaladoc - that a Range of Chars being treated as a NumericRange behind the scenes... conceptually I want it to be treated as a hypothetical AlphabeticRange. So perhaps the exception should be a bit more apologetic - "Dang, we redesigned the collections and thought we could treat Chars like any other number, but sadly, it seems we can't. BOOM. Sorry. Please revert to Scala 2.12."
Range, as I see it, would ideally be a type of lightweight collection that will work with any type that offers an implementation - wouldn't it be nice if we could have a Range of Foos that works as you might expect it to, e.g. Foo(10) to Foo(20) by 2, or even Earth to Neptune assuming these are part of an ordered enum of planets. It would have a type parameter as all other collection types do, and the current concept would just apply to the special case of Range[Int]. But then, this is somewhat at odds with how Range is described, as "The Range class represents integer values in range [start;end) with non-zero step value step." - so perhaps we'd need to invent a new class with a different name.
Range of Double also did not succeed. I don't think this conceptual issue due to collections, but just about the step.
I agree that range of chars is natural, in the sense that toLower is 'a' + c - 'A' . But as mentioned, code points are ints.
I remember Odersky saying NumericRange is an over-abstraction, but as noted, anything countable could have a range. Maybe Scala 3 enables syntax for GroundControl to MajorTom etc. Someone recently asked somewhere for .. syntax. Oh, here is the enum range: https://github.com/lampepfl/dotty-feature-requests/issues/145
(I'm repeating after @som-snytt here, but I think it bears repeating:)
Char being a numeric type (a slightly odd one, but nonetheless solidly a numeric type) is a C and Java tradition carried on in Scala. It is arguably an unfortunate tradition, and it's definitely surprising when first encountered, but for a Range of Chars to be a NumericRange is the product of decades of tradition.
references:
- Scala 2 specification: https://scala-lang.org/files/archive/spec/2.13/12-the-scala-standard-library.html#numeric-value-types
- the Java spec also considers
charan "integral type", though unsigned: https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2 - the JVM spec concurs: https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.3
- as for C, surely a more authoritative reference exists, but a little quick googling brought this up: https://stackoverflow.com/questions/11698535/is-char-a-special-type-of-integer-variable
this leads to working-as-specified behaviors such as:
scala> 'a' + 1
val res1: Int = 98
scala> ('a' + 1).toChar
val res2: Char = b
scala> 'a' + 'b'
val res3: Int = 195
scala> 'a' * 2
val res4: Int = 194
scala> 'a' * 'b'
val res5: Int = 9506
and any argument for proposed changes to the behavior Ranges of Charss needs to be considered in this context
Progression of a sort on Scala 3 with re-engineered enrichments:
scala> ('a' to 'z').reverse
[[syntax trees at end of typer]] // rs$line$2
package <empty> {
final lazy module val rs$line$2: rs$line$2 = new rs$line$2()
final module class rs$line$2() extends Object() { this: rs$line$2.type =>
val res0: scala.collection.immutable.NumericRange[Char] =
charWrapper('a').to('z').reverse
}
}
java.lang.ArithmeticException: number type is unsigned, and .reverse requires a negative step
at scala.collection.immutable.NumericRange.reverse(NumericRange.scala:263)
... 30 elided