FSharpPlus
FSharpPlus copied to clipboard
Int32.TryParse and F#+ tryParse give different results
Description
Basically (tested with culture on en-US).
> let value: int option = tryParse "123,45";;
val value: int option = Some 12345
> Int32.TryParse "123,45";;
val it: bool * int = (false, 0)
In other words, integers cannot have thousand-separators with Int32.TryParse, but can with tryParse.
Conversely, it doesn't allow decimals, which is good:
> let value: int option = tryParse "123.45";;
val value: int option = None
> Int32.TryParse "123.45";;
val it: bool * int = (false, 0)
Likewise there's a similar difference with e-notation:
> let value: int option = tryParse "12345e4";;
val value: int option = Some 123450000
> Int32.TryParse "12345e4";;
val it: bool * int = (false, 0)
Repro steps
See above.
Expected behavior
From the description of the method, you'd expect the same behavior as the .NET functions, but they allow certain illegal values, or at least ambiguous, considering how .NET parses numbers.
Actual behavior
See above. This is caused by NumberStyles.Any, which is not used by Int32.TryParse or Decimal.TryParse. These use NumberStyles.Integer and NumberStyles.Decimal respectively (see source of .NET), which limit the allowed range.
Known workarounds
Create your own types or call Int32.TryParse directly, or create your own helper function.
Related information
Most recent F#+ and tested with .NET 6, but assuming it is the same on .NET 7.
Related: regardless of whether we'd fix this issue or not, perhaps we can update the documentation to specify that parse and tryParse uses InvariantCulture. Currently, it not being mentioned, may lead people to believe it works the same as XXX.Parse/TryParse from the BCL, that is, that it changes behavior depending on the current culture.
Yes, I think it makes sense to have it documented as using invariant culture.
As always with parsers there's the question of how smart my parser should be, where do you draw the line?
Our current parsers uses NumberStyles.Any which makes them very flexible, you could argue they should be smarter and make them able to parse "thirtyseven" as 37, or make them very strict and expect "clean" numbers, without any , or . separator, or redundant + signs.
The problem is, parsers are kind of the inverse of printers, but it's not bijective as there are many ways to print and many ways to parse.
We can only implement one way to print, however making parsers so strict that they only can parse that specific printing way could be a nice property but a pain to work with in real world scenarios.
So we can come up with this property: print >> parse = id (and we could add FsChecks for it) which doesn't imply parse >> print = id.
How different styles we allow? That's an interesting discussion (thanks for kicking it off), I think we should think in practical cases.
Then as you say we have the compatibility problem, if we eventually decide to make them more strict, but we can always implement a parseStrict something like ParseExact in order to satisfy cases where the developer is more interested in validating than getting results as far as practical.
Our current parsers uses NumberStyles.Any
While I agree with everything you say, the problem with Any is that invalid numbers are parsed, which I don't think is part of what most people consider valid input.
Since this is about calling the parse methods of .NET, I think, apart from forcing Invariant Culture, that we should stay as close as possible to what .NET itself implements, to stick with the "principle of least suprise".
Certainly, creating invalid parsing functions will not have been the goal here. But changing this leads to a backwards compat issue. I'd vote for your proposal with parseStrict.
Note that this is not the same as ParseExact, as the standard Parse functions on number types are quite strict already. The only big problem with these is the absence of invariant culture in the default behavior.
OK, let's go with parseStrict, then the question remains for V2, I mean we can do breaking changes then but still which should be the default ? We have some time to think about it.
I would set the defaults to what I wrote before:
we should stay as close as possible to what .NET itself implements, to stick with the "principle of least suprise".
not sure if parseStrict is the ideal choice, but I can't think of anything better right now.