swift-numerics
swift-numerics copied to clipboard
[Sketch]Arbitrary-precision BigInt
Fixes #5
This is the step 2 "Sketch" PR. It's actually very complete, but obviously perhaps not ideal in style and documentation (since there is none of the latter). Probably needs more tests too, though there are some.
The basic architecture is based on Knuth's algorithms for this purpose. An integer is broken up into "digits" where each digit is in base 64 (or base 32 on 32-bit systems, which are supported). This maps nicely onto the BinaryInteger protocol since we can simply use an array of Uint for the words. We're also using 2's complement internally, which is super convenient because then for any pair of BigInts that are each only one word long, we can simply delegate to the regular operators for a minimal overhead.
Ok, new commit to address the floating-point feedback. There doesn't seem to be any easy way to test the one with the T.greatestFiniteMagnitude absent a Float16 type in the stdlib.
@rothomp3 Friendly reminder that you can "bulk" manage "access control" using extensions. Extension also give more clear groupings (especially when used with // MARK (which renders nicely in Xcode 11)).
I recommend declaring BigInt in one scope with just the stored properties (and if applicable any "designated" initializer. I write "designated" since structs does not have that formally, but sometimes conceptually).
So:
public extension BigInt {
func foo() {}
func bar() {}
}
Which you can at any point change to internal just by changing one statement, as opposed to what you currently have:
extension BigInt {
public func foo() {}
public func bar() {}
}
@rothomp3 I have not yet looked through your PR in detail, I'm sure it is great! But I just wanted to give you two references - which you surely know about, but if not they might be useful:
@rothomp3 I have not yet looked through your PR in detail, I'm sure it is great! But I just wanted to give you two references - which you surely know about, but if not they might be useful:
So both of those use additional storage for the sign, and all the extra math required for that, instead of using 2's complement to store the BigInts the same way the language/CPU already store signed integers of 64 bits or less. I believe my way more effectively leverages the hardware.
I believe my way more effectively leverages the hardware.
Do we have benchmarks that demonstrate this? (The BigInt implementations I’ve seen use sign-magnitude, and I don’t think it’s principally for ease of implementation.)
I believe my way more effectively leverages the hardware.
Do we have benchmarks that demonstrate this?
It makes (approximately) no difference. This is pretty easy to see even without benchmarks.
First off, note that converting from one to the other is O(n), and every arithmetic and bitwise operation defined on FixedWidthInteger is at least O(n), so there can be no difference in asymptotic performance--you could always just convert, do the operation, and convert back.
So we're only interested in the the constants.
- bitwise operations are exactly the same either way.
- addition and subtraction are slightly easier in the 2s complement representation.
- multiplication requires a tiny bit more bookkeeping (I really mean a tiny bit--the difference is essentially immaterial, a couple additional operations on Int / UInt for schoolbook multiplication because you need to treat the high-order word differently from the rest, and essentially no additional operations for fast multiplication algorithms).
- division is just multiplication and subtraction, so doesn't matter.
This approach is fine, and most closely matches the semantics of FixedWidthInteger, so it's the easiest to use in some ways.
@rothomp3, sorry, I've been busy this week, but I'll give this a more thorough review next week and once we're to a good place I'll set up a branch to merge it onto for further development.
@rothomp3 Friendly reminder that you can "bulk" manage "access control" using extensions.
We deliberately don't do this in the standard library, and I'm going to follow that style with Swift Numerics to facilitate moving code into the standard library if evolution wants some of these features in the future. I know that people have strong opinions about it, but ultimately it doesn't matter very much. Please keep explicit access control for now for simplicity.
I was looking at the code for the BigInt failable initializer that takes a String and I had a couple of thoughts:
- We should allow for the presence of a prefix
+operator - We can do remove redundant operations by dropping leading zeros
I think the initializer should be modified to something along the lines of the following:
public init?(_ description: String) {
guard let firstCharacter = description.first else { return nil }
var description = description
var isNegative = false
if firstCharacter == "-" {
guard description.count > 2 else { return nil }
isNegative = true
description.removeFirst()
} else if firstCharacter == "+" {
guard description.count > 2 else { return nil }
description.removeFirst()
}
let validDigits = Set("0123456789")
guard description.allSatisfy({ validDigits.contains($0) }) else { return nil }
var result: BigInt = 0
for (i, char) in description.drop(while: { $0 == "0" }).reversed().enumerated() {
result += BigInt(char.wholeNumberValue!) * pow(10, BigInt(i))
}
if isNegative {
result = -result
}
words = result.words
}
Let me just say I think it's hilarious how much scrutiny the LosslessStringConvertible initializer has gotten, all because it happens to be the first method in the file ;)
Hi @rothomp3, I created a "biginteger" branch for this work, since there's more development to be done than fits neatly in a single PR. Can you update this PR to point to that branch (I can do this for you if you need help, but it will need to wait a couple days).
Retargeted PR to biginteger branch!
Please could the APIs be grouped and sorted? (example)
Or by using an extension for each protocol? (example)
@benrimmington It's unfortunate that it apparently still requires an approval to be merged into the branch created just for this purpose haha. If this PR were able to be merged there, then you could create a new PR for the reorganization. I'll see about doing it myself, though.
I've been on vacation for the last two weeks or so, but I'm back now!
you could create a new PR for the reorganization
There's no issue with doing this now by creating a PR against your clone.
That's true, I feel silly for not having thought of that myself.
Please see rothomp3/swift-numerics#1
Thanks to @benrimmington, hopefully this is easier to read now :)
Please see rothomp3/swift-numerics#2
@stephentyrone Will there be an issue with using the same name for a type and its module?
Should we add a prefix/suffix to the product and target names? For example:
import NMXBigInt // struct BigInt
import NMXComplex // struct Complex
import NMXReal // protocol Real
UPDATE: apple/swift-numerics#94
OK, @benrimmington I fixed the failing tests, which had identified a genuine problem! Now we just need to find an implementation of an arbitrarily large BinaryFloatingPoint to really test it haha. Theoretically, that would be eligible to be a completely separate addition to this very project, wouldn't it?
Ok, division now officially works. It probably could be optimized more.
For example, I could not construct a circumstance where qhat was ever larger than Uint.max + 1. My guess is this is because of the normalization, and in fact qhat will always fit in a single word. If this is true, then _findQhat could be simplified quite a bit!
We definitely could do something like that, I’d need to try this code and make sure it actually catches all the cases correctly (looks fine at first glance).
On Mar 28, 2020, at 6:18 AM, David Waite [email protected] wrote:
@dwaite commented on this pull request.
In Sources/BigInt/BigInt.swift https://github.com/apple/swift-numerics/pull/84#discussion_r399645486:
let lhsWord = lhs.words[0]
let rhsWord = rhs.words[0]let (result, isOverflow) = lhsWord.addingReportingOverflow(rhsWord)if !isOverflow && result < Int.max {lhs.words[0] = resultreturn}let knownNegativeResult = lhsWord > Int.max && rhsWord > Int.maxif lhsWord > Int.max || rhsWord > Int.max, !knownNegativeResult {// positive + negative is always smaller, so overflow is a red herringlhs.words[0] = resultreturnHave you evaluated using pattern matches to clarify some of this code and make it a bit easier to spot any edge cases. For instance, here I believe you could do:
if lhs.words.count == 1, rhs.words.count == 1 { let lhsWord = lhs.words[0] let rhsWord = rhs.words[0] let lhsNegative = lhsWord > Int.max let rhsNegative = rhsWord > Int.max let sum = lhsWord &+ rhsWord let sumNegative = sum > Int.max switch (lhsNegative, rhsNegative, sumNegative) { case (true, true, false); lhs.words = [sum, UInt.max] return case (false, false, true); lhs.words = [sum, 0] return default: lhs.words[0] = sum return } }— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apple/swift-numerics/pull/84#pullrequestreview-383300829, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6NZ2R5V7N54VYEIXZV32LRJXFGJANCNFSM4J37T7WA.
Hey @rothomp3, are you still working on this issue? I've been exploring a design where we keep the lowest word inline. It seems promising and I'd like to work on it more, but don't want to step on any ongoing plans you have.
Hi all, I've shared an alternative implementation at #120 which starts from a different approach.
Sorry I missed this message at the time, I am happy to keep working on it, but hadn’t heard anything in quite a while from the people responsible for merging it or telling me why they can’t ;) Anyway, I actually explored this option myself long before I made this PR, and in my testing it was dramatically worse. It’s possible I’m doing it wrong though!
Robert Thompson Platform Software Engineer WillowTree https://willowtreeapps.com
On Jun 2, 2020, at 2:12 PM, Xiaodi Wu [email protected] wrote:
Hey @rothomp3 https://github.com/rothomp3, are you still working on this issue? I've been exploring a design where we keep the lowest word inline. It seems promising and I'd like to work on it more, but don't want to step on any ongoing plans you have.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apple/swift-numerics/pull/84#issuecomment-637717681, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6NZ2WJXSFS4PMGISEM2SDRUU6HPANCNFSM4J37T7WA.
Anyway, I actually explored this option myself long before I made this PR, and in my testing it was dramatically worse. It’s possible I’m doing it wrong though!
Yup. Keeping the lowest word inline sure does make things worse! Check out some of the other ideas I've been exploring over at the other PR though :)
Anyway, I actually explored this option myself long before I made this PR, and in my testing it was dramatically worse. It’s possible I’m doing it wrong though!
Yup. Keeping the lowest word inline sure does make things worse! Check out some of the other ideas I've been exploring over at the other PR though :)
I don't even have a good explanation for why, other than perhaps "the people who wrote Swift.Array are very smart, and very good at their jobs!"
I don't even have a good explanation for why
The short version is pretty much "special cases make things slower". For some input distributions, it should be a win to have a special small representation (like we do for String), but that's definitely something that we would take about adding down the road--we want to get the general implementation right first.
What's the status on these changes?