swift-numerics icon indicating copy to clipboard operation
swift-numerics copied to clipboard

[Sketch]Arbitrary-precision BigInt

Open rothomp3 opened this issue 5 years ago • 34 comments
trafficstars

Fixes #5

This is the step 2 "Sketch" PR. It's actually very complete, but obviously perhaps not ideal in style and documentation (since there is none of the latter). Probably needs more tests too, though there are some.

The basic architecture is based on Knuth's algorithms for this purpose. An integer is broken up into "digits" where each digit is in base 64 (or base 32 on 32-bit systems, which are supported). This maps nicely onto the BinaryInteger protocol since we can simply use an array of Uint for the words. We're also using 2's complement internally, which is super convenient because then for any pair of BigInts that are each only one word long, we can simply delegate to the regular operators for a minimal overhead.

rothomp3 avatar Dec 17 '19 18:12 rothomp3

Ok, new commit to address the floating-point feedback. There doesn't seem to be any easy way to test the one with the T.greatestFiniteMagnitude absent a Float16 type in the stdlib.

rothomp3 avatar Dec 17 '19 20:12 rothomp3

@rothomp3 Friendly reminder that you can "bulk" manage "access control" using extensions. Extension also give more clear groupings (especially when used with // MARK (which renders nicely in Xcode 11)).

I recommend declaring BigInt in one scope with just the stored properties (and if applicable any "designated" initializer. I write "designated" since structs does not have that formally, but sometimes conceptually).

So:

public extension BigInt {
    func foo() {}

    func bar() {}
}

Which you can at any point change to internal just by changing one statement, as opposed to what you currently have:

extension BigInt {
    public func foo() {}

   public func bar() {}
}

Sajjon avatar Dec 18 '19 13:12 Sajjon

@rothomp3 I have not yet looked through your PR in detail, I'm sure it is great! But I just wanted to give you two references - which you surely know about, but if not they might be useful:

Apple's own BigInt Prototype

attaswift/BigInt

Sajjon avatar Dec 18 '19 13:12 Sajjon

@rothomp3 I have not yet looked through your PR in detail, I'm sure it is great! But I just wanted to give you two references - which you surely know about, but if not they might be useful:

Apple's own BigInt Prototype

attaswift/BigInt

So both of those use additional storage for the sign, and all the extra math required for that, instead of using 2's complement to store the BigInts the same way the language/CPU already store signed integers of 64 bits or less. I believe my way more effectively leverages the hardware.

rothomp3 avatar Dec 18 '19 19:12 rothomp3

I believe my way more effectively leverages the hardware.

Do we have benchmarks that demonstrate this? (The BigInt implementations I’ve seen use sign-magnitude, and I don’t think it’s principally for ease of implementation.)

xwu avatar Dec 19 '19 15:12 xwu

I believe my way more effectively leverages the hardware.

Do we have benchmarks that demonstrate this?

It makes (approximately) no difference. This is pretty easy to see even without benchmarks.

First off, note that converting from one to the other is O(n), and every arithmetic and bitwise operation defined on FixedWidthInteger is at least O(n), so there can be no difference in asymptotic performance--you could always just convert, do the operation, and convert back.

So we're only interested in the the constants.

  • bitwise operations are exactly the same either way.
  • addition and subtraction are slightly easier in the 2s complement representation.
  • multiplication requires a tiny bit more bookkeeping (I really mean a tiny bit--the difference is essentially immaterial, a couple additional operations on Int / UInt for schoolbook multiplication because you need to treat the high-order word differently from the rest, and essentially no additional operations for fast multiplication algorithms).
  • division is just multiplication and subtraction, so doesn't matter.

This approach is fine, and most closely matches the semantics of FixedWidthInteger, so it's the easiest to use in some ways.

@rothomp3, sorry, I've been busy this week, but I'll give this a more thorough review next week and once we're to a good place I'll set up a branch to merge it onto for further development.

stephentyrone avatar Dec 19 '19 17:12 stephentyrone

@rothomp3 Friendly reminder that you can "bulk" manage "access control" using extensions.

We deliberately don't do this in the standard library, and I'm going to follow that style with Swift Numerics to facilitate moving code into the standard library if evolution wants some of these features in the future. I know that people have strong opinions about it, but ultimately it doesn't matter very much. Please keep explicit access control for now for simplicity.

stephentyrone avatar Dec 19 '19 17:12 stephentyrone

I was looking at the code for the BigInt failable initializer that takes a String and I had a couple of thoughts:

  • We should allow for the presence of a prefix + operator
  • We can do remove redundant operations by dropping leading zeros

I think the initializer should be modified to something along the lines of the following:

public init?(_ description: String) {
    guard let firstCharacter = description.first else { return nil }
    var description = description
    var isNegative = false
    
    if firstCharacter == "-" {
        guard description.count > 2 else { return nil }
        isNegative = true
        description.removeFirst()
    } else if firstCharacter == "+" {
        guard description.count > 2 else { return nil }
        description.removeFirst()
    }
    
    let validDigits = Set("0123456789")
    guard description.allSatisfy({ validDigits.contains($0) }) else { return nil }
    var result: BigInt = 0
    
    for (i, char) in description.drop(while: { $0 == "0" }).reversed().enumerated() {
        result += BigInt(char.wholeNumberValue!) * pow(10, BigInt(i))
    }
    
    if isNegative {
        result = -result
    }
    
    words = result.words
}

Wildchild9 avatar Dec 23 '19 21:12 Wildchild9

Let me just say I think it's hilarious how much scrutiny the LosslessStringConvertible initializer has gotten, all because it happens to be the first method in the file ;)

rothomp3 avatar Dec 30 '19 21:12 rothomp3

Hi @rothomp3, I created a "biginteger" branch for this work, since there's more development to be done than fits neatly in a single PR. Can you update this PR to point to that branch (I can do this for you if you need help, but it will need to wait a couple days).

stephentyrone avatar Dec 30 '19 21:12 stephentyrone

Retargeted PR to biginteger branch!

rothomp3 avatar Dec 30 '19 22:12 rothomp3

Please could the APIs be grouped and sorted? (example)

Or by using an extension for each protocol? (example)

benrimmington avatar Dec 31 '19 23:12 benrimmington

@benrimmington It's unfortunate that it apparently still requires an approval to be merged into the branch created just for this purpose haha. If this PR were able to be merged there, then you could create a new PR for the reorganization. I'll see about doing it myself, though.

I've been on vacation for the last two weeks or so, but I'm back now!

rothomp3 avatar Jan 13 '20 19:01 rothomp3

you could create a new PR for the reorganization

There's no issue with doing this now by creating a PR against your clone.

stephentyrone avatar Jan 13 '20 19:01 stephentyrone

That's true, I feel silly for not having thought of that myself.

rothomp3 avatar Jan 13 '20 19:01 rothomp3

Please see rothomp3/swift-numerics#1

benrimmington avatar Jan 13 '20 20:01 benrimmington

Thanks to @benrimmington, hopefully this is easier to read now :)

rothomp3 avatar Jan 14 '20 19:01 rothomp3

Please see rothomp3/swift-numerics#2

benrimmington avatar Jan 15 '20 01:01 benrimmington

@stephentyrone Will there be an issue with using the same name for a type and its module?

Pitch: Fully qualified name syntax

Should we add a prefix/suffix to the product and target names? For example:

import NMXBigInt  // struct BigInt
import NMXComplex // struct Complex
import NMXReal    // protocol Real

UPDATE: apple/swift-numerics#94

benrimmington avatar Jan 18 '20 10:01 benrimmington

OK, @benrimmington I fixed the failing tests, which had identified a genuine problem! Now we just need to find an implementation of an arbitrarily large BinaryFloatingPoint to really test it haha. Theoretically, that would be eligible to be a completely separate addition to this very project, wouldn't it?

rothomp3 avatar Jan 21 '20 22:01 rothomp3

Ok, division now officially works. It probably could be optimized more.

rothomp3 avatar Jan 28 '20 21:01 rothomp3

For example, I could not construct a circumstance where qhat was ever larger than Uint.max + 1. My guess is this is because of the normalization, and in fact qhat will always fit in a single word. If this is true, then _findQhat could be simplified quite a bit!

rothomp3 avatar Jan 28 '20 21:01 rothomp3

We definitely could do something like that, I’d need to try this code and make sure it actually catches all the cases correctly (looks fine at first glance).

On Mar 28, 2020, at 6:18 AM, David Waite [email protected] wrote:

@dwaite commented on this pull request.

In Sources/BigInt/BigInt.swift https://github.com/apple/swift-numerics/pull/84#discussion_r399645486:

  •  let lhsWord = lhs.words[0]
    
  •  let rhsWord = rhs.words[0]
    
  •  let (result, isOverflow) = lhsWord.addingReportingOverflow(rhsWord)
    
  •  if !isOverflow && result < Int.max {
    
  •    lhs.words[0] = result
    
  •    return
    
  •  }
    
  •  let knownNegativeResult = lhsWord > Int.max && rhsWord > Int.max
    
  •  if lhsWord > Int.max || rhsWord > Int.max, !knownNegativeResult {
    
  •    // positive + negative is always smaller, so overflow is a red herring
    
  •    lhs.words[0] = result
    
  •    return
    

Have you evaluated using pattern matches to clarify some of this code and make it a bit easier to spot any edge cases. For instance, here I believe you could do:

if lhs.words.count == 1, rhs.words.count == 1 {
  let lhsWord = lhs.words[0]
  let rhsWord = rhs.words[0]
  let lhsNegative = lhsWord > Int.max
  let rhsNegative = rhsWord > Int.max
  let sum = lhsWord &+ rhsWord
  let sumNegative = sum > Int.max
  switch (lhsNegative, rhsNegative, sumNegative) {
    case (true, true, false);
      lhs.words = [sum, UInt.max]
      return
    case (false, false, true);
      lhs.words = [sum, 0]
      return
    default:
      lhs.words[0] = sum
      return
  }
}

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apple/swift-numerics/pull/84#pullrequestreview-383300829, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6NZ2R5V7N54VYEIXZV32LRJXFGJANCNFSM4J37T7WA.

rothomp3 avatar Mar 30 '20 16:03 rothomp3

Hey @rothomp3, are you still working on this issue? I've been exploring a design where we keep the lowest word inline. It seems promising and I'd like to work on it more, but don't want to step on any ongoing plans you have.

xwu avatar Jun 02 '20 18:06 xwu

Hi all, I've shared an alternative implementation at #120 which starts from a different approach.

xwu avatar Jun 08 '20 20:06 xwu

Sorry I missed this message at the time, I am happy to keep working on it, but hadn’t heard anything in quite a while from the people responsible for merging it or telling me why they can’t ;) Anyway, I actually explored this option myself long before I made this PR, and in my testing it was dramatically worse. It’s possible I’m doing it wrong though!

Robert Thompson Platform Software Engineer WillowTree https://willowtreeapps.com

On Jun 2, 2020, at 2:12 PM, Xiaodi Wu [email protected] wrote:

Hey @rothomp3 https://github.com/rothomp3, are you still working on this issue? I've been exploring a design where we keep the lowest word inline. It seems promising and I'd like to work on it more, but don't want to step on any ongoing plans you have.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apple/swift-numerics/pull/84#issuecomment-637717681, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6NZ2WJXSFS4PMGISEM2SDRUU6HPANCNFSM4J37T7WA.

rothomp3 avatar Jun 10 '20 15:06 rothomp3

Anyway, I actually explored this option myself long before I made this PR, and in my testing it was dramatically worse. It’s possible I’m doing it wrong though!

Yup. Keeping the lowest word inline sure does make things worse! Check out some of the other ideas I've been exploring over at the other PR though :)

xwu avatar Jun 10 '20 21:06 xwu

Anyway, I actually explored this option myself long before I made this PR, and in my testing it was dramatically worse. It’s possible I’m doing it wrong though!

Yup. Keeping the lowest word inline sure does make things worse! Check out some of the other ideas I've been exploring over at the other PR though :)

I don't even have a good explanation for why, other than perhaps "the people who wrote Swift.Array are very smart, and very good at their jobs!"

rothomp3 avatar Jun 11 '20 19:06 rothomp3

I don't even have a good explanation for why

The short version is pretty much "special cases make things slower". For some input distributions, it should be a win to have a special small representation (like we do for String), but that's definitely something that we would take about adding down the road--we want to get the general implementation right first.

stephentyrone avatar Jun 17 '20 14:06 stephentyrone

What's the status on these changes?

ezfe avatar Jan 27 '22 01:01 ezfe