BigInt
Using `ManagedBufferPointer` instead of `Array` as a storage
Hi,
Recently I had to write my own BigInt implementation for Violet - Python VM written in Swift.
Internally I decided to use ManagedBufferPointer instead of Swift Array. The whole design in one sentence would be: union (via tagged pointer) of Int32 (called Smi, after V8) and a heap allocation (magnitude + sign representation) with ARC for garbage collection. The detailed explanation is available in our documentation.
Naturally I'm quite curious why most of the BigInt libraries (including this one) use Array. The current implementation gives you (2014 rMBP with Intel x64):
print("BigUInt.size:", MemoryLayout<BigUInt>.size) // 32
print("BigUInt.stride:", MemoryLayout<BigUInt>.stride) // 32
print("BigInt.size:", MemoryLayout<BigInt>.size) // 33
print("BigInt.stride:", MemoryLayout<BigInt>.stride) // 40
Going with ManagedBufferPointer would give us much smaller numbers:
// Basically our own version of `Swift.Array` specialized for storing `Words`.
// Mainly deals with COW.
struct BigIntStorage {
    struct Header {
        var count: Int
    }
    typealias Word = UInt
    typealias Buffer = ManagedBufferPointer<Header, Word>
}
struct BigUInt2 {
    typealias Word = BigIntStorage.Word
    enum Kind {
        case inline(Word, Word)
        case slice(from: Int, to: Int)
        case array
    }
    var kind: Kind
    var storage: BigIntStorage // <- This line changed!
}
struct BigInt2 {
    enum Sign {
        case plus
        case minus
    }
    typealias Magnitude = BigUInt2
    typealias Word = BigUInt.Word
    public var magnitude: BigUInt2
    public var sign: Sign
}
print("BigUInt2.size:", MemoryLayout<BigUInt2>.size) // 17
print("BigUInt2.stride:", MemoryLayout<BigUInt2>.stride) // 24
print("BigInt2.size:", MemoryLayout<BigInt2>.size) // 18
print("BigInt2.stride:", MemoryLayout<BigInt2>.stride) // 24
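One thing worth double-checking when reproducing these numbers: `BigIntStorage` as sketched above declares no stored properties, so it adds nothing to the layout of `BigUInt2`. Both `ManagedBufferPointer` and `Array` are a single reference, so once the storage actually holds its `Buffer` the sizes may end up the same as with `Array`. Below is a minimal sketch for comparing the variants side by side. `EmptyStorage`, `PointerStorage` and the three wrapper structs are hypothetical names used only for this check.

```swift
// Layout comparison sketch. All type names below are hypothetical and exist
// only to compare the variants; they are not part of any library.
struct Header {
    var count: Int
}
typealias Word = UInt
typealias Buffer = ManagedBufferPointer<Header, Word>

enum Kind {
    case inline(Word, Word)
    case slice(from: Int, to: Int)
    case array
}

// Storage with no stored properties, like the `BigIntStorage` sketch.
struct EmptyStorage {}

// Storage that actually holds the buffer reference.
struct PointerStorage {
    var buffer: Buffer
}

struct WithEmptyStorage {
    var kind: Kind
    var storage: EmptyStorage
}

struct WithPointerStorage {
    var kind: Kind
    var storage: PointerStorage
}

struct WithArray {
    var kind: Kind
    var storage: [Word]
}

print("Kind:", MemoryLayout<Kind>.size, MemoryLayout<Kind>.stride)
print("Buffer:", MemoryLayout<Buffer>.size, "Array:", MemoryLayout<[Word]>.size)
print("WithEmptyStorage:", MemoryLayout<WithEmptyStorage>.size)
print("WithPointerStorage:", MemoryLayout<WithPointerStorage>.size)
print("WithArray:", MemoryLayout<WithArray>.size)
```

If the last two lines print the same value, any size win has to come from changing `Kind` (or moving it into the buffer header), not from swapping `Array` for `ManagedBufferPointer` alone.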
I believe that this approach would have the following advantages:
- better usage of CPU cache: in the current design `BigInt` has size 33 and stride 40. With `ManagedBufferPointer` we have size 18 and stride 24. This does not matter for a `BigInt` as a type, but it may matter in real-life scenarios, for example when it has to be stored in a `struct` or in an `Array`. (Just a reminder: Intel CPUs have 64-byte cache lines and the M1 has 128-byte cache lines, though I do not own an M1 device to check this.)
- `BigIntStorage` is specialized for storing `Word`, which means that it can do some things in a more efficient way than `Swift.Array`.
- potential further optimizations: I believe that you could bring the stride to 16, but then the inline value would be a single `Word` (instead of 2 `Word`s) and the slice `from`/`to` would have to be `Int32` (instead of `Int`), plus some minor rearrangement of how things are stored internally. It may not be worth it though.
The downside is that you would have to implement your own heap storage based on ManagedBufferPointer, but this is not that difficult.
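To give a rough idea of the amount of work, here is a minimal sketch of such a storage with copy-on-write `append`. It is not the Violet implementation: `WordStorage`, `Box` and `makeBuffer` are hypothetical names, and the growth policy (doubling, minimum 4) is an assumption made for the example.

```swift
// Minimal copy-on-write word storage built on `ManagedBufferPointer`.
// Hypothetical sketch: names and growth policy are assumptions.
struct WordStorage {
    struct Header {
        var count: Int
    }
    typealias Word = UInt
    typealias Buffer = ManagedBufferPointer<Header, Word>

    // Empty class used only as the heap object type behind the buffer.
    private final class Box {}

    private var buffer: Buffer

    private static func makeBuffer(capacity: Int, count: Int) -> Buffer {
        return Buffer(bufferClass: Box.self, minimumCapacity: capacity) { _, _ in
            Header(count: count)
        }
    }

    init(minimumCapacity: Int = 4) {
        self.buffer = WordStorage.makeBuffer(capacity: minimumCapacity, count: 0)
    }

    var count: Int { return buffer.header.count }
    var capacity: Int { return buffer.capacity }

    subscript(index: Int) -> Word {
        precondition(index >= 0 && index < count, "Index out of range")
        return buffer.withUnsafeMutablePointerToElements { $0[index] }
    }

    mutating func append(_ word: Word) {
        let oldCount = count
        let needsGrowth = oldCount == buffer.capacity

        // Copy when the buffer is full or shared with another value (COW).
        if needsGrowth || !buffer.isUniqueReference() {
            let newCapacity = needsGrowth ? max(4, oldCount * 2) : buffer.capacity
            let new = WordStorage.makeBuffer(capacity: newCapacity, count: oldCount)
            buffer.withUnsafeMutablePointerToElements { source in
                new.withUnsafeMutablePointerToElements { destination in
                    destination.initialize(from: source, count: oldCount)
                }
            }
            buffer = new
        }

        buffer.withUnsafeMutablePointers { header, elements in
            (elements + oldCount).initialize(to: word)
            header.pointee.count = oldCount + 1
        }
    }
}

var a = WordStorage()
a.append(1)
a.append(2)
var b = a      // shares the same heap buffer
b.append(3)    // triggers the copy, `a` is left untouched
print(a.count, b.count)
```

Since `Word` is a trivial type, the sketch does not need a `deinit` on the buffer class to deinitialize elements; storing non-trivial elements would require one.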
As for any regressions: I also propose #98 Using tests from “Violet - Python VM written in Swift”. So, first I would add test cases and then we could (maybe) talk about ManagedBufferPointer.
This sounds great. Did you already benchmark both approaches?
This is a little bit more complicated. There is no silver bullet and there are multiple ways in which you can implement a BigInt depending on what use-cases you target.
Before I implement this change I want to close the #98 Using tests from “Violet - Python VM written in Swift”.
The improvements (if any) would be only in some specific scenarios, definitely not in the most common case, when the test looks like this:
let a: BigInt = …
let b: BigInt = …
// do something with them, maybe even in a loop…
Stride only matters in contiguous storage, like arrays and structs. In Violet having a stride of 8 (single pointer) means that we can fit more BigInts in a single cache line, which matters in some scenarios.
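As a back-of-the-envelope illustration of the cache-line argument, the sketch below counts how many values of a given stride fit entirely in one line. `valuesPerCacheLine` is a hypothetical helper; it assumes the array starts on a cache-line boundary and ignores any heap data behind the pointer.

```swift
// Number of whole values of a given stride that fit in one cache line.
// Hypothetical helper; 64 bytes is the Intel cache line size mentioned earlier.
func valuesPerCacheLine(stride: Int, cacheLineSize: Int = 64) -> Int {
    precondition(stride > 0 && cacheLineSize > 0)
    return cacheLineSize / stride
}

for stride in [40, 24, 8] {
    print("stride \(stride):", valuesPerCacheLine(stride: stride), "per 64-byte line")
}
```

With a 64-byte line this gives 1 value at stride 40, 2 at stride 24 and 8 at stride 8.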
In addition, things work well in Violet because we only have 2 representations:
- `smi`: `Int32` inside the pointer
- `heap`: heap allocated if the value is outside of the `Int32` range
In 99% of the cases we are smi, which is nice for the branch predictor in some very tight loops. This may not be the case for 'attaswift/BigInt', which has 3 representations.
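To make the two-representation idea concrete, here is a simplified sketch with a single fast path for addition. It uses a plain Swift enum rather than the hand-rolled tagged pointer Violet uses, and `TaggedInt`, `HeapInt` and `add` are hypothetical names. It also assumes a 64-bit `Int`, so that the sum of two `Int32` values cannot overflow.

```swift
// Heap representation: sign + magnitude. Hypothetical, heavily simplified.
final class HeapInt {
    let isNegative: Bool
    let magnitude: [UInt]

    init(isNegative: Bool, magnitude: [UInt]) {
        self.isNegative = isNegative
        self.magnitude = magnitude
    }
}

// Two representations only: small inline value or heap allocation.
enum TaggedInt {
    case smi(Int32)
    case heap(HeapInt)

    init(_ value: Int) {
        if let small = Int32(exactly: value) {
            self = .smi(small)
        } else {
            self = .heap(HeapInt(isNegative: value < 0, magnitude: [value.magnitude]))
        }
    }

    var isSmi: Bool {
        if case .smi = self {
            return true
        }
        return false
    }
}

func add(_ lhs: TaggedInt, _ rhs: TaggedInt) -> TaggedInt {
    switch (lhs, rhs) {
    case let (.smi(l), .smi(r)):
        // Fast path: widen to Int (assumed 64-bit), then re-normalize.
        return TaggedInt(Int(l) + Int(r))
    default:
        fatalError("Slow path (heap arithmetic) is omitted from this sketch")
    }
}

let small = add(TaggedInt(2), TaggedInt(3))
let big = add(TaggedInt(Int(Int32.max)), TaggedInt(1))
print(small.isSmi, big.isSmi)
```

The common case is a single predictable branch on the first `case`; results that no longer fit in `Int32` are promoted to the heap representation by the initializer.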
Anyway, let's finish the #98 first and then (maybe) go back to this issue.