swift-apis
Make ShapedArray.description's maxScalarCountPerLine user-customizable
Here is ShapedArray's fileprivate func description( indentLevel: Int, edgeElementCount: Int, maxScalarLength: Int, maxScalarCountPerLine: Int, summarizing: Bool ) -> String.
Is there any reason this is marked fileprivate? It's currently accessible only via the public func description( lineWidth: Int = 80, edgeElementCount: Int = 3, summarizing: Bool = false ) where the maxScalarCountPerLine is calculated for me:
let maxScalarCountPerLine = Swift.max(1, lineWidth / maxScalarLength)
Calculating the maxScalarCountPerLine independently for each Tensor leads to this problem:
=== Feature 0:
input:
[ -0.022060618, 0.024561103, -0.025651768, -0.04885944, 0.012175075, 0.006922609, -0.0516627,
-0.019092154, 0.024305645, -0.028501112, -0.047275346, 0.014285761, 0.00435431, -0.052575804,
-0.01609808, 0.023822624, -0.031269953, -0.04550122, 0.016227337, 0.0016748396, -0.05326795,
-0.013095862, 0.02311485, -0.03394214, -0.043547418, 0.017988473, -0.001100175, -0.053735107,
-0.010103013, 0.022186458, -0.036502086, -0.0, 0.0, -0.0039545433, -0.053974554]
output:
[ -0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, -0.0,
-0.0, -0.0, 0.0, 0.0, -0.0, 0.0, -0.0, -0.0, 0.0,
0.0, 0.0, -0.0, -0.0, 0.0, 0.0, 0.0, 0.0, -0.0,
-0.0, 0.0, 0.0, -0.0, -1.6888539, -0.99011576, 0.0, 0.0]
target:
[ -0.0, 0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0,
0.0, -0.0, -0.0, 0.0, 0.0, -0.0, -0.0, 0.0,
-0.0, -0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0,
-0.0, 0.0, -0.0, -0.0, -0.0, 0.0, -0.0, -0.041425332,
0.019558901, -0.0, -0.0]
Each Tensor has a different number of scalars per line, so the values are visually shifted in each of the three descriptions. This makes it difficult to visually inspect the values at the same position in each tensor.
I would like to be able to force the max number of scalars per line for each Tensor so that the values are more readily visually comparable.
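To illustrate why the per-line count differs between tensors, here is a minimal sketch that applies the formula quoted above to plain `[Double]` arrays rather than to `ShapedArray`. The helper name `scalarsPerLine` is hypothetical and not part of swift-apis; it assumes the default `lineWidth` of 80.

```swift
// Hypothetical helper (not part of swift-apis): applies the quoted formula
// max(1, lineWidth / maxScalarLength) to a plain array of scalars.
func scalarsPerLine(_ scalars: [Double], lineWidth: Int = 80) -> Int {
    // Longest textual form among the scalars; 3 is the fallback for empty input.
    let maxScalarLength = scalars.lazy.map { String(describing: $0).count }.max() ?? 3
    return Swift.max(1, lineWidth / maxScalarLength)
}

let short = [0.5, -0.5, 0.25]      // longest description is 4 characters
let long = [0.5, -0.125, 0.0625]   // longest description is 6 characters
print(scalarsPerLine(short))       // 80 / 4 = 20
print(scalarsPerLine(long))        // 80 / 6 = 13
```

Two arrays of the same length therefore wrap at different positions as soon as their longest scalar differs in printed width.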
Here's the PR that added Tensor pretty-printing: https://github.com/apple/swift/pull/23837. The Swift implementation of ShapedArray.description is largely based on the PyTorch pretty-printing implementation, which is a simpler version of NumPy's: TF-419.
The original goal with Tensor pretty-printing was to closely match the output of PyTorch. Does your example print better in existing n-d array libraries, like NumPy or PyTorch? I wonder if PyTorch printing exposes enough knobs to achieve what you'd like to do, without using unreasonably unsafe or private APIs.
Thanks @dan-zheng. I copied the above linked swift-apis code into my project:
extension String {
  /// Returns a string of the specified length, padded with whitespace to the left.
  func leftPadded(toLength length: Int) -> String {
    // `Swift.max` is qualified so that it cannot resolve to the collection method `max()`.
    return repeatElement(" ", count: Swift.max(0, length - count)) + self
  }
}
public extension ShapedArray {
  func vectorDescription(
    indentLevel: Int,
    edgeElementCount: Int,
    maxScalarLength: Int,
    maxScalarCountPerLine: Int,
    summarizing: Bool
  ) -> String {
    // Get scalar descriptions.
    func scalarDescription(_ element: Element) -> String {
      let description = String(describing: element)
      return description.leftPadded(toLength: maxScalarLength)
    }
    var scalarDescriptions: [String] = []
    if summarizing && count > 2 * edgeElementCount {
      scalarDescriptions += prefix(edgeElementCount).map(scalarDescription)
      scalarDescriptions += ["..."]
      scalarDescriptions += suffix(edgeElementCount).map(scalarDescription)
    } else {
      scalarDescriptions += map(scalarDescription)
    }
    // Combine scalar descriptions into lines, based on the scalar count per line.
    let lines = stride(
      from: scalarDescriptions.startIndex,
      to: scalarDescriptions.endIndex,
      by: maxScalarCountPerLine
    ).map { i -> ArraySlice<String> in
      let upperBound = Swift.min(
        i.advanced(by: maxScalarCountPerLine),
        scalarDescriptions.count)
      return scalarDescriptions[i..<upperBound]
    }
    // Return lines joined with separators.
    let lineSeparator = ",\n" + String(repeating: " ", count: indentLevel + 1)
    return lines.enumerated().reduce(into: "[") { result, entry in
      let (i, line) = entry
      result += line.joined(separator: ", ")
      result += i != lines.count - 1 ? lineSeparator : ""
    } + "]"
  }
  func description(
    indentLevel: Int,
    edgeElementCount: Int,
    maxScalarLength: Int,
    maxScalarCountPerLine: Int,
    summarizing: Bool
  ) -> String {
    // Handle scalars.
    if let scalar = scalar {
      return String(describing: scalar)
    }
    // Handle vectors, which have special line-width-sensitive logic.
    if rank == 1 {
      return vectorDescription(
        indentLevel: indentLevel,
        edgeElementCount: edgeElementCount,
        maxScalarLength: maxScalarLength,
        maxScalarCountPerLine: maxScalarCountPerLine,
        summarizing: summarizing)
    }
    // Handle higher-rank tensors.
    func elementDescription(_ element: Element) -> String {
      return element.description//(
        /*indentLevel: indentLevel + 1,*/
        /*edgeElementCount: edgeElementCount,*/
        /*maxScalarLength: maxScalarLength,*/
        /*maxScalarCountPerLine: maxScalarCountPerLine,*/
        /*summarizing: summarizing)*/
    }
    var elementDescriptions: [String] = []
    if summarizing && count > 2 * edgeElementCount {
      elementDescriptions += prefix(edgeElementCount).map(elementDescription)
      elementDescriptions += ["..."]
      elementDescriptions += suffix(edgeElementCount).map(elementDescription)
    } else {
      elementDescriptions += map(elementDescription)
    }
    // Return lines joined with separators.
    let lineSeparator =
      "," + String(repeating: "\n", count: rank - 1)
      + String(repeating: " ", count: indentLevel + 1)
    return elementDescriptions.enumerated().reduce(into: "[") { result, entry in
      let (i, elementDescription) = entry
      result += elementDescription
      result += i != elementDescriptions.count - 1 ? lineSeparator : ""
    } + "]"
  }
}
And this achieved what I wanted:
=== Variable 0:
input:
[-0.046002183, 0.015716469, 0.0024140144, -0.05310176, -0.013912535, 0.023329472, -0.033225544, -0.044096183,
0.017527763, -0.00033658138, -0.0, -0.0, 0.0, -0.0, -0.0, 0.0,
-0.0031709874, -0.05393206, -0.007940362, 0.021374973, -0.038286343, -0.03978186, 0.020576812, -0.006072663,
-0.0540048, -0.005004768, 0.020078853, -0.04061756, -0.037398703, 0.021796772, -0.009024683, -0.053848244,
-0.002125753, 0.01857983, -0.04279759]
output:
[ -0.0, -0.0, 0.0, 0.0, -0.0, 0.0, -0.0, 0.0,
0.0, -0.0, -0.2371707, 1.7225978, -2.3474987, -1.0847256, 1.6712375, 0.3812195,
-0.0, -0.0, -0.0, 0.0, -0.0, -0.0, -0.0, 0.0,
-0.0, 0.0, -0.0, 0.0, 0.0, -0.0, -0.0, -0.0,
0.0, -0.0, -0.0]
target:
[ -0.0, 0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0,
0.0, -0.0, -0.053630464, -0.010915402, 0.022460626, -0.035817537, -0.042018704, 0.019151034,
-0.0, -0.0, -0.0, 0.0, -0.0, -0.0, 0.0, -0.0,
-0.0, -0.0, 0.0, -0.0, -0.0, 0.0, -0.0, -0.0,
-0.0, 0.0, -0.0]
The corresponding values are all lined up visually now. Given that it works as-is, I was wondering why it needed to be fileprivate.
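The alignment effect can be reproduced without TensorFlow. The following is a minimal sketch, not the swift-apis implementation: `alignedRows` is a hypothetical function that left-pads each scalar to a fixed width and breaks lines after a fixed count, so two arrays printed with the same settings place their values at the same positions.

```swift
// Hypothetical sketch, independent of ShapedArray: fixed column count and cell width.
func alignedRows(_ scalars: [Double], perLine: Int, width: Int) -> [String] {
    precondition(perLine > 0, "perLine must be positive")
    let cells = scalars.map { scalar -> String in
        let text = String(describing: scalar)
        // Left-pad with spaces; values longer than `width` are left untruncated.
        return String(repeating: " ", count: Swift.max(0, width - text.count)) + text
    }
    // Chunk the cells into rows of at most `perLine` entries.
    return stride(from: 0, to: cells.count, by: perLine).map { start in
        cells[start..<Swift.min(start + perLine, cells.count)].joined(separator: ", ")
    }
}

let output = alignedRows([0.0, -1.5, 0.25, 2.0, -0.0], perLine: 2, width: 5)
let target = alignedRows([0.125, 0.0, -0.0, 1.0, 0.5], perLine: 2, width: 5)
print(output.joined(separator: "\n"))
print(target.joined(separator: "\n"))
```

Because both calls share `perLine` and `width`, the n-th value of `output` sits directly above or below the n-th value of `target` when the two blocks are printed one after the other.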
I'm not seeing any options in PyTorch's set_printoptions that would achieve this. Nothing stands out in the NumPy implementation that would achieve it either. No worries if this is outside the scope of the intended API.
Nice, thanks for sharing your code snippet! Could you please complete the example by showing the invocation of ShapedArray.description(...) used to print the array contents?
I would describe your change as "adding maxScalarCountPerLine as a customizable argument to ShapedArray.description(...)" - would you agree with this more pointed description? If so, I might recommend changing the PR title to be more specific along those lines. Currently, the title sounds like "changing private API to be public", which sounds scarier.
Supporting this change tentatively sounds good to me (I haven't thought about it super hard). Do you have some intuition why maxScalarCountPerLine should be user-customizable instead of always computing it from other arguments (maxScalarLength, edgeElementCount)? Feel free to start a PR with tests for review!
Each ShapedArray was printed with:
print(myTensor[TensorRange.ellipsis, variableIndex].array.description(
  indentLevel: 0,
  edgeElementCount: 50,
  maxScalarLength: 10,
  maxScalarCountPerLine: 8,
  summarizing: true))
Oh yes, the idea of opening currently private API is incidental. The core idea is supporting maxScalarCountPerLine in ShapedArray's description. Thanks, I changed the title.
maxScalarCountPerLine should be user-customizable so that tensors can be printed with the same number of rows and columns, thus making different tensors visually comparable, like in my above two examples. I think right now maxScalarCountPerLine is a function of lineWidth and maxScalarLength:
public func description(
  lineWidth: Int = 80,
  edgeElementCount: Int = 3,
  summarizing: Bool = false
) -> String {
  let maxScalarLength = scalars.lazy.map { String(describing: $0).count }.max() ?? 3
  let maxScalarCountPerLine = Swift.max(1, lineWidth / maxScalarLength)
where only the lineWidth is user-customizable. This makes it difficult or impossible to control the number of columns that are printed for a given tensor, which makes it difficult to visually compare tensors.
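One possible shape for the requested knob, sketched as a free function rather than as a change to `ShapedArray`: an optional `maxScalarCountPerLine` that falls back to the computed value when omitted. The function name and the optional-parameter design are assumptions for illustration; nothing in this thread settles on a signature.

```swift
// Hypothetical: resolves the per-line count, preferring a caller-supplied value.
func resolvedScalarCountPerLine(
    lineWidth: Int = 80,
    maxScalarLength: Int,
    maxScalarCountPerLine: Int? = nil
) -> Int {
    // Existing behaviour: derive the count from the line width.
    // The inner max(1, ...) is an added guard against division by zero.
    let computed = Swift.max(1, lineWidth / Swift.max(1, maxScalarLength))
    // Sketched behaviour: an explicit value wins, clamped to at least 1.
    return maxScalarCountPerLine.map { Swift.max(1, $0) } ?? computed
}

// Default: computed from lineWidth, so it varies with the longest scalar.
print(resolvedScalarCountPerLine(maxScalarLength: 12))
// Explicit: the same count regardless of scalar length.
print(resolvedScalarCountPerLine(maxScalarLength: 12, maxScalarCountPerLine: 8))
print(resolvedScalarCountPerLine(maxScalarLength: 4, maxScalarCountPerLine: 8))
```

Keeping the parameter optional would leave existing call sites unchanged while letting callers pin the column count when comparing tensors.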
This was born out of a desire to understand my loss values. Printing a snippet of my model's output alongside input and target allows me to understand how close the outputs really are to the targets in absolute value for a given valLoss.
Thanks for the context! Feel free to open a PR, if you'd like to upstream your func description(...) changes.
Thanks @dan-zheng, I'll plan to open a pull request with some unit tests after Christmas.
Thank you! Take your time - Santa and we are patiently waiting for you after Xmas.
🎅
🎅
tbh GitHub needs to enable more than eight reaction emoji 🙎🏻♂️