[BUG] Cannot write Int64 hist (with np.uint64 backend) to root file using uproot
Describe the bug
It is not possible to write hist objects with storage=hist.storage.Int64() to root files using uproot
Steps to reproduce
The following code reproduces the problem:
import hist
import uproot
with uproot.recreate("out.root") as f:
hist_double = hist.Hist(hist.axis.Regular(10, 0, 10), storage=hist.storage.Double())
hist_int64 = hist.Hist(hist.axis.Regular(10, 0, 10), storage=hist.storage.Int64())
f["hist_double"] = hist_double
f["hist_int64"] = hist_int64
hist_double is written as expected. root files can easily store int64 histograms, however, the last line results in the following exception:
ValueError: data to convert to TArray must have signed integer or floating-point type, not dtype('>i8')
The uproot to_TArray function looks it only supports writing signed integers, and hist.storage.Int64() (surprisingly?) uses a np.uint64 backend for Int64(). Is there a workaround to write hist objects to root files using uproot without conversion to Double?
Just as a side-comment about surprise: if an unsigned 64-bit integer needs to be extended to handle signed values, the only possible extension is floating point:
>>> np.array([1, 2, 3], np.uint64) + np.array([1, 2, 3], np.int8)
array([2., 4., 6.])
(Floating point numbers are the only ones that cover the domain of large uint64 values, greater than 2**63 - 1, as well as negative values. There isn't an integer type with both features.)
On the actual issue, the Uproot to_TArray function only handles the dtypes listed because those are the only ones that have a TArray specialization in ROOT:
We could change to_TArray to do some conversions for the TArray types that don't exist in ROOT. Technically, the proper thing to do with uint64 is to expand it to float64, not truncate it to int64. Although we rarely care about the values between 2**63 - 1 and 2**64 - 1, this type promotion is well established, to the point where someone else would think it's a bug if uint64 got converted into int64.
Should this instead be an issue on boost-histogram about hist.storage.Int64 being unsigned? Does this Storage type support negative weights (or any weights at all)? If so, then it can't be unsigned. Anyway, the name is misleading; "Int64" is taken to mean "signed" in many languages and libraries.
boost-histogram defines the different storage types here, and int64 is defined to use int64_t. This was previously changed from uint64_t to int64_t.
Indeed, despite conflicting documentation but in agreement with its name, Int64 hist-objects will happily accept negative integers:
myh = hist.Hist.new.Regular(10, 0, 10).Int64()
myh[0] = -2 # works
assert(myh.values().dtype == np.int64) # True
assert(myh.values().dtype.type == np.int64) # False
assert(myh.values().dtype.type == np.long) # True
So, there is no reason why storing Int64 histograms (hist/boost-histogram) into root files should not be possible, but uproot disagrees (because issubclass(np.longlong, np.int64) == False).
See also this numpy bug report.
Conflicting docs probably came from the original start using uint. Only having 9223372036854775807 as a maximum instead of 18446744073709551615 isn't a bit deal, and it's nice to be able to support negative weights, histogram subtraction, etc. And it was needed to match NumPy.