tensorflow_scala
Tensor data not read correctly from NPY files
When I create an NPY tensor file from Python and read it from Scala TensorFlow, the shape of the resulting Tensor is correct, but the content is all zeros.
For example, writing from Python:
import numpy as np
tensor = np.asarray([1.5, -30.4])
np.save('test.npy', tensor)
Reading from Scala TensorFlow:
val tensor = Tensor.fromNPY[Double](Paths.get("test.npy"))
println(tensor.summarize())
Output:
Tensor[Double, [2]]
[0.0, 0.0]
The output from Scala TensorFlow is the same when saving the Python tensor with allow_pickle=False. The tensor can be read without any problems from Python using NumPy.
What happens when you save a tensor from TF Scala and try to read it in NumPy? I only tested writing using TF Scala and reading back into TF Scala.
@Spiess I actually just tried your example and it worked fine for me with Python 3.7. Are you sure you pulled TF Scala master and aren't using an earlier version? I haven't released the snapshot binaries yet, so you should try the example using the code from the master branch.
Ah, sorry, I was using TF Scala 0.4.1 and Python 2.7, so it's entirely possible it works on master.
No problem. In that case, and since I can't reproduce this anymore on master, I'll close this. Feel free to reopen if the issue persists. :)
I can confirm that NPY files are not being correctly read in 0.4.2-SNAPSHOT either. Attached is iris_x.npy.gz, which reads correctly into Python 3.6 but not into TF Scala 0.4.1 or 0.4.2-SNAPSHOT. These data are the features from the Iris dataset.
I load the (uncompressed) npy file and inspect it as follows:
val x_test = Tensor.fromNPY[Float](Paths.get("iris_x.npy"))
print(x_test.summarize())
This results in
Tensor[Float, [150, 4]]
[[5.1, 3.5, 1.4, 0.2],
[4.9, 3.0, 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
...,
[0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0]]
The last 15 or so rows of the file are incorrectly read in as zeros.
I have run the debugger through this. The problem appears to be in Tensor.fromBuffer. In my example, the last 11 rows (11 rows * 4 columns * 4 bytes = 176 bytes) are set to zero. I cannot inspect the direct buffer's underlying allocated memory without cloning the project and tooling up a bit.
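For context on why a copy like this can come up short: ByteBuffer.put(ByteBuffer) transfers only the source's remaining() bytes, i.e. from its current position up to its limit, and duplicate() preserves the source's position. So if the buffer handed to the copy arrives with a non-zero position (for instance, left over from parsing the NPY header), the tail of the zero-initialized direct buffer is never written — consistent with the trailing zeros above. A minimal sketch of that pitfall, in Java because java.nio is exactly the API the Scala code calls (the class name and byte values are illustrative, not from the library):

```java
import java.nio.ByteBuffer;

public class BufferCopyDemo {
    static String demo() {
        // Source buffer holding 8 known bytes, but with its position
        // advanced past the first 3 (like a header already consumed).
        ByteBuffer src = ByteBuffer.allocate(8);
        for (int i = 0; i < 8; i++) src.put((byte) (i + 1));
        src.position(3);

        // allocateDirect zero-fills its content. put(copy) transfers only
        // copy.remaining() == 5 bytes, so dst[5..7] stay zero.
        ByteBuffer dst = ByteBuffer.allocateDirect(8);
        ByteBuffer copy = src.duplicate(); // keeps position = 3, limit = 8
        dst.put(copy);

        dst.rewind(); // position = 0, limit still 8: show all bytes
        StringBuilder sb = new StringBuilder();
        while (dst.hasRemaining()) sb.append(dst.get()).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints: 4 5 6 7 8 0 0 0
    }
}
```

The copy silently loses as many bytes as the source's position, which would show up exactly as a block of zeros at the end of the tensor.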
this synchronized {
  // TODO: May behave weirdly for direct byte buffers allocated on the Scala side.
  val directBuffer = {
    if (buffer.isDirect) {
      buffer
    } else {
      val direct = ByteBuffer.allocateDirect(numBytes.toInt)
      val bufferCopy = buffer.duplicate()
      direct.put(bufferCopy.limit(numBytes.toInt).asInstanceOf[ByteBuffer]) // <<< This copy is suspect
      direct
    }
  }
Shout if I can assist more.
@davidmweber thanks for testing this and digging a bit into it! I’ll try to resolve it once I’m back from traveling in about 3-4 days and so for now I’ll simply reopen the issue. :)
I have some code that I can repurpose to test the ByteBuffer -> tensor and npy -> tensor implementations. The npy -> tensor test needs a pre-stored file somewhere, but I think it will be a useful test case.
@davidmweber This would indeed be a useful test case. Could you share it in a PR so I can use it to debug this issue? Thanks!