tensorflow_scala

Tensor data not read correctly from NPY files

Open Spiess opened this issue 6 years ago • 9 comments

When I create an NPY tensor file from Python and read it from Scala TensorFlow, the shape of the resulting Tensor is correct, but the content is all zeros.

For example, writing from Python:

import numpy as np

tensor = np.asarray([1.5, -30.4])
np.save('test.npy', tensor)

Reading from Scala TensorFlow:

val tensor = Tensor.fromNPY[Double](Paths.get("test.npy"))

println(tensor.summarize())

Output:

Tensor[Double, [2]]
[0.0, 0.0]

The output from Scala TensorFlow is the same when saving the Python tensor with allow_pickle=False. The tensor can be read without any problems from Python using NumPy.
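For anyone trying to narrow this down, it may help to see what a reader outside NumPy has to parse. The following is a minimal, standard-library-only sketch of the NPY version 1.0 layout (magic string, version bytes, little-endian header length, padded header dict, then raw data). The helper names `build_npy_v1` and `parse_npy_v1` are hypothetical and written for illustration; this is not how TF Scala reads the file.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"  # every NPY file starts with these six bytes

def build_npy_v1(values):
    """Hand-assemble a version 1.0 NPY file holding a 1-D little-endian float64 array."""
    header = "{'descr': '<f8', 'fortran_order': False, 'shape': (%d,), }" % len(values)
    # Pad with spaces and end with a newline so the data starts on a 64-byte boundary.
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    header += " " * (-unpadded % 64) + "\n"
    prefix = MAGIC + bytes([1, 0]) + struct.pack("<H", len(header))
    return prefix + header.encode("latin1") + struct.pack("<%dd" % len(values), *values)

def parse_npy_v1(raw):
    """Return (header dict, data offset, payload bytes) for a version 1.0 NPY file."""
    if raw[:6] != MAGIC:
        raise ValueError("not an NPY file")
    if tuple(raw[6:8]) != (1, 0):
        raise ValueError("this sketch only handles NPY version 1.0")
    (header_len,) = struct.unpack("<H", raw[8:10])  # 2-byte little-endian length
    offset = 10 + header_len                        # first byte of the array data
    header = ast.literal_eval(raw[10:offset].decode("latin1"))
    return header, offset, raw[offset:]

# Same values as the example above, built by hand instead of with np.save.
raw = build_npy_v1([1.5, -30.4])
header, offset, payload = parse_npy_v1(raw)
values = struct.unpack("<%dd" % header["shape"][0], payload)
print(header["descr"], header["shape"], offset, values)
```

To inspect a real file, pass `open("test.npy", "rb").read()` to `parse_npy_v1`. If the header and payload look right here but the tensor is still zero on the Scala side, the problem is more likely in how the payload bytes are copied than in the file itself.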

Spiess avatar Dec 17 '18 11:12 Spiess

What happens when you save a tensor from TF Scala and try to read it in Numpy? I only tested writing using TF Scala and reading back into TF Scala.

eaplatanios avatar Dec 17 '18 17:12 eaplatanios

@Spiess I actually just tried your example and it worked fine for me with Python 3.7. Are you sure you pulled TF Scala master and are not using an earlier version? I haven't released the snapshot binaries yet, so you should try the example using the code from the master branch.

eaplatanios avatar Dec 19 '18 18:12 eaplatanios

Ah, sorry, I was using TF Scala 0.4.1 and Python 2.7, so it's entirely possible it works on master.

Spiess avatar Dec 19 '18 21:12 Spiess

No problem. In that case, and since I can't reproduce this anymore on master, I'll close this. Feel free to reopen if the issue persists. :)

eaplatanios avatar Dec 19 '18 23:12 eaplatanios

I can confirm that npy files are not being correctly read in 0.4.2-SNAPSHOT either. Attached is iris_x.npy.gz which reads correctly into Python 3.6 but not into TF Scala 0.4.1 or 0.4.2-SNAPSHOT. These data are the features from the Iris problem set.

I load the (uncompressed) npy file and inspect it as follows:

val x_test = Tensor.fromNPY[Float](Paths.get("iris_x.npy"))
print(x_test.summarize())

This results in

Tensor[Float, [150, 4]]
[[5.1, 3.5, 1.4, 0.2],
 [4.9, 3.0, 1.4, 0.2],
 [4.7, 3.2, 1.3, 0.2],
 ...,
 [0.0, 0.0, 0.0, 0.0],
 [0.0, 0.0, 0.0, 0.0],
 [0.0, 0.0, 0.0, 0.0]]

The last 15 or so rows of the file are incorrectly read in as zeros.

davidmweber avatar Jan 03 '19 06:01 davidmweber

I have run the debugger through this. The problem appears in Tensor.fromBuffer. In my example, the last 11 rows (11 * 4 columns * 4 bytes = 176 bytes) are set to zero. I cannot inspect the direct buffer's underlying allocated buffer without cloning the project and tooling up a bit.

    this synchronized {
      // TODO: May behave weirdly for direct byte buffers allocated on the Scala side.
      val directBuffer = {
        if (buffer.isDirect) {
          buffer
        } else {
          val direct = ByteBuffer.allocateDirect(numBytes.toInt)
          val bufferCopy = buffer.duplicate()
          direct.put(bufferCopy.limit(numBytes.toInt).asInstanceOf[ByteBuffer]) // <<< This copy is suspect
          direct
        }
      }
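The byte arithmetic above can be checked with a small sketch. It uses a synthetic 150 x 4 float32 buffer as a stand-in for the Iris data (an assumption, not the real file) and simulates a copy that leaves the last 176 bytes zero-filled. The helper `trailing_zero_bytes` is hypothetical.

```python
import struct

ROWS, COLS, ITEM_SIZE = 150, 4, 4  # shape and float32 width from the report above

def trailing_zero_bytes(buf):
    """Count the zero bytes at the end of buf."""
    return len(buf) - len(buf.rstrip(b"\x00"))

# Synthetic stand-in for the payload: every value is 1.0, whose little-endian
# float32 encoding ends in a non-zero byte, so no genuine zeros sit at the end.
expected = struct.pack("<%df" % (ROWS * COLS), *([1.0] * (ROWS * COLS)))

# Simulate a copy that stops 176 bytes short and leaves the rest zero-filled.
copied = expected[:-176] + bytes(176)

zeroed = trailing_zero_bytes(copied)
zero_rows = zeroed // (COLS * ITEM_SIZE)
print(len(expected), zeroed, zero_rows)
```

On real data this count is only an upper bound, since values that are genuinely zero at the end of the array would be included as well.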

Shout if I can assist more.

davidmweber avatar Jan 03 '19 07:01 davidmweber

@davidmweber thanks for testing this and digging a bit into it! I’ll try to resolve it once I’m back from traveling in about 3-4 days and so for now I’ll simply reopen the issue. :)

eaplatanios avatar Jan 05 '19 14:01 eaplatanios

I have some code that I can repurpose to test the ByteBuffer -> tensor and npy -> tensor implementations. The npy -> tensor test needs a pre-stored file somewhere, but I think it will be a useful test case.

davidmweber avatar Jan 06 '19 16:01 davidmweber

@davidmweber This would indeed be a useful test case. Could you share that in a PR so I can use it to debug this issue? Thanks!

eaplatanios avatar Jan 30 '19 15:01 eaplatanios