How to use TfRecordDataset, DatasetToTfRecord, and tf.io.tfRecordReader
tensorflow-java 0.4, Spark 3.1, Java 11
Hi,
I am using tensorflow-java to read a TFRecord file, but I cannot get the data out, and there is no example for this. The Java classes TfRecordDataset, DatasetToTfRecord and tf.io.tfRecordReader do not have the same API as their Python counterparts. Could you give some examples of how to use them? Thanks.
import org.tensorflow.{Operand, Session,EagerSession}
import org.tensorflow.op.Ops
import org.tensorflow.op.data.{DatasetToTfRecord, TfRecordDataset}
val session = EagerSession.create
val tf = Ops.create(session)
val scope = tf.scope()
// val fileName =tf.constant( "/Users/zhanghaining/Downloads/tfrecord-kk2-test/")
val fileName = tf.constant("/Users/zhanghaining/Downloads/BigDL/spark/dl/src/test/resources/tf/mnist_train.tfrecord")
val compress = tf.constant("")
val bufferSize = tf.constant(0L)
val recordDataSet = TfRecordDataset.create(scope,fileName,compress,bufferSize)
val record = DatasetToTfRecord.create(scope, recordDataSet,fileName,compress)
val reader = tf.io.tfRecordReader()
println(record.op().name() )
println(record.op().`type`())
println(recordDataSet.op().numOutputs() )
println(recordDataSet.asOutput().dataType())
C++ API demo:
std::unique_ptr<tensorflow::RandomAccessFile> file;
auto tf_status = tensorflow::Env::Default()->NewRandomAccessFile(
cc->InputSidePackets().Tag(kTFRecordPath).Get<std::string>(), &file);
RET_CHECK(tf_status.ok())
<< "Failed to open tfrecord file: " << tf_status.ToString();
tensorflow::io::RecordReader reader(file.get(),
tensorflow::io::RecordReaderOptions());
Hi @mullerhai,
Is your goal to iterate through that dataset? If so, you need to create an iterator (e.g. by calling tf.data.makeIterator). Also, in your example, DatasetToTfRecord is writing to the same file as the dataset you've loaded, so I'm not sure what the expected behavior is; you should try writing to a different file.
If you don't mind adding org.tensorflow:tensorflow-framework to your dependencies, we do have utilities to simplify the usage of datasets, take a look at this one. You can then iterate through the elements of the dataset in eager mode like this:
Dataset dataset = Dataset.tfRecordDataset(tf, "yourfile.tfrecord", "", 0L).batchSize(10);
for (List<Operand<?>> components : dataset) {
    Operand<?> featureBatch = components.get(0);
    Operand<?> labelBatch = components.get(1);
    // ... operate on the batches directly
}
Eager mode tends to be slow though, so if you can provide more details about your specific use case, maybe we can give you better examples of how to do it.
Great, thanks. I would also like to know how to convert a Dataset (or a TFRecord file) to a ByteNdArray, or how to convert a Dataset to examples (org.tensorflow.example.example.{Example, SequenceExample}), because I need code in this style
NdArrays.wrap(Shape.of(dimSizes: _*), DataBuffers.of(bytes, true, false))
to build tensors for model training.
Maybe you can do this via parseExampleDataset? There are also a bunch of utilities for parsing examples in the IO package, like this one.
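For the narrower goal of getting the raw bytes of each record, an uncompressed TFRecord file can also be read without any TensorFlow op, because the format is a plain sequence of framed records: an 8-byte little-endian length, a 4-byte masked CRC32C of the length, the payload, and a 4-byte masked CRC32C of the payload. The following is a minimal sketch under that assumption and is not tensorflow-java API: the class TfRecordFraming and its methods frame and readAll are hypothetical names, it relies only on the JDK (java.util.zip.CRC32C needs Java 9 or later), and it does not handle GZIP/ZLIB-compressed files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32C;

/** Hypothetical helper (not part of tensorflow-java) for uncompressed TFRecord framing. */
public class TfRecordFraming {

    /** TFRecord stores a "masked" CRC32C: rotate right by 15 bits, then add a constant. */
    public static int maskedCrc(byte[] data) {
        CRC32C crc = new CRC32C();
        crc.update(data, 0, data.length);
        int c = (int) crc.getValue();
        return ((c >>> 15) | (c << 17)) + 0xa282ead8;
    }

    /** Frames one payload as: length (8 bytes, LE), crc(length), payload, crc(payload). */
    public static byte[] frame(byte[] payload) {
        byte[] len = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(payload.length).array();
        ByteBuffer out = ByteBuffer.allocate(8 + 4 + payload.length + 4)
                .order(ByteOrder.LITTLE_ENDIAN);
        out.put(len).putInt(maskedCrc(len)).put(payload).putInt(maskedCrc(payload));
        return out.array();
    }

    /** Reads every record payload from an uncompressed TFRecord stream. */
    public static List<byte[]> readAll(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> records = new ArrayList<>();
        byte[] len = new byte[8];
        byte[] crc = new byte[4];
        while (true) {
            int first = data.read();
            if (first < 0) {
                break; // clean end of stream between two records
            }
            len[0] = (byte) first;
            data.readFully(len, 1, 7);
            data.readFully(crc);
            check(len, crc, "length");
            long n = ByteBuffer.wrap(len).order(ByteOrder.LITTLE_ENDIAN).getLong();
            if (n < 0 || n > Integer.MAX_VALUE) {
                throw new IOException("Unsupported record length: " + n);
            }
            byte[] payload = new byte[(int) n];
            data.readFully(payload);
            data.readFully(crc);
            check(payload, crc, "payload");
            records.add(payload);
        }
        return records;
    }

    /** Compares the stored little-endian CRC with the one computed from the bytes. */
    private static void check(byte[] bytes, byte[] storedCrc, String what) throws IOException {
        int stored = ByteBuffer.wrap(storedCrc).order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (stored != maskedCrc(bytes)) {
            throw new IOException("Corrupt TFRecord " + what + " checksum");
        }
    }

    public static void main(String[] args) throws IOException {
        // Round trip through an in-memory "file" holding two records.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(frame("first".getBytes(StandardCharsets.UTF_8)));
        file.write(frame("second".getBytes(StandardCharsets.UTF_8)));
        for (byte[] r : readAll(new ByteArrayInputStream(file.toByteArray()))) {
            System.out.println(r.length + " bytes: " + new String(r, StandardCharsets.UTF_8));
        }
    }
}
```

For a real file, a buffered FileInputStream could be passed to readAll. Each returned byte[] would normally be a serialized Example proto, so, assuming the TensorFlow proto classes are on the classpath, it could be handed to Example.parseFrom(bytes) or wrapped with DataBuffers.of(bytes, true, false) as in the snippet above.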
With tensorflow-java 0.5.0-SNAPSHOT in EagerSession mode, I iterate over the elements of the dataset, but the element class type is OptionalGetValue or something similar. I want to print the real values, but that fails.
parseExampleDataset
val fp = tf.constant("/Volumes/Pink4T/transfer/code/github/stanford-tensorflow-tutorials/2017/data/friday.tfrecord")
val compress = tf.constant("")
val bufferSize = tf.constant(0L)
val datazs = tf.data.tfRecordDataset(fp, compress, bufferSize)
println(datazs.asTensor())
I get the error: No tensor type has been registered for data type DT_VARIANT
We don't map DT_VARIANT tensors in the Java space (yet). Can you please provide the full stack trace? I want to see where such a tensor is being accessed from the JVM.
Hello, is there any way in which you could run this code outside eager mode? I need to access the binary representation of the example to hit a ParseExample node within a graph.
thanks!
No, I have not gotten it to work.
Sure, that will work in Graph mode as well; you just need to make sure that the tf instance you are passing to Dataset.tfRecordDataset is executing in a graph environment, i.e. var tf = Ops.create(graph);
You won't be able to use a Java for loop though, so you'll need to rely on other TF ops and methods exposed by the datasets/iterators to iterate through the examples within your graph.