[spark-tensorflow-connector] Cannot read multiple TFRecord files

Open prasannaVijay opened this issue 5 years ago • 2 comments

Using spark.read.format("tfrecord").load("path/to/one-file.tfrecord") works. How do I read multiple directories, each containing TFRecord files? I have tried:

spark.read.format("tfrecord").load(paths: _*), where paths is an array of paths.
spark.read.format("tfrecord").load(path), where path is a regex of TFRecord paths.
spark.read.format("tfrecord").option("path", path).load(), passing the path as an option.

None of these work. Is there a recommended way to do this?

prasannaVijay, Apr 04 '19 21:04

The format is tfrecords (plural). Both spark.read.format("tfrecords").load("path/to/*file.tfrecord"), with a wildcard, and spark.read.format("tfrecords").load("path/to/one-file.tfrecord,path/to/another-file.tfrecord"), with a comma-separated list, work for me.

manuzhang, Apr 21 '19 10:04

I found the reason: the directory is not searched recursively.

liusulizzu, May 26 '22 11:05
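
If the directories really are not searched recursively, one possible workaround is to list the files yourself and hand the reader an explicit comma-separated list, which is the form reported to work earlier in this thread. The sketch below is only that, a sketch: find_tfrecord_files and to_load_argument are hypothetical helper names, not part of the connector, and it assumes the files sit on a local filesystem that os.walk can see (HDFS or object storage would need a different way of listing files).

```python
import os


def find_tfrecord_files(root_dirs, suffix=".tfrecord"):
    """Recursively collect files ending in `suffix` under each root directory.

    Hypothetical helper: walks the local filesystem only.
    Directories that do not exist are silently skipped by os.walk.
    """
    found = []
    for root in root_dirs:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(suffix):
                    found.append(os.path.join(dirpath, name))
    # Sort so the resulting list is stable from run to run.
    return sorted(found)


def to_load_argument(paths):
    """Join paths with commas, the form reported to work in this thread."""
    return ",".join(paths)


# Assumed usage (needs a Spark session with the connector on the classpath):
#   files = find_tfrecord_files(["path/to/dir-a", "path/to/dir-b"])
#   df = spark.read.format("tfrecords").load(to_load_argument(files))
```

The Spark call is left as a comment because it depends on your cluster setup; whether the comma-separated form is accepted may also depend on the connector version, so treat it as something to try rather than a guaranteed fix.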