BytesList with length 0 or 1 is inferred to have StringType instead of ArrayType
If a BytesList in TFRecords always has a length of 0 or 1, the feature is inferred to have StringType instead of ArrayType. Is there a reason for this behavior? As a result, you can write a DataFrame with an array-of-strings column as TFRecords, but you can't read those TFRecords back into a DataFrame with the same schema. A zero-length BytesList is valid in TensorFlow.
Below is the implementation of parseBytesList from https://github.com/tensorflow/ecosystem/blob/master/spark/spark-tensorflow-connector/src/main/scala/org/tensorflow/spark/datasources/tfrecords/TensorFlowInferSchema.scala#L144:
private def parseBytesList(feature: Feature): DataType = {
  val length = feature.getBytesList.getValueCount
  if (length == 0) {
    null
  }
  else if (length > 1) {
    ArrayType(StringType)
  }
  else {
    StringType
  }
}
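To make the branching concrete, here is a minimal, self-contained sketch of the same per-feature decision. It is not the connector's code: `DataType`, `StringType`, `ArrayType` and `BytesListFeature` are hypothetical stand-ins defined here so the snippet runs without Spark or TensorFlow, and `Option` replaces the `null` return. It only models the type inferred for a single feature, not how the connector combines types across records.

```scala
// Stand-in types so the sketch runs without Spark or TensorFlow.
// They mirror the names used in the issue but are NOT Spark's classes.
sealed trait DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

// Hypothetical stand-in for a TensorFlow Feature that holds a BytesList.
final case class BytesListFeature(values: Seq[String])

object InferSketch {
  // Same branching as parseBytesList above, with None in place of null.
  def inferBytesListType(feature: BytesListFeature): Option[DataType] =
    feature.values.length match {
      case 0 => None                        // empty list: no type inferred
      case 1 => Some(StringType)            // single value: scalar string
      case _ => Some(ArrayType(StringType)) // two or more values: array
    }

  def main(args: Array[String]): Unit = {
    println(inferBytesListType(BytesListFeature(Seq.empty)))
    println(inferBytesListType(BytesListFeature(Seq("a"))))
    println(inferBytesListType(BytesListFeature(Seq("a", "b"))))
  }
}
```

In this sketch, a column whose lists never contain more than one value never reaches the `ArrayType(StringType)` branch, which matches the behavior described above.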
I also hit this problem. Do you have any solutions?