waymo-open-dataset
Error using data-conversion/scenario_conversion parsing text-format waymo.open_dataset.Scenario: 1:2: Interpreting non-ascii codepoint 192
I am encountering an error while using the Waymo Open Dataset conversion library to convert a Waymo scenario file to a TensorFlow Example and export it to a tfrecord file. When running my code using bazel build, I get the following error:
Error parsing text-format waymo.open_dataset.Scenario: 1:2: Interpreting non-ascii codepoint 192
I suspect the issue might be related to the encoding of the input file. Specifically, I am reading the scenario file in binary mode, but it might be encoded in a non-standard format. Any suggestions would be helpful.
Here's a code snippet that reproduces the error:
```cpp
#include <fstream>
#include <iostream>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

#include "waymo_open_dataset/data_conversion/scenario_conversion.h"
#include "waymo_open_dataset/protos/conversion_config.pb.h"
#include "absl/strings/str_cat.h"
#include "tensorflow/core/example/example.pb.h"
#include "tensorflow/core/lib/io/record_writer.h"
#include "tensorflow/core/platform/env.h"
#include "google/protobuf/text_format.h"
#include "google/protobuf/io/zero_copy_stream_impl.h"

int main() {
  // Load the input data from file.
  std::string scenario_file_path =
      "path/to/waymo_open_dataset_motion_v_1_2_0/training_20s/training_20s.tfrecord-00667-of-01000";
  waymo::open_dataset::Scenario scenario;

  // Open the scenario file in binary mode.
  std::ifstream input(scenario_file_path, std::ios::in | std::ios::binary);
  if (!input) {
    std::cerr << "Failed to open " << scenario_file_path << std::endl;
    return 1;
  }

  // Read the entire file into a string.
  std::stringstream buffer;
  buffer << input.rdbuf();
  std::string contents = buffer.str();

  // Parse the string as a text-format Scenario proto (this is the call that fails).
  if (!google::protobuf::TextFormat::ParseFromString(contents, &scenario)) {
    std::cerr << "Failed to parse " << scenario_file_path << std::endl;
    return 1;
  }

  // Use the default conversion configuration.
  waymo::open_dataset::MotionExampleConversionConfig config;

  // Convert the scenario to a TensorFlow Example.
  std::map<std::string, int> counters;
  absl::StatusOr<tensorflow::Example> status_or_example =
      waymo::open_dataset::ScenarioToExample(scenario, config, &counters);
  if (!status_or_example.ok()) {
    std::cerr << "Failed to convert scenario to Example: "
              << status_or_example.status().message() << std::endl;
    return 1;
  }
  tensorflow::Example example = status_or_example.value();

  // Create a new writable file for the output tfrecord.
  tensorflow::Env* env = tensorflow::Env::Default();
  std::unique_ptr<tensorflow::WritableFile> file;
  std::string file_name = "example.tfrecord";
  if (!env->NewWritableFile(file_name, &file).ok()) {
    std::cerr << "Failed to create " << file_name << std::endl;
    return 1;
  }

  // Create a record writer and write the serialized example to the file.
  tensorflow::io::RecordWriterOptions options =
      tensorflow::io::RecordWriterOptions::CreateRecordWriterOptions("");
  tensorflow::io::RecordWriter writer(file.get(), options);
  std::string example_string;
  example.SerializeToString(&example_string);
  writer.WriteRecord(example_string);

  // Close the file and report success.
  file->Close();
  std::cout << "Example exported to " << file_name << std::endl;
  return 0;
}
```
Hi, I think the issue is that the files are stored in the TensorFlow TFRecord format. It looks like your code tries to read the whole file directly as a single text-format string. You will need to read each scenario as a single record from the TFRecord input files, then process each one and write the results back out as your code currently does. I have not used it, but I think TensorFlow's RecordReader might be what you need.
Note that if you are not modifying the default configuration (or the conversion code), we provide the converted data (using the default configuration) already in the open dataset repository.
Please let me know if you have further questions.