
A Parquet file with BYTE_ARRAY and TIMESTAMP_MILLIS columns fails to load

Open ngbrown opened this issue 1 year ago • 2 comments

Problem description

I have a .parquet file that contains a BYTE_ARRAY column holding string data. When PlotJuggler tries to load it, an exception is thrown after the time column is selected, and the file is not loaded.

Steps to reproduce (important)

  • Windows 10 22H2 (probably doesn't matter)
  • Have the DataLoadParquet plugin available to PlotJuggler.
  • Create a .parquet file with a BYTE_ARRAY column and a TIMESTAMP_MILLIS column.
  • Load the file. Select a time column if appropriate.

Screenshot - 2023-08-28 , 7_08_43 PM

Exception stack:

 	parquet.dll!parquet::StreamReader::CheckColumn(parquet::Type::type physical_type, parquet::ConvertedType::type converted_type, int length) Line 527	C++
 	parquet.dll!parquet::StreamReader::operator>>(__int64 & v) Line 121	C++
>	DataLoadParquet.dll!DataLoadParquet::readDataFromFile(PJ::FileLoadInfo * info, PJ::PlotDataMapRef & plot_data) Line 264	C++
 	plotjuggler.exe!MainWindow::loadDataFromFile(const PJ::FileLoadInfo & info) Line 1510	C++
 	plotjuggler.exe!MainWindow::loadDataFromFiles(QStringList filenames) Line 1377	C++
 	plotjuggler.exe!MainWindow::on_pushButtonLoadDatafile_clicked() Line 2996	C++
 	plotjuggler.exe!MainWindow::qt_metacall(QMetaObject::Call _c, int _id, void * * _a) Line 522	C++

This error can be fixed by adding os.SkipColumns(1); at dataload_parquet.cpp:176, so that the BYTE_ARRAY column is skipped instead of being read as a number.
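The failure and the skip-based workaround can be modeled without Arrow. The sketch below is a toy stand-in, not the Parquet API: ToyRowReader, Cell, skipColumns and readNumericColumns are all hypothetical names made up for illustration. It only shows the idea that a typed read throws on a mismatched column, and that skipping the column before reading avoids the exception.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Toy stand-ins, NOT the Arrow/Parquet API: a cell is an integer or a string.
using Cell = std::variant<std::int64_t, std::string>;

// Minimal model of a typed stream reader over a single row.
class ToyRowReader
{
public:
  explicit ToyRowReader(std::vector<Cell> row) : row_(std::move(row)) {}

  // Typed read: throws if the current column does not hold an integer,
  // loosely mirroring the type check seen in the stack trace above.
  ToyRowReader& operator>>(std::int64_t& v)
  {
    if (!std::holds_alternative<std::int64_t>(row_.at(pos_)))
      throw std::runtime_error("column type mismatch");
    v = std::get<std::int64_t>(row_.at(pos_++));
    return *this;
  }

  // Advances past n columns without reading them.
  void skipColumns(std::size_t n) { pos_ += n; }

  bool isString() const { return std::holds_alternative<std::string>(row_.at(pos_)); }
  bool atEnd() const { return pos_ >= row_.size(); }

private:
  std::vector<Cell> row_;
  std::size_t pos_ = 0;
};

// Reads the numeric columns of a row as doubles and skips string columns.
std::vector<double> readNumericColumns(ToyRowReader reader)
{
  std::vector<double> values;
  while (!reader.atEnd())
  {
    if (reader.isString())
    {
      reader.skipColumns(1);  // the workaround: do not attempt a typed read
      continue;
    }
    std::int64_t tmp = 0;
    reader >> tmp;
    values.push_back(static_cast<double>(tmp));
  }
  return values;
}
```

In this model a row of (integer, string, integer) yields two values, whereas a typed read placed directly on the string column throws. Whether the real loader should drop string columns or expose them some other way is a separate design question that this sketch does not address.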

The next error occurs when loading the TIMESTAMP_MILLIS column.

Screenshot - 2023-08-28 , 7_19_10 PM

For this, I needed to add the following cases to the switch(converted_type):

case ConvertedType::TIMESTAMP_MILLIS:
{
  // Read the value as milliseconds, then scale by the period (1/1000) to get seconds.
  std::chrono::milliseconds tmp;
  os >> tmp;
  row_values[col] = static_cast<double>(tmp.count()) * (static_cast<double>(std::chrono::milliseconds::period::num) / static_cast<double>(std::chrono::milliseconds::period::den));
  break;
}
case ConvertedType::TIMESTAMP_MICROS:
{
  // Same conversion for microseconds (period 1/1000000).
  std::chrono::microseconds tmp;
  os >> tmp;
  row_values[col] = static_cast<double>(tmp.count()) * (static_cast<double>(std::chrono::microseconds::period::num) / static_cast<double>(std::chrono::microseconds::period::den));
  break;
}
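The two cases above do the same tick-count-to-seconds conversion by hand. As a minimal sketch, assuming the goal is only a value in seconds as a double, the scaling could be left to std::chrono itself. The helper name toSeconds is hypothetical and is not part of PlotJuggler or Arrow.

```cpp
#include <chrono>

// Hypothetical helper (not in PlotJuggler or Arrow): converts any
// std::chrono duration to seconds, expressed as a double.
template <typename Rep, typename Period>
double toSeconds(std::chrono::duration<Rep, Period> d)
{
  // std::chrono::duration<double> has a period of one second, so this
  // conversion scales the tick count by Period::num / Period::den.
  return std::chrono::duration<double>(d).count();
}
```

With such a helper, each case body would reduce to reading tmp and assigning row_values[col] = toSeconds(tmp);, which avoids repeating the period arithmetic for every timestamp unit.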

ngbrown avatar Aug 29 '23 03:08 ngbrown

Seems like a reasonable fix to me. Would you consider making a pull request, @ngbrown?

zdavkeos avatar Sep 20 '23 19:09 zdavkeos

My changes to PlotJuggler to implement this and other features for Parquet file reading are here: https://github.com/facontidavide/PlotJuggler/compare/main...ngbrown-forks:PlotJuggler:parquet-windows

I haven't started a pull request because I haven't yet built a good, shareable test .parquet file.

ngbrown avatar Oct 02 '23 20:10 ngbrown