Serialization fails for some users

Open calstad opened this issue 3 years ago • 0 comments

Some users hit the following error when the ETL job tries to serialize its results to Parquet:

(base) ➜  orbit_prediction git:(master) python3 orbit_prediction/spacetrack_etl.py --st_user [email protected] --st_password <mypassword> --norad_id_file sample_data/test_norad_ids.txt --past_n_days 10 --output_path outputfile
INFO:__main__:Fetching Satellite Catalog Data...
INFO:__main__:Number of TLE Batch Requests: 1
INFO:__main__:Starting to fetch TLEs from space-track.org
INFO:__main__:Processing batch 1/1
INFO:__main__:Fetching TLEs for 20 ASOs...
INFO:__main__:Parsing raw TLE data...
INFO:__main__:Finished fetching TLEs
INFO:__main__:Calculating orbital state vectors for 372 TLEs...
INFO:__main__:Serializing data...
Traceback (most recent call last):
  File "orbit_prediction/spacetrack_etl.py", line 309, in <module>
    orbit_data_df.to_parquet(args.output_path)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 214, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2109, in to_parquet
    to_parquet(
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/io/parquet.py", line 260, in to_parquet
    return impl.write(
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/io/parquet.py", line 112, in write
    self.api.parquet.write_table(
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pyarrow/parquet.py", line 1733, in write_table
    writer.write_table(table, row_group_size=row_group_size)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pyarrow/parquet.py", line 591, in write_table
    self.writer.write_table(table, row_group_size=row_group_size)
  File "pyarrow/_parquet.pyx", line 1433, in pyarrow._parquet.ParquetWriter.write_table
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1602624833909568000
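
To see why the writer refuses the cast, it can help to check the reported value by hand. The sketch below is only illustrative arithmetic on the integer from the error message; the helper name `lost_on_cast` is made up for this example and is not part of pandas or pyarrow.

```python
# Nanoseconds per coarser unit.
NS_PER_MS = 1_000_000
NS_PER_US = 1_000


def lost_on_cast(ts_ns: int, ns_per_unit: int) -> int:
    """Return how many nanoseconds are discarded when ts_ns is truncated
    to a coarser unit (hypothetical helper, for illustration only)."""
    return ts_ns % ns_per_unit


# Value taken from the ArrowInvalid message above.
ts_ns = 1602624833909568000

print(lost_on_cast(ts_ns, NS_PER_MS))  # 568000 ns would be dropped at ms precision
print(lost_on_cast(ts_ns, NS_PER_US))  # 0 ns dropped at us precision
```

So this particular timestamp carries microsecond digits: a cast to milliseconds is lossy, while a cast to microseconds would not be.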

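This looks like a pyarrow issue: the writer refuses to cast the nanosecond-precision timestamps in the DataFrame to milliseconds because that would drop the sub-millisecond digits. This Stack Overflow thread may provide a solution.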

calstad, Oct 23 '20 19:10