spacetech-ssa
Serialization fails for some users
Some users are seeing the following issue when the ETL job tries to serialize the results to parquet:
(base) ➜ orbit_prediction git:(master) python3 orbit_prediction/spacetrack_etl.py --st_user [email protected] --st_password <mypassword> --norad_id_file sample_data/test_norad_ids.txt --past_n_days 10 --output_path outputfile
INFO:__main__:Fetching Satellite Catalog Data...
INFO:__main__:Number of TLE Batch Requests: 1
INFO:__main__:Starting to fetch TLEs from space-track.org
INFO:__main__:Processing batch 1/1
INFO:__main__:Fetching TLEs for 20 ASOs...
INFO:__main__:Parsing raw TLE data...
INFO:__main__:Finished fetching TLEs
INFO:__main__:Calculating orbital state vectors for 372 TLEs...
INFO:__main__:Serializing data...
Traceback (most recent call last):
  File "orbit_prediction/spacetrack_etl.py", line 309, in <module>
    orbit_data_df.to_parquet(args.output_path)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 214, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2109, in to_parquet
    to_parquet(
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/io/parquet.py", line 260, in to_parquet
    return impl.write(
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/io/parquet.py", line 112, in write
    self.api.parquet.write_table(
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pyarrow/parquet.py", line 1733, in write_table
    writer.write_table(table, row_group_size=row_group_size)
  File "/usr/local/anaconda3/lib/python3.8/site-packages/pyarrow/parquet.py", line 591, in write_table
    self.writer.write_table(table, row_group_size=row_group_size)
  File "pyarrow/_parquet.pyx", line 1433, in pyarrow._parquet.ParquetWriter.write_table
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[ms] would lose data: 1602624833909568000
This looks to be an issue with pyarrow refusing a lossy cast from nanosecond-precision timestamps to millisecond precision, and this Stack Overflow thread may provide a solution.