
weather-mv: Improve tool's efficiency in terms of time & memory.

Open mahrsee1997 opened this issue 2 years ago • 2 comments

Time Efficient:

Use gcloud alpha storage instead of gsutil in the open_local() method in sinks.py.

Findings: downloading data from GCS to the local file system with gsutil is 5 times slower than with gcloud alpha storage.
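As a rough illustration of the swap, here is a minimal sketch that shells out to gcloud alpha storage cp and falls back to gsutil cp when gcloud is not installed. The helper names build_copy_command and copy_to_local are hypothetical and are not taken from sinks.py; the real open_local() may be structured differently.

```python
import shutil
import subprocess
from typing import List


def build_copy_command(uri: str, dest_path: str, use_gcloud: bool = True) -> List[str]:
    """Return the CLI command that copies one GCS object to a local path.

    Hypothetical helper for illustration; not part of weather-mv.
    """
    if use_gcloud:
        return ["gcloud", "alpha", "storage", "cp", uri, dest_path]
    return ["gsutil", "cp", uri, dest_path]


def copy_to_local(uri: str, dest_path: str) -> None:
    """Download `uri` to `dest_path`, preferring gcloud when it is on PATH."""
    use_gcloud = shutil.which("gcloud") is not None
    # check=True raises CalledProcessError if the copy fails.
    subprocess.run(build_copy_command(uri, dest_path, use_gcloud), check=True)
```

Keeping the command construction separate from the subprocess call makes it easy to verify which tool is chosen without touching the network.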

Memory Efficient:

Every time we log xr_dataset.nbytes, the complete dataset is loaded into memory, which causes the OOM killer to be invoked. TODO: Find a better way to log the dataset size.
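One possible direction for the TODO, sketched under assumptions: estimate the size from each variable's shape and dtype item size, which are metadata, rather than asking for the byte count of the data itself. The function estimate_nbytes is a hypothetical name, and whether this actually avoids the load depends on how the dataset was opened; it is not necessarily the fix that was adopted.

```python
import math
from typing import Iterable, Tuple


def estimate_nbytes(variables: Iterable[Tuple[Tuple[int, ...], int]]) -> int:
    """Sum (number of elements * bytes per element) over all variables.

    Each item is a (shape, itemsize) pair, so no array values are read.
    Hypothetical helper for illustration; not part of weather-mv.
    """
    return sum(math.prod(shape) * itemsize for shape, itemsize in variables)


# Assumed usage with an xarray dataset `ds` (xarray is not imported here):
#   size = estimate_nbytes(
#       (v.shape, v.dtype.itemsize) for v in ds.variables.values()
#   )
```

The result is the uncompressed in-memory size, so it can be much larger than the size of the file on disk.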

Real-time data ingestion into BQ:

beam.io.WriteToBigQuery(): in a batch pipeline, data is not ingested into BQ in real time, because the batch pipeline processes all elements before writing to BigQuery.

mahrsee1997, Feb 03 '23 12:02

A temporary fix has been implemented on the mv-optimization branch (link). Further work is required to prepare the changes for merging.

mahrsee1997, Feb 24 '23 10:02

Fixed:

  • Time Efficiency: #315
  • Memory Efficiency: #323

mahrsee1997, Jul 17 '23 06:07