weather-tools
weather-mv: Improve tool's efficiency in terms of time & memory.
Time Efficiency:
Make use of `gcloud alpha storage` in the `open_local()` method in sinks.py.
Findings: downloading data from GCS to the local file system with `gsutil` is 5 times slower than with `gcloud alpha storage`.
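Here is a minimal sketch of the idea. It assumes a context-manager shape for `open_local()`, and the real method in sinks.py may differ. The sketch copies the GCS object into a temporary directory by shelling out to `gcloud alpha storage cp` instead of `gsutil cp`, yields the local path, and cleans up afterwards. The `runner` parameter and the `gcloud_copy_command` helper are hypothetical. They exist only so the command can be swapped out or faked in tests.

```python
import contextlib
import os
import shutil
import subprocess
import tempfile
import typing as t


def gcloud_copy_command(src_uri: str, dest_path: str) -> t.List[str]:
    """Build the CLI call that copies one GCS object to a local path."""
    # `gcloud alpha storage cp` is the faster transfer command from the findings,
    # used here in place of `gsutil cp`.
    return ["gcloud", "alpha", "storage", "cp", src_uri, dest_path]


def _run(cmd: t.List[str]) -> None:
    # Raise CalledProcessError if the copy fails.
    subprocess.run(cmd, check=True)


@contextlib.contextmanager
def open_local(uri: str,
               runner: t.Callable[[t.List[str]], None] = _run) -> t.Iterator[str]:
    """Yield a local path holding a copy of `uri`; delete it afterwards.

    Hypothetical signature: `runner` is an illustrative hook, not part of sinks.py.
    """
    tmpdir = tempfile.mkdtemp()
    dest = os.path.join(tmpdir, os.path.basename(uri))
    try:
        runner(gcloud_copy_command(uri, dest))
        yield dest
    finally:
        # Remove the temporary copy even if the caller raised.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

For example, `with open_local("gs://bucket/era5.nc") as path:` lets the caller open `path` with xarray as an ordinary local file. The bucket path is a placeholder.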
Memory Efficiency:
Every time we log `xr_dataset.nbytes`, the complete dataset is loaded into memory, which invokes the OOM killer.
TODO: Find a better way to log the dataset size.
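One possible way to address the TODO is to sum `size * dtype.itemsize` over the dataset's variables. This assumes an estimate is good enough for logging. The calculation reads only shape and dtype metadata, so it should not pull array contents into memory. `estimate_nbytes` and `human_bytes` are hypothetical helpers. For object-dtype variables (e.g. variable-length strings), the estimate undercounts, because `itemsize` there is only the size of each element's reference.

```python
import typing as t


def estimate_nbytes(ds: t.Any) -> int:
    """Estimate a dataset's in-memory size from metadata alone (hypothetical helper).

    Accepts an xarray.Dataset, or anything with a `.variables` mapping whose values
    expose `.size` and `.dtype`. Only shape and dtype are read; data is not loaded.
    """
    return sum(var.size * var.dtype.itemsize for var in ds.variables.values())


def human_bytes(n: float) -> str:
    """Format a byte count with binary units, e.g. 2048 -> '2.0 KiB'."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TiB"  # unreachable; keeps type checkers happy
```

In the pipeline this could replace the existing log call, e.g. `logger.info("Dataset size: %s", human_bytes(estimate_nbytes(xr_dataset)))`.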
Real-time data ingestion into BQ:
With `beam.io.WriteToBigQuery()`, a batch pipeline does not ingest data into BQ in real time, because a batch pipeline processes all elements before writing to BigQuery.
A temporary fix has been implemented on the mv-optimization branch (link). Further work is required to prepare the changes for merging.
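For context, `beam.io.WriteToBigQuery` takes a `method` argument. In batch pipelines it defaults to `FILE_LOADS`, which stages rows in files and loads them with BigQuery load jobs, so rows only land once those jobs run. `STREAMING_INSERTS` instead sends rows through the streaming insert API as bundles are processed. The sketch below builds keyword arguments for either mode. It is one possible approach, not necessarily what the mv-optimization branch does, and `bigquery_write_kwargs` is a hypothetical helper.

```python
import typing as t


def bigquery_write_kwargs(table: str, schema: str, realtime: bool) -> t.Dict[str, t.Any]:
    """Build keyword arguments for beam.io.WriteToBigQuery (hypothetical helper).

    FILE_LOADS: rows are staged in files and loaded by load jobs (batch default).
    STREAMING_INSERTS: rows are sent via the streaming insert API as they are
    processed, so they can become queryable before the pipeline finishes.
    """
    return {
        "table": table,    # e.g. "project:dataset.table"
        "schema": schema,  # e.g. "time:TIMESTAMP,t2m:FLOAT"
        "method": "STREAMING_INSERTS" if realtime else "FILE_LOADS",
        "write_disposition": "WRITE_APPEND",
    }
```

Usage would look like `rows | beam.io.WriteToBigQuery(**bigquery_write_kwargs("my-project:weather.era5", "time:TIMESTAMP,t2m:FLOAT", realtime=True))`, where the table and schema are placeholders. Streaming inserts carry their own pricing and quotas, so the choice trades cost for latency.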
Fixed:
- Time Efficiency: #315
- Memory Efficiency: #323