[BUG] Visualization of hourly generation breaks with large amounts of data
Conda environment check
- [X] I have tried using the osemosys-global conda environment.
Current Behavior
The visualisation.py script throws an error (see log below) in the hourly generation function (plot_generation_hourly) for the WORLD scenario run. I'm pretty sure it just has to do with the size of the data we are passing into the script: the merge in transform_ts runs out of memory. The annual generation and annual capacity functions run fine.
Expected Behavior
visualisation.py should execute successfully, as it does with the other scenarios.
Steps To Reproduce
- Remove all countries from the geographic scope
- Run the snakemake workflow
Operating System
Linux
Log output
Traceback (most recent call last):
  File "/home/trevorb1/repositories/osemosys_global/workflow/scripts/osemosys_global/visualisation.py", line 328, in <module>
    main()
  File "/home/trevorb1/repositories/osemosys_global/workflow/scripts/osemosys_global/visualisation.py", line 51, in main
    plot_generation_hourly()
  File "/home/trevorb1/repositories/osemosys_global/workflow/scripts/osemosys_global/visualisation.py", line 291, in plot_generation_hourly
    df = transform_ts(df)
  File "/home/trevorb1/repositories/osemosys_global/workflow/scripts/osemosys_global/visualisation.py", line 164, in transform_ts
    df = pd.merge(df,
  File "/home/trevorb1/anaconda3/envs/osemosys-global/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 122, in merge
    return op.get_result()
  File "/home/trevorb1/anaconda3/envs/osemosys-global/lib/python3.10/site-packages/pandas/core/reshape/merge.py", line 725, in get_result
    result_data = concatenate_managers(
  File "/home/trevorb1/anaconda3/envs/osemosys-global/lib/python3.10/site-packages/pandas/core/internals/concat.py", line 202, in concatenate_managers
    return _concat_managers_axis0(mgrs_indexers, axes, copy)
  File "/home/trevorb1/anaconda3/envs/osemosys-global/lib/python3.10/site-packages/pandas/core/internals/concat.py", line 264, in _concat_managers_axis0
    mgrs_indexers = _maybe_reindex_columns_na_proxy(axes, mgrs_indexers)
  File "/home/trevorb1/anaconda3/envs/osemosys-global/lib/python3.10/site-packages/pandas/core/internals/concat.py", line 306, in _maybe_reindex_columns_na_proxy
    mgr = mgr.reindex_indexer(
  File "/home/trevorb1/anaconda3/envs/osemosys-global/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 692, in reindex_indexer
    new_blocks = [
  File "/home/trevorb1/anaconda3/envs/osemosys-global/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 693, in <listcomp>
    blk.take_nd(
  File "/home/trevorb1/anaconda3/envs/osemosys-global/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 1121, in take_nd
    new_values = algos.take_nd(
  File "/home/trevorb1/anaconda3/envs/osemosys-global/lib/python3.10/site-packages/pandas/core/array_algos/take.py", line 117, in take_nd
    return _take_nd_ndarray(arr, indexer, axis, fill_value, allow_fill)
  File "/home/trevorb1/anaconda3/envs/osemosys-global/lib/python3.10/site-packages/pandas/core/array_algos/take.py", line 158, in _take_nd_ndarray
    out = np.empty(out_shape, dtype=dtype)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 115. GiB for an array with shape (5, 3091789980) and data type object
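As a quick sanity check on the number in that last line, the requested allocation can be reproduced by hand. This is only back-of-the-envelope arithmetic, assuming a 64-bit Python build where each element of an object array is an 8-byte pointer; the shape is copied from the error message.

```python
# Shape copied from the ArrayMemoryError message in the log above.
rows, cols = 5, 3_091_789_980

# Assumption: 64-bit build, so each object-dtype element is an 8-byte pointer.
bytes_per_element = 8

total_bytes = rows * cols * bytes_per_element
total_gib = total_bytes / 2**30  # GiB = 2**30 bytes

print(f"{total_bytes:,} bytes is roughly {total_gib:.0f} GiB")
```

That lands at about 115 GiB, which matches the error, so the merged frame really does have around 3 billion rows.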
Anything else?
No response
@trevorb1 Did this ever get solved? If not, I suggest just adding a rule to the visualization script so that the hourly visuals are not generated when the global model is run, or when more than x regions are being run. Maybe a high-effort, low-reward kind of situation, since most users will create their own visuals anyway.
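A rough sketch of what such a rule could look like inside the script. Everything here is hypothetical: the function name, the cutoff value, and the assumption that an empty geographic scope means a global run (as in the reproduction steps) are illustrations, not existing code in the repository.

```python
# Hypothetical cutoff for "more than x regions"; the right value would need tuning.
MAX_HOURLY_REGIONS = 20


def hourly_plot_allowed(regions, max_regions=MAX_HOURLY_REGIONS):
    """Return True if the hourly generation plot should be attempted.

    Assumption: an empty list of regions means the global (WORLD) model
    is being run, which is the case that runs out of memory.
    """
    if not regions:
        return False  # global run: skip the hourly plot
    return len(regions) <= max_regions


# Example: a two-country run keeps the hourly plot, a global run skips it.
print(hourly_plot_allowed(["IND", "NPL"]))
print(hourly_plot_allowed([]))
```

The call to plot_generation_hourly() in main() could then be wrapped in a check like this, while the annual plots stay unconditional.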
Never got solved! Yes, that is a fair fix! From a workflow perspective, snakemake should always know which files are being created. Therefore, it may make more sense to put that logic in snakemake rather than in the viz script. I still agree with your general solution, though!
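One way this might look on the workflow side is a small Python helper that decides up front which figures a run is expected to produce. This is a sketch only: the helper name, the file names, and the cutoff are all hypothetical, and an empty scope is again assumed to mean a global run.

```python
def expected_figures(regions, max_regions=20):
    """List the figure files a run is expected to produce.

    All file names and the max_regions cutoff are hypothetical placeholders.
    Assumption: an empty list of regions means a global run.
    """
    # The annual plots run fine for every scenario, so always expect them.
    figures = ["GenerationAnnual.html", "TotalCapacityAnnual.html"]

    # Only expect the hourly plot for small, explicitly scoped runs.
    if regions and len(regions) <= max_regions:
        figures.append("GenerationHourly.html")

    return figures


print(expected_figures(["IND", "NPL"]))
print(expected_figures([]))
```

Since a Snakefile is Python, a list like this could feed the expected outputs of the visualisation rule, so the workflow never waits on an hourly figure that was intentionally skipped.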