
Memory error for transport model using scipy solver

Open tsmbland opened this issue 1 year ago • 13 comments

I tried using the scipy solver with @sharwaridixit's transport model, and got this rather alarming error:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 40.2 GiB for an array with shape (16, 16, 7, 7, 41, 41, 16, 16) and data type float64

It's specifically being thrown by this line in the calculation of the constraint matrix. I'm guessing there's a bug somewhere that's accidentally duplicating all the dimensions of an array, but I'd need to investigate further.
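For reference, the reported figure is consistent with a dense float64 array of that shape. A quick back-of-envelope check in plain numpy (independent of MUSE):

```python
import numpy as np

# Product of the dimensions times 8 bytes per float64 entry
# reproduces the 40.2 GiB in the traceback.
shape = (16, 16, 7, 7, 41, 41, 16, 16)
n_entries = int(np.prod(shape))  # ~5.4 billion entries
print(f"{n_entries:,} entries -> {n_entries * 8 / 2**30:.1f} GiB")
# 5,398,134,784 entries -> 40.2 GiB
```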

tsmbland avatar Jul 02 '24 08:07 tsmbland

Hi @tsmbland,

Can you let me know what command you invoke in the terminal to produce the error? Is it something like muse --model <your_model_goes_here>?

I take from your description that the culprit is line 853 in src/muse/constraints.py, which reads reduced = reduce(xr.DataArray.__mul__, diagonal_submats)
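If that's the line, xarray's broadcasting rules would explain a blow-up like this: multiplying DataArrays whose dimension names differ produces an array carrying the union of all their dimensions. A toy illustration (the dimension names here are made up, not MUSE's actual data):

```python
from functools import reduce

import numpy as np
import xarray as xr

# Each sub-matrix carries a different pair of dimensions, so broadcasting
# the product yields an array with *all* of them: shapes multiply rather
# than align, and memory grows combinatorially.
a = xr.DataArray(np.ones((16, 16)), dims=("asset", "replacement"))
b = xr.DataArray(np.ones((41, 41)), dims=("commodity", "commodity_out"))
c = xr.DataArray(np.ones((16, 16)), dims=("timeslice", "timeslice_out"))
reduced = reduce(xr.DataArray.__mul__, [a, b, c])
print(reduced.shape)  # (16, 16, 41, 41, 16, 16)
```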

HarmonicReflux avatar Jul 11 '24 09:07 HarmonicReflux

Hi @HarmonicReflux. To reproduce the error you will need to use the model found here. However, in order to fix the bug described in that issue and get it working with the latest version of MUSE, you will have to replace the settings file with the one attached here: settings.zip. You can then run the model with muse settings.toml.

The specific issue I'm describing here only happens when you set lpsolver to "scipy" (currently it's using "adhoc").

Let me know if you have any problems. I imagine this one won't be super straightforward to fix.

tsmbland avatar Jul 11 '24 09:07 tsmbland

  • How can one access the source code for the LP solver "adhoc" and where is this solver documented?

  • I think the solver "adhoc" works fundamentally differently from scipy's approach to solving a linear program. There are also several methods available for scipy's LP solver (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html), and settings.toml does not specify which one it uses. In particular, "highs", "highs-ds", "interior-point" (legacy), "revised simplex" (legacy), and "simplex" (legacy) are available.

Below I attach a detailed error log (screenshot: lp_solver_scipy.jpg, https://github.com/user-attachments/assets/31fb0d9c-c66a-4e0b-a8ed-0f6d71d8e420). My take is that the scipy solver simply runs out of memory for the (16, 16, 7, 7, 41, 41, 16, 16) data structure, which has about 5.4 billion entries.

If we shrink the problem, for example by changing the following in settings.toml:

time_framework = [2010, 2015, 2020, 2025, 2030, 2035, 2040, 2045, 2050]
foresight = 5  

to

time_framework = [2010, 2020, 2030, 2040, 2050]
foresight = 10  # Multiple of the smallest difference

we get numpy.core._exceptions._ArrayMemoryError: Unable to allocate 30.8 GiB for an array with shape (14, 14, 7, 7, 41, 41, 16, 16) and data type float64. That means we are now down to about 31 GiB from the original 40 GiB. (If we instead choose the "adhoc" solver on the smaller problem, we get a ValueError raised from xarray/core/variable.py, line 3001, in calculate_dimensions: ValueError: dimension 'year' already exists as a scalar variable.)

There are also commercial LP solvers such as Gurobi and CPLEX, which have free academic licenses; might they be better suited to this problem?

HarmonicReflux avatar Jul 15 '24 15:07 HarmonicReflux

On the memory error: what is this massive thing with 5.4 billion entries? It seems ludicrously big for the problem at hand.

ahawkes avatar Jul 15 '24 16:07 ahawkes

Looks very fishy indeed.

To isolate the issue and compare the behaviour of the optimisers, I stored the output of muse --model default in its own folder in my local copy of the repository.

Working from the project root, I can run the adhoc optimiser fine with muse my_models/Transporttrialmodel/settings.toml.

If I now use your settings.toml file, and apply it to the Results of the default model like so: muse my_models/Defaultmodel/settings.toml

I get the following AssertionError (see also the attached screenshot): AssertionError: Projections file does not exist (/home/bjs/Downloads/vcs/MUSE_OS/my_models/Defaultmodel/input/Projections.csv)

Is there a quick way to adapt the settings.toml file, as well as the required input files, to make it compliant with the structure of the Results file for the default model?

HarmonicReflux avatar Jul 18 '24 08:07 HarmonicReflux

Another idea I had is to experimentally run scipy's optimizer with all of the algorithms supported off the shelf, i.e. "highs" (the default), "highs-ds", "interior-point" (legacy), "revised simplex" (legacy), and "simplex" (legacy).

How can we adapt the settings.toml file to cater for these different flavours, say by passing a respective keyword argument on to the solver?
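For reference, on the scipy side the algorithm is selected via the method keyword of scipy.optimize.linprog. A minimal self-contained example (the hook from settings.toml into this call is exactly the open question here):

```python
from scipy.optimize import linprog

# Minimise x + 2y subject to x + y >= 1 (written as -x - y <= -1), x, y >= 0.
# The algorithm is chosen with the `method` keyword; "highs" is the modern
# default, "highs-ds" forces the dual simplex variant.
res = linprog(
    c=[1.0, 2.0],
    A_ub=[[-1.0, -1.0]],
    b_ub=[-1.0],
    bounds=[(0, None), (0, None)],
    method="highs",
)
print(res.x)  # [1. 0.]
```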

HarmonicReflux avatar Jul 18 '24 08:07 HarmonicReflux

I think anything that says “legacy” here relates to an older version of the model (StarMuse). So I guess only highs and highs-ds are relevant. Not sure of the difference between these.

ahawkes avatar Jul 18 '24 08:07 ahawkes

I've had a look into this a bit more, and it's not a bug per se, but rather a combinatorial explosion due to the large number of timeslices, technologies and end-use commodities (particularly in the road_vehicles sector).

The particular object facing the memory issue is the constraint matrix for the max_production constraint. For this constraint, A is a square matrix with side length (n_assets * n_technologies * n_commodities * n_timeslices), which for this model is 73472 x 73472 (i.e. very big!)
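As a sanity check, assuming the factors are 16 assets, 7 technologies, 41 commodities and 16 timeslices (matching the shape in the original traceback), this reproduces both the side length and the 40.2 GiB from the original error:

```python
# 16 assets * 7 technologies * 41 commodities * 16 timeslices
side = 16 * 7 * 41 * 16            # = 73472
print(side, side**2 * 8 / 2**30)   # 73472, ~40.2 GiB as dense float64
```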

There are a couple of ways we can address this, either:

Changing the model:

  • The model has 16 timeslices, although none of the parameters/inputs are defined on a per-timeslice basis, so there's no need to have this many timeslices (or even more than one timeslice)

Changing MUSE:

  • The max_production constraint (and probably other constraints) could be compressed over the timeslice dimension if there are no differences between timeslices.
  • Also, this constraint is being defined on a per-commodity basis, but there's no reason to do this, as max_production is defined in terms of technology flow, not per commodity (indeed, the ratio between the output commodities of a technology should be fixed by definition).

This latter point brings me to a bigger issue. The output of the linear optimisation is a decision vector containing capacity and production data. The production data is defined for every commodity, but there's nothing in place to constrain the ratio of commodity outputs for a given technology. Surely, instead of calculating production of each commodity, we should be calculating the flow of each technology, and then from this we can calculate commodity outputs based on fixed output ratios in a separate step.

I also think this is why we cannot currently use the production data directly from the solver, as described in #423

tsmbland avatar Aug 07 '24 13:08 tsmbland

Thanks Tom, a few thoughts:

  1. Not have decision variables for every commodity for every asset. We only need the decision variables for the commodities the asset actually produces or consumes.
  2. I do think we need commodities here, as they will be used elsewhere in the optimization (at least this is what is planned for MUSE 2.0), i.e. we need to know about overall commodity balances on a per-time-slice basis.
  3. Could we represent this as a sparse matrix instead? If it’s not sparse, there must be a lot of zeros in there?
  4. I’m not clear why we have * n_technologies here – shouldn’t it be (n_assets + n_technologies) * n_commodities * n_timeslices? Probably missing something there.

Rgds, Adam

ahawkes avatar Aug 07 '24 13:08 ahawkes

  1. Not have decision variables for every commodity for every asset. We only need the decision variables for the commodities the asset actually produces or consumes.

That's a good idea

  2. I do think we need commodities here, as they will be used elsewhere in the optimization (at least this is what is planned for MUSE 2.0), i.e. we need to know about overall commodity balances on a per-time-slice basis.

Yes. We wouldn't want to get rid of commodity info completely from the constraints matrices, but since we don't need production output on a per-commodity basis we can still shrink the A matrix.

Taking the demand constraint as an example (which requires production of all commodities to be at least as high as demand), A currently has dimensions (n_assets * n_commodities * n_timeslices) x (n_assets * n_commodities * n_timeslices), and b is (n_assets * n_commodities * n_timeslices) x 1, which yields a decision vector of shape (n_assets * n_commodities * n_timeslices) x 1 defining the production of every commodity by every asset in every timeslice.

I'm pretty sure we can change A to (n_assets * n_commodities * n_timeslices) x (n_assets * n_timeslices) and factor the fixed output ratios into this matrix; you'd then end up with a decision vector of shape (n_assets * n_timeslices) x 1, which defines the flow of each asset in each timeslice rather than the output of each commodity. You'd still be constraining the output of every commodity to meet the demand (b is still defined on a per-commodity basis); you're just changing the problem to solve for the flow of each asset rather than the production of each commodity.

Sort of makes sense in my head, but I could be wrong and might be missing something
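To make that concrete, here's a toy numpy sketch of the reshaped demand constraint, with made-up dimensions and random ratios (illustrative only, not MUSE code):

```python
import numpy as np

# Decision variables: one flow per (asset, timeslice) instead of one
# production value per (asset, commodity, timeslice).
n_assets, n_commodities, n_timeslices = 3, 4, 2
ratios = np.random.rand(n_assets, n_commodities)  # fixed output ratios

# A maps the flow vector, length n_assets * n_timeslices, onto per-commodity
# production, length n_assets * n_commodities * n_timeslices.
A = np.zeros((n_assets * n_commodities * n_timeslices,
              n_assets * n_timeslices))
for a in range(n_assets):
    for c in range(n_commodities):
        for t in range(n_timeslices):
            row = (a * n_commodities + c) * n_timeslices + t
            A[row, a * n_timeslices + t] = ratios[a, c]

flow = np.ones(n_assets * n_timeslices)  # pretend solver output
production = A @ flow                    # per-commodity production recovered
```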

  3. Could we represent this as a sparse matrix instead? If it’s not sparse, there must be a lot of zeros in there?

Yes, it does have a lot of zeros. We could represent it as a sparse matrix, but I don't know if the highs solver will work with sparse matrices, and I can't find anything about this in the documentation; it's only mentioned for the "interior-point" solver, which is deprecated: https://docs.scipy.org/doc/scipy/reference/optimize.linprog-interior-point.html#optimize-linprog-interior-point
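If someone wants to try it, the experiment would look roughly like this (untested; whether the installed scipy version accepts sparse input for method="highs" is exactly the open question):

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog

# Build the inequality constraint matrix in a sparse format rather than dense,
# then hand it straight to linprog and see whether "highs" accepts it.
A_ub = sparse.random(200, 100, density=0.01, format="csr")
b_ub = np.ones(200)
c = np.ones(100)

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * 100, method="highs")
print(res.status, res.message)
```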

  4. I’m not clear why we have * n_technologies here – shouldn’t it be (n_assets + n_technologies) * n_commodities * n_timeslices? Probably missing something there.

Actually n_technologies here specifically refers to the search space. I think it's because all assets can be replaced by all technologies in the search space, so you need to define max_production for all assets in every replacement scenario. I think...?

tsmbland avatar Aug 07 '24 14:08 tsmbland

Regarding (2), doesn’t the decision variable vector have to be the same for all constraints? In order to serve all the types of constraints that we need, don't we need to know the commodity production and consumption of every asset? I see that capacity (and availability) limits production (of PACs in MUSE 2.0), so the constraint does not need to be on a commodity-by-commodity basis, it needs to be on an asset basis, but we still need the commodity-level decision variables?

Overall there must be much more efficient ways to do this, as I'm aware of far bigger models that are very similar and can be solved on laptops.

ahawkes avatar Aug 08 '24 10:08 ahawkes

Regarding (2), doesn’t the decision variable vector have to be the same for all constraints?

Yes, but I think what I said above applies to all of the constraints.

In order to serve all the types of constraints that we need, we need to know the commodity production and consumption of every asset?

Yes, but if all input/output commodities are in fixed ratios, then knowing the flow of every asset is enough to know the production and consumption of every commodity.

In MUSE1 we only have fixed outputs. In principle we allow flexible inputs, so if you wanted to optimise these in the solver then consumption data would probably be needed on a commodity-by-commodity basis, but we're not actually doing that in MUSE1. There's no consumption data in the decision vector. Like production (which is optimised in the solver but then thrown away), consumption is calculated afterwards by another function.
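To illustrate the fixed-ratio point with made-up numbers: once per-asset flows are known, every commodity balance follows from a matrix product:

```python
import numpy as np

flow = np.array([10.0, 4.0])            # flow of two assets (made up)
output_ratio = np.array([[1.0, 0.5],    # commodity outputs per unit flow
                         [0.0, 2.0]])
input_ratio = np.array([[2.0, 0.0],     # commodity inputs per unit flow
                        [1.0, 1.0]])

production = flow @ output_ratio        # -> [10. 13.]
consumption = flow @ input_ratio        # -> [24.  4.]
```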

tsmbland avatar Aug 08 '24 14:08 tsmbland

OK, I am interested to see how this would work. We still need overall commodity balance constraints (of various types?): production of each commodity must be at least equal to its consumption in each region, time slice, etc. Where you have fixed input/output ratios, maybe this is possible with only overall asset-flow variables.

ahawkes avatar Aug 09 '24 08:08 ahawkes