
Build the input workflow

clizbe opened this issue • 20 comments

Build the basic input workflow from raw data to the model.

See discussion #288

Considerations

  • [x] TulipaEnergy/TulipaIO.jl#1
  • [x] What other data sources need to be supported?
  • [x] How to merge data from different sources?
  • [ ] How to handle different units in source data? -> waiting for UnitsJuMP.jl
  • [x] Do we need a local data store?
  • [ ] TulipaEnergy/TulipaEnergyModel.jl#415
  • [ ] TulipaEnergy/TulipaEnergyModel.jl#414
  • [ ] Specify solver specifications
    • [x] common options
    • [ ] options unique to specific solvers
  • [x] TulipaEnergy/TulipaEnergyModel.jl#295

Meta considerations

  • [ ] Do we need parallel execution of pipelines?
  • [ ] Maybe supporting parallel jobs with shared inputs is sufficient

Capabilities/Usability requirements

  • [ ] able to visualise and inspect input/intermediate datasets
  • [ ] Scenario building
    • [ ] change (enable/disable/limit) capacities
    • [ ] easily specify scenario parameters for multiple scenarios
      • need examples
      • probably needs some kind of filter-and-apply API
      • should be code, not config
  • [ ] run model unattended on a server/cluster
  • [ ] ability to compare & inspect multiple runs (e.g. different scenarios)
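The "filter-and-apply" idea above could be sketched roughly as follows. This is a hypothetical illustration in Python, not an existing Tulipa API: the `Asset` type, `apply_edits` function, and asset names are all made up. The point is that scenario edits become a list of (predicate, changes) pairs applied to a copy of the base data, which keeps the base scenario untouched and makes "change capacities in certain areas only" a one-liner.

```python
from dataclasses import dataclass, replace

@dataclass
class Asset:
    name: str
    region: str
    capacity: float

def apply_edits(assets, edits):
    """Apply (predicate, changes) pairs to a copy of the asset list."""
    out = [replace(a) for a in assets]  # copy: never mutate the base scenario
    for predicate, changes in edits:
        for asset in out:
            if predicate(asset):
                for field_name, value in changes.items():
                    setattr(asset, field_name, value)
    return out

# Example: a "high wind" scenario that only touches wind assets
base = [Asset("wind_NL", "NL", 10.0), Asset("line_NL_DE", "NL", 5.0)]
high_wind = apply_edits(base, [
    (lambda a: a.name.startswith("wind"), {"capacity": 20.0}),
])
```

Because the edits are plain code (lambdas and dicts), this also satisfies the "should be code, not config" point above.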

Related Issues

  • [x] TulipaEnergy/TulipaEnergyModel.jl#89

WHAT WE WANT

  • Build the network once (in a while)
  • Use draft networks to build new networks
  • Sufficient flexibility for ad-hoc code for experimentation
  • Definition of temporal stuff
  • Definition of scenarios (what is included here?)
  • Scope: just model or parts of pipeline (which parts?)
  • Definition of solver specifications
  • Be able to mix data sources (ESDL + ENTSO-E for example)
  • Self-hosted Tulipa database (in case sources change/vanish, & reduce re-pulling/processing data)
  • Export ESDL to simplified representation that is compatible with Tulipa

clizbe avatar Nov 22 '23 10:11 clizbe

Does this include the representative periods and the assets and flows partitions, or is it just for the data sources?

abelsiqueira avatar Nov 23 '23 09:11 abelsiqueira

The representative periods come from an algorithm, so that should be included, but optionally. A scenario might not require the algorithm and use fixed periods instead; or, if the algorithm has already run once and the input hasn't changed, it need not run again.

As for the flow partitions, aren't they derivable from the profiles? If so, then that would also be along the lines of "compute if input changes".
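The "compute if input changes" pattern can be sketched generically: hash the inputs, and rerun the expensive step only when the hash has not been seen before. A minimal illustration in Python; the function names (`cached_step`, `find_representative_periods`) and the pickle-file cache layout are assumptions for the sketch, not anything Tulipa implements.

```python
import hashlib, json, pickle, tempfile
from pathlib import Path

def cached_step(inputs, compute, cache_dir):
    """Run compute(inputs) only if these exact inputs haven't been seen before."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()
    cache_file = cache_dir / (key + ".pkl")
    if cache_file.exists():
        return pickle.loads(cache_file.read_bytes())  # input unchanged: reuse
    result = compute(inputs)  # e.g. the representative-periods algorithm
    cache_file.write_bytes(pickle.dumps(result))
    return result

# Example: the expensive step runs once; the repeat call hits the cache
calls = []
def find_representative_periods(inputs):
    calls.append(1)                    # stand-in for the clustering algorithm
    return sorted(inputs["profile"])[:2]

cache = Path(tempfile.mkdtemp())
first = cached_step({"profile": [3, 1, 2]}, find_representative_periods, cache)
again = cached_step({"profile": [3, 1, 2]}, find_representative_periods, cache)
```

The same mechanism covers both cases above: fixed periods (the step is simply not registered) and "has run once, input unchanged" (cache hit).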

suvayu avatar Nov 23 '23 11:11 suvayu

@Lokkij I tagged you on this one too if you're interested. You're of course our source for ESDL knowledge but I thought you might also be interested in this stuff. :)

clizbe avatar Nov 23 '23 12:11 clizbe

Is it possible to filter out attributes not used in Tulipa when exporting ESDL to JSON?

I thought we decided against this because that would be a choice on the Tulipa side and not the ESDL side?

@clizbe I'm guessing you left that comment? Best to discuss in the thread instead of editing the top post.

I see that my wording is pretty unclear. AFAICT, there are two levels of filtering: the top level covers things that are not in Tulipa because of fundamental modelling choices, e.g. no connections, so maybe having the Port attributes in ESDL will never make sense. The next level is any other finer choices we make, which evolve with time.

In this case I mean the top-level fundamental choices. But maybe I'm over thinking it, and doing everything in one go is simpler.

suvayu avatar Nov 23 '23 15:11 suvayu

Yes I think some of it will be specifying the type of ESDL file that Tulipa accepts - which variables should be filled, etc. And then probably a step of converting that ESDL into the form that Tulipa likes, which will include throwing out anything else and maybe some conversion trickery. I would prefer if the ESDL file looks normal before conversion and that we don't build really weird ESDLs - but we'll see what works.

clizbe avatar Nov 28 '23 08:11 clizbe

Is it possible to filter out attributes not used in Tulipa when exporting ESDL to JSON?

I thought we decided against this because that would be a choice on the Tulipa side and not the ESDL side?

Usually the approach here is to leave attributes in ESDL and simply not read them from the model if you don't need them. In our case, I would keep the filtering as close to Tulipa as possible. That will likely make it easier to write back results to ESDL while keeping the original attributes intact.

Do we need a local data store?

What would the local data store be used for? To store temporary in-between data, or something else?

Lokkij avatar Nov 28 '23 09:11 Lokkij

On Tue, 28 Nov 2023, 10:49 Wester Coenraads wrote:

Do we need a local data store?

What would the local data store be used for? To store temporary in-between data, or something else?

As my understanding goes, for larger datasets we will have to connect to InfluxDB (or similar) and download for Tulipa to read. There will also be intermediate steps (e.g. different ways to compute representative days), etc. I doubt we want to download the dataset every time, or recompute unchanged steps every time.

-- Suvayu

suvayu avatar Nov 28 '23 09:11 suvayu

Just saw this at a Spine meeting and thought it would be super handy to have something similar! (Maybe you had this in mind already, but it's new to me.) From what I understand it shows where specific data is coming from and the lines sort of indicate how it's processed? [screenshot of a Spine diagram, not reproduced here]

clizbe avatar Nov 28 '23 14:11 clizbe

As my understanding goes, for larger datasets we will have to connect to InfluxDB (or similar) and download for Tulipa to read. There will also be intermediate steps (e.g. different ways to compute representative days), etc. I doubt we want to download the dataset every time, or recompute unchanged steps every time.

Ah, so a sort of local DB to store data while doing other operations? I wouldn't expect our data to be so big as to need it, honestly - you can fit a lot of profiles in a few GBs of RAM. But maybe I'm missing something?

Just saw this at a Spine meeting and thought it would be super handy to have something similar! (Maybe you had this in mind already, but it's new to me.) From what I understand it shows where specific data is coming from and the lines sort of indicate how it's processed?

To me this looks like a class diagram, very similar to the diagrams for ESDL. The ESDL documentation has diagrams for all classes, for example: https://energytransition.github.io/#router/doc-content/687474703a2f2f7777772e746e6f2e6e6c2f6573646c/PowerPlant.html

Lokkij avatar Nov 28 '23 16:11 Lokkij

@datejada @gnawin @clizbe Add some use-cases of how you're going to use the model and what your workflow is so they have a better idea of what we need. "I want to run the model from the train" is valid. :)

clizbe avatar Nov 29 '23 13:11 clizbe

Use Cases I would like to be able to:

  • summarize/visualize my input data (in tables or graphs), such as total wind capacity, transport line capacities, available technologies.
  • make transport capacities in certain areas unlimited, while still constraining others.
  • set up multiple scenarios to run in parallel or (otherwise) series - set and forget.
  • visualize output data from one scenario, as well as compare multiple scenarios.
  • keep track of what model version and what data was used for a particular run/analysis - reproducibility.
  • easily specify scenario parameters for multiple scenarios.
  • occasionally add new data / data sources.
  • specify which data sources to use to build a scenario.
  • run the model somewhere that I can go about other work while it runs.
  • know when the model is finished running.

My current workflow for running scenarios is:

  • Duplicate a "default" Access dataset - this has everything needed to do a run.
  • In Excel, process scenario-unique (new) data, so it works with the model.
  • In Access, filter for and delete any data that will be replaced by the new data.
  • Copy and paste the new data into the dataset.
  • Go into the model, Browse for the dataset, Load it, Run the model.
  • Check frequently if the model has finished running.
  • Export data to Excel to make graphs (although Wester is building a UI to make this nicer).

Pros/Cons of Access

  • Can easily see data (once you know where it is)
  • Easy to learn how to edit
  • Takes a long time to edit
  • Sometimes you don't know where the data is
  • Huge tables make it slow even loading/filtering

clizbe avatar Nov 30 '23 16:11 clizbe

Ah, so a sort of local DB to store data while doing other operations? I wouldn't expect our data to be so big as to need it, honestly - you can fit a lot of profiles in a few GBs of RAM. But maybe I'm missing something?

I guess that's pretty small. However, I would really like to support a workflow that doesn't require you to be online. But if people say there's no such need, we can drop it.

Edit: the more I think about it, the more I think we need it; e.g. when running different scenarios it makes no sense to download the same data repeatedly, even if it is small. So the question is whether the local store should also be accessible to normal users for inspection and analysis. Based on @clizbe's points, I think it should be.
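Such a local store can be surprisingly lightweight. A sketch using Python's built-in sqlite3 (purely illustrative: the `profiles` table, the `get_profile` helper, and the source name are made up, and `fetch` stands in for whatever downloads from InfluxDB or similar): data is fetched from the remote source only on a cache miss, after which everything works offline and is inspectable with plain SQL.

```python
import sqlite3

def get_profile(conn, source, fetch):
    """Return rows for `source` from the local store, fetching only on a miss."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles (source TEXT, t INTEGER, value REAL)"
    )
    rows = conn.execute(
        "SELECT t, value FROM profiles WHERE source = ?", (source,)
    ).fetchall()
    if rows:
        return rows  # already stored locally: works offline, no re-download
    conn.executemany(
        "INSERT INTO profiles VALUES (?, ?, ?)",
        [(source, t, v) for t, v in fetch(source)],  # e.g. a call to InfluxDB
    )
    conn.commit()
    return get_profile(conn, source, fetch)

# Example: the remote fetch happens once; the second call reads the local store
fetches = []
def fake_fetch(source):
    fetches.append(source)
    return [(0, 1.5), (1, 2.5)]

conn = sqlite3.connect(":memory:")  # a file path would persist across runs
first = get_profile(conn, "entsoe/load_NL", fake_fetch)
second = get_profile(conn, "entsoe/load_NL", fake_fetch)
```

A file-backed database (instead of `:memory:`) would give exactly the user-inspectable local store discussed here.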

suvayu avatar Dec 01 '23 11:12 suvayu

Pros/Cons of Access

  • Can easily see data (once you know where it is)
  • Easy to learn how to edit
  • Takes a long time to edit
  • Sometimes you don't know where the data is
  • Huge tables make it slow even loading/filtering

@clizbe Do you know SQL? Is it fair to expect someone who is doing analysis to know/learn a bit of SQL?
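To give a sense of what "a bit of SQL" means here: the Access steps clizbe describes ("filter for and delete any data that will be replaced", "copy and paste the new data") each correspond to a single SQL statement. An illustrative example via Python's built-in sqlite3, with made-up table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE capacities (region TEXT, tech TEXT, mw REAL)")
conn.executemany("INSERT INTO capacities VALUES (?, ?, ?)", [
    ("NL", "wind", 10.0),
    ("NL", "solar", 8.0),
    ("DE", "wind", 30.0),
])

# "Filter for and delete any data that will be replaced by the new data"
conn.execute("DELETE FROM capacities WHERE region = 'NL' AND tech = 'wind'")

# "Copy and paste the new data into the dataset"
conn.execute("INSERT INTO capacities VALUES ('NL', 'wind', 15.0)")

rows = conn.execute(
    "SELECT tech, mw FROM capacities WHERE region = 'NL' ORDER BY tech"
).fetchall()
```

Unlike manual filtering in Access, the two statements are scriptable and repeatable across scenarios.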

suvayu avatar Dec 01 '23 11:12 suvayu

@suvayu Sorry I don't know if I responded in person. Learning SQL is totally feasible. I don't think our current modellers know it. (I've used it once.)

clizbe avatar Jan 15 '24 16:01 clizbe

Compiling the model takes a lot of time (a Julia thing), with future runs going faster. How are we dealing with this in the workflow? Is the stable version of Tulipa something that compiles once and can then take any data through it? Or will the scenario define a model that needs precompiling before doing multiple runs?

clizbe avatar Feb 06 '24 13:02 clizbe

I think this request needs to be separated by use case. For example, if you changed an input dataset, naively you have to rerun. However, if you say "I'm doing a sensitivity study, and my changes are limited to X", then theoretically the repetitions need not start from scratch. But I think that's a very advanced feature which requires deep technical research. AFAIU, this is on @g-moralesespana and @datejada's wishlist (GUSS in GAMS). But there could be simpler use cases between these two extremes.

That said, I'm not sure whether this would fall under the purview of pipeline/workflow or of model building. My hunch is, it'll depend on the use case.

I hope that makes sense :-P

suvayu avatar Feb 06 '24 17:02 suvayu

Yeah I figured I'd comment here in case it's a simple answer, but it's probably a bigger discussion.

This is becoming an issue with Spine, so it's good to think about it early.

clizbe avatar Feb 07 '24 11:02 clizbe

For the ENTSOE data base I found this, but I'm not sure if we have access (or if we could have)...it might be interesting to explore it...

https://www.linkedin.com/posts/activity-7140005469414133760-f4XH/?utm_source=share&utm_medium=member_desktop

datejada avatar Apr 28 '24 12:04 datejada

@nope82 commented the following about ENTSO-E:

From just a quick check it seems that the PEMMDB is only accessible to TSOs (author's comment: "Sadly no (data transparency), it is only for sharing between TSO members"). When looking for access to the data, I only found a regulation from the ERAA study from ACER asking for the PEMMDB data:

“On 23 November 2021, ACER requested ENTSO-E to provide all input data for the ERAA 2021. On 2 December 2021, ENTSO-E provided ACER with access to the pan-European market modelling database (PEMMDB) and the assumptions for the economic viability assessment (EVA)”.

So it seems that ENTSO-E is the only one that could grant access, and it appears to be one-time access for specific data (or one would need to request access recurrently) rather than completely open access to the data.

datejada avatar Apr 29 '24 09:04 datejada

@clizbe Reorganize the info here and close this issue

clizbe avatar Jul 29 '24 15:07 clizbe

Stale issue - ongoing efforts moved to other places (links provided)

clizbe avatar Sep 19 '24 14:09 clizbe