
What is this?

banesullivan opened this issue 5 years ago • 7 comments

What's the goal of this project?

banesullivan avatar May 09 '19 16:05 banesullivan

So sorry I didn't spot this earlier, Bane... must have landed in the middle of our travel chaos.

I updated the README, hopefully that helps.

kwinkunks avatar Jun 04 '19 07:06 kwinkunks

Ah thanks, @kwinkunks! I posted this issue a while back when the repo was created, as I saw a few compelling contributions.

Is this perhaps the start of an underlying library for the subsurface OSS stack?

banesullivan avatar Jun 11 '19 21:06 banesullivan

I've been thinking about this and if you really want to build a viable open source subsurface stack you will need:

  • The ability to ingest data from many sources and formats (with built-in extensibility for contributors who want to enable support for new inputs)
  • A consistent internal schema for commonly used data (picks, faults, seismic horizons/faults, grids, well surveys, production histories, completion histories, pressure observations, etc.) and some built-in validation to 'clean' incoming data and log issues with it, handle units and unit conversions, types, etc. (a rough sketch of what such a schema could look like appears at the end of this comment)
  • A core library of functions to integrate data, generate features for analytics/data science work, and so forth (these are the whole point of this exercise, really; everything else is in support of them)
  • Visualization/output: the ability to write out the results of functions and visualize them
  • Around all of these tasks could sit support functions for organizing work into automated pipelines, handling interactions with cloud services, etc.

Any of the above can (and will, when something suitable exists) leverage outside packages where it makes sense, but each overall "job" needs to be filled to make everything work as a complete 'stack' that can compete with closed-source programs. There is a ton of room to debate scope, and there is nothing wrong with starting spin-off packages that specialize, if one of the above jobs isn't fulfilled by something already existing out there and is deemed out of scope for this project...
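
As a concrete (and entirely hypothetical) sketch of the schema point, assuming a Python stack and the existing pint package for unit handling; the Pick class and its from_raw constructor are illustrative names, not an existing API:

```python
from dataclasses import dataclass
import pint

ureg = pint.UnitRegistry()

@dataclass
class Pick:
    """Hypothetical internal record for a formation-top pick."""
    well_id: str
    surface: str   # horizon/formation name
    md: float      # measured depth, normalized to metres internally

    @classmethod
    def from_raw(cls, well_id, surface, depth, unit="m"):
        # Ingest-time validation and unit normalization: convert whatever
        # unit the source file used (ft, m, ...) into the internal standard.
        md_m = (depth * ureg(unit)).to("m").magnitude
        if md_m < 0:
            raise ValueError(f"negative depth for {well_id}: {md_m}")
        return cls(well_id=well_id, surface=surface, md=md_m)

# A pick reported in feet is stored internally in metres
pick = Pick.from_raw("35-123-00001", "Top Wolfcamp", 9850.0, unit="ft")
```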

nathangeology avatar Sep 14 '19 00:09 nathangeology

I have a fork of the repo, so maybe I can frame out a project skeleton using folders and abstract classes to show what I'm thinking here, then open a pull request to discuss and revise it.
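
For example, the reader layer of that skeleton might hang off an abstract base class along these lines (DataReader and LASReader are hypothetical names; lasio.read is the real lasio entry point):

```python
from abc import ABC, abstractmethod

class DataReader(ABC):
    """One subclass per input format (LAS, SEG-Y, ...)."""

    extensions: tuple = ()  # file extensions this reader claims

    @abstractmethod
    def read(self, path: str):
        """Parse `path` and return data in the common internal schema."""

class LASReader(DataReader):
    extensions = (".las",)

    def read(self, path: str):
        import lasio  # existing OSS package mentioned later in this thread
        return lasio.read(path)
```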

nathangeology avatar Sep 14 '19 00:09 nathangeology

PyVista can handle a lot of what you mentioned... too busy this weekend to outline exactly how and where, but I see a ton of overlap between PyVista and the underlying framework you describe above.

banesullivan avatar Sep 14 '19 01:09 banesullivan

Hmm... pyvista could certainly handle some of the visualization tasks, but it is almost all visualization focused, and further, it is grid focused (an important type of visualization to be sure, but not the only one we would need for subsurface geoscience). At the core of any geoscience package would be the integration of the subsurface data. As an example, look at Petrel. Petrel has really good visualizations, and that sells a lot of software to companies, but what keeps people in the industry using it is the data I/O, data processing/integration, interpretation workflows, etc.

In short, I'm not proposing a new visualization package here; I'm proposing an open source workflow to knit the many packages out there together and maybe provide some functions of our own. It could be petrophysics functions, geophysics functions, geomodelling functions, reservoir engineering functions, etc. I seem to recall that Matt really wanted to move beyond simple hackathon code slapped together in 1.5 days to building something longer term as a community, something that could move open source into real contention against commercial packages (at least in some areas), as we've seen in other industries outside of oil/gas/geosciences. If we had a framework for this, hackathon teams could actually make a lasting impact on the community with their projects by contributing new functions and workflows via pull requests, and later competitions could push things forward by not having to start from scratch on a generic data science/analytics base every time (segyio and lasio are great examples of giving projects a quick leg up on the I/O side of things).

From an architecture standpoint, input/output and visualization are 'details' that support higher level workflows and goals in our software. The needs and requirements of these details can change quickly, and at the last minute, in a real-world workflow. What is the core of the software? The core is the workflow itself: what data needs to be integrated, and what questions do I need to answer with the data? Examples could be:

  • What intervals in the subsurface have produced the modelled fluid volume?
  • What is the estimated pressure at a given location and time in the subsurface?
  • What is the estimated permeability of a well with no well logs in a given zone?
  • What was my estimated original in-place volume of hydrocarbons (and the associated uncertainty)?
  • Can I incorporate (core photos, mud logs, seismic, production, pressure, etc.) into my dataset?

I could go on, but for machine learning/statistical modelling of problems you need to be able to make feature tables based on questions like the above (see the sketch below). In this regard, visualization is a lot like the choice of ML algorithm: it is important to have a solid library of them, but you won't really know what you need until you need it; it isn't core to the workflow and can/will change frequently when working with datasets and problems.
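
As a toy illustration of that feature-table idea, assuming pandas and some already-ingested inputs (all of the column names and values here are made up):

```python
import pandas as pd

# Hypothetical, already-ingested inputs, each keyed by a well identifier
logs = pd.DataFrame({"well_id": ["A", "B"], "mean_gr": [55.0, 80.0]})
prod = pd.DataFrame({"well_id": ["A", "B"], "cum_oil_bbl": [120_000, 45_000]})
tests = pd.DataFrame({"well_id": ["A"], "pressure_psi": [3200.0]})

# The integration step: join disparate subsurface data into one feature table
features = (
    logs.merge(prod, on="well_id", how="outer")
        .merge(tests, on="well_id", how="outer")
)
print(features)  # well "B" gets NaN pressure: exactly the gap a model must handle
```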

nathangeology avatar Oct 14 '19 15:10 nathangeology

TL;DR: PyVista isn't just 3D viz; it's all about managing spatial data structures, and we can leverage that. Also, we shouldn't attempt to create a complete software solution but rather more of a set of standards for managing, tracking, and exchanging spatial data.


pyvista could certainly handle some of the visualization tasks, but it is almost all visualization focused,

Well, not exactly. PyVista is a mesh analysis library at its core (wrapping VTK); the focus is on mesh and grid data structures. 3D viz is one of PyVista's biggest focuses, but it is driven by the explicit mesh types and the analysis/filtering of those mesh/grid types.
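
For instance, a minimal sketch of that analysis side, using the real pyvista package (the porosity field here is synthetic, purely for illustration):

```python
import numpy as np
import pyvista as pv

# Build a structured grid from XYZ coordinate arrays
x, y, z = np.meshgrid(
    np.arange(0, 10.0), np.arange(0, 10.0), np.arange(0, 5.0), indexing="ij"
)
grid = pv.StructuredGrid(x, y, z)

# Attach a synthetic porosity attribute to the grid points
grid["porosity"] = np.random.default_rng(0).uniform(0.05, 0.30, grid.n_points)

# Filtering is analysis, not just viz: extract cells above a cutoff
high_poro = grid.threshold(0.20, scalars="porosity")
print(high_poro.n_cells, "cells above 20% porosity")
```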

and further, it is grid focused (an important type of visualization to be sure, but not the only one we would need for subsurface geoscience

What exactly do you mean by "grid-focused"? PyVista can handle irregular, tree-based, rectilinear, 1D, 2D, and 3D grids/meshes, and pretty much any type of spatially referenced data. PyVista is able to handle the majority of the spatial data structures we encounter in subsurface science (though I could have creator's bias here... 😁, and things like cylindrical meshes are very tricky to deal with).

... but what keeps people in the industry using it is the data I/O, data processing/integration, interpretation workflows, etc

In short, I'm not proposing a new visualization package here; I'm proposing an open-source workflow coupling tool to tie together the many packages out there, and maybe provide some functions of our own - less of a complete software solution and more of a set of standards for managing, tracking, and exchanging spatial data.

I am absolutely on board here! I believe that some sort of library that manages spatially referenced datasets should be at the core of this, providing a close link to visualization packages (like PyVista), file I/O packages, and domain-specific software like GemPy, SimPEG, etc. While I'm not exactly sure what that main package would look like at the moment, I'm thinking PyVista has a lot to offer beyond just 3D viz: the management of spatially referenced data structures and the ability to fuse datasets. Also, this core library should do nothing more than link together all the other tools.

I think this library needs to be closely coupled to a visualization tool; otherwise, it can be tough to work with 3D data.
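
A minimal sketch of what that coupling could look like, assuming the core library hands spatially referenced data to PyVista through a thin adapter (to_pyvista is a hypothetical helper, not an existing API):

```python
import numpy as np
import pyvista as pv

def to_pyvista(points: np.ndarray, attributes: dict) -> pv.PolyData:
    """Hypothetical adapter handing spatially referenced data to the viz layer."""
    cloud = pv.PolyData(points)      # points: (n, 3) array of xyz locations
    for name, values in attributes.items():
        cloud[name] = values         # attach each attribute as point data
    return cloud

xyz = np.random.default_rng(1).uniform(0.0, 1000.0, size=(50, 3))
mesh = to_pyvista(xyz, {"temperature": np.linspace(20.0, 80.0, 50)})
# mesh.plot(scalars="temperature")   # uncomment for an interactive 3D view
```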


There are a lot of excellent points in your post above, but I think the gist is that we need some sort of framework that can manage geoscientific research workflows in a reproducible fashion.

I also think it would be cool to have some sort of tracking enabled for these types of datasets. For example, you might start with a set of sparsely measured points in your database (the database assigns a unique identifier to them), then use that sparse data for inversion, and the resulting inverted model can point back to the source data by the unique identifier. To do this, the external libraries would have to make sure they implement this data tracking.
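
A minimal sketch of that tracking idea, assuming a Python dataclass; TrackedDataset and its fields are hypothetical, purely to illustrate the uid/parent_uid chain:

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class TrackedDataset:
    """Hypothetical wrapper giving any dataset a provenance chain."""
    data: Any                         # e.g. a numpy array, xarray.Dataset, or mesh
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent_uid: Optional[str] = None  # uid of the dataset this one derives from
    step: str = ""                    # human-readable description of the step

# Sparse measurements enter the database and get an identifier
points = TrackedDataset(data=[1.2, 3.4, 5.6], step="raw sparse measurements")

# The inversion result points back to its source by that identifier
model = TrackedDataset(data=[0.9, 2.1], parent_uid=points.uid, step="inversion")
assert model.parent_uid == points.uid
```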

banesullivan avatar Oct 14 '19 16:10 banesullivan