
Construct a distributed parallel PUMI mesh from own file format

Open a-jp opened this issue 4 years ago • 22 comments

I am very new to PUMI. I am working with mixed cell type unstructured meshes, and using a file format not supported by PUMI. I was told via email that it would be ok to use the issue tracker to ask questions.

I have a mesh format that I can interrogate in parallel from disk, but that format is not directly supported by PUMI. I read 1/nprocs worth of cells on each processor and the associated vertex information for those cells. I have so far created an apf::Mesh2 object by calling apf::buildElement for each cell of a mixed cell-type unstructured mesh that is read on each processor. At the end of this process, each processor has a simply partitioned view of the full mesh (approximately 1/nprocs worth). However, these meshes are completely disconnected as far as PUMI is concerned. I want to reconnect them in parallel. I then want to parallel re-partition the mesh using parmetis to give a better partitioning. My mesh is stored on disk in a single file with global numbers, not in multiple part meshes. I also have no geom definition, just the discrete mesh.

Basically my workflow is: simply partition my mesh on disk by reading parallel chunks of it onto nprocs (done); build a mesh that PUMI understands from my own file format (done, and built an apf::Mesh2); build a parallel view of the distributed mesh by connecting it back together (don't know how to do this); re-partition in parallel, this time properly, using parmetis etc. via PUMI (don't know how to do this); and then build ghost/halo cells for that distributed mesh on physical and partition boundaries (don't know how to do this). From there I'd like to associate scalars and vectors with the vertices, faces, and cells and run my solver. I would then like to mark the vertices, faces, and cells into groups appropriate for my simulation, such as groupings of cells and boundary conditions.

I would really appreciate it if someone could give me a steer on my next step, which is stitching the mesh together in parallel. I ideally do not want to approach this by writing out part meshes, or indeed by having a single processor read the whole mesh and then distribute it. So the above workflow is an attempt to have a parallel process from the start, at the cost of partitioning twice: once simply, and then re-partitioning properly.

Can I achieve what I'm trying to do within PUMI, or with a variant of the above? If I can, I'd really appreciate some help. I appreciate that this would be easier if my meshes were in a file format PUMI can read, but I am not in that position, hence the workflow above.

Thanks, Andy

a-jp avatar Aug 29 '19 12:08 a-jp

Hello Andy,

What you described can be done with PUMI.

I'm pasting some points from the emails 'for the record'; the questions are good ones and may be helpful to others in the future.

Constructing a Parallel Mesh

construct(...) API

I'd recommend using the 'construct' API to manage the serial to parallel mesh construction.
https://github.com/SCOREC/core/blob/fe28bfd423a4e82dad073acb6a7432d7bdaf66b1/apf/apfConvert.h#L34-L49

It reads in mesh connectivity information via arrays, in serial or in parallel, to create a partitioned PUMI mesh. If your current reader has the logic to read 1/N of the elements from the serial file then you are well on your way to getting this working. A few examples of its use are below:

https://github.com/SCOREC/core/blob/e2133b12086d4503b30b2eedc6ca8595fa52a514/test/construct.cc#L35

https://github.com/SCOREC/core/blob/fe28bfd423a4e82dad073acb6a7432d7bdaf66b1/test/icesheet.cc#L502
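
For concreteness, here is a rough sketch of the call sequence those examples use, assuming `conn`, `coords`, `nelem`, and `nverts` are the per-rank arrays your reader produces (names are illustrative; depending on the core version the connectivity type may be `int` rather than `apf::Gid`):

```cpp
#include <apf.h>
#include <apfConvert.h>
#include <apfMDS.h>
#include <gmi_null.h>

// Sketch: build a partitioned apf mesh from per-rank connectivity arrays.
apf::Mesh2* buildFromArrays(int dim, int etype,
                            const apf::Gid* conn, int nelem,
                            const double* coords, int nverts) {
  gmi_register_null();
  gmi_model* model = gmi_load(".null");          // no geometric model, as in your case
  apf::Mesh2* m = apf::makeEmptyMdsMesh(model, dim, false);
  apf::GlobalToVert outMap;
  apf::construct(m, conn, nelem, etype, outMap); // stitch parts via global vertex ids
  apf::alignMdsRemotes(m);                       // make shared-entity links consistent
  apf::deriveMdsModel(m);                        // derive a discrete model classification
  apf::setCoords(m, coords, nverts, outMap);     // attach vertex coordinates
  m->verify();
  return m;
}
```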

Manual Definition of Remote Copies

The alternative is to manually define the connections between shared mesh entities. A 'shared' entity in an element based mesh partition is an entity that exists on multiple processes to form the closure of mesh elements assigned to those processes. Each shared entity in an APF/PUMI mesh knows about the other copies of that entity to facilitate communications.

The following code from construct(...) API implementation does this:

https://github.com/SCOREC/core/blob/d9eeb2b028afd6182b4d3bc1063d3aae3a348760/apf/apfConstruct.cc#L152-L155
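
Once those links are set up, the remote copies of a shared entity can be inspected through the apf::Mesh interface. A small sketch (the function is illustrative, but `isShared`, `getRemotes`, and `apf::Copies` are the relevant APF calls):

```cpp
#include <apf.h>
#include <apfMesh2.h>
#include <PCU.h>
#include <cstdio>

// Sketch: report, for each part-boundary vertex on this rank, which
// peer ranks hold its remote copies.
void printSharedVerts(apf::Mesh2* m) {
  apf::MeshIterator* it = m->begin(0);
  apf::MeshEntity* v;
  while ((v = m->iterate(it))) {
    if (!m->isShared(v)) continue;
    apf::Copies remotes;
    m->getRemotes(v, remotes); // map: peer rank -> entity handle on that rank
    APF_ITERATE(apf::Copies, remotes, rit)
      printf("rank %d: shared vertex has a copy on rank %d\n",
             PCU_Comm_Self(), rit->first);
  }
  m->end(it);
}
```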

Ghosts

The following code creates ghost entities:

https://github.com/SCOREC/core/blob/fe28bfd423a4e82dad073acb6a7432d7bdaf66b1/test/pumi.cc#L786-L788

Note, it is using an alternative, higher-level API with the PUMI_ prefix. I'm not as familiar with moving back and forth between PUMI_ objects and apf objects. When we get to that point of the implementation @seegyoung may be able to point us at an example of the conversion syntax.

Partitioning

There are a few partitioning utilities provided with core:

split - recursive inertial bisection
zsplit - multi-level graph via zoltan/parmetis
ptnParma - combines partitioning with load balancing (partition improvement)

https://github.com/SCOREC/core/wiki/Partitioning

Some useful discussion of these tools is in the following issue starting here: https://github.com/SCOREC/core/issues/241#issuecomment-512754961

Associating Data with Mesh Entities

Tags

Storing values with mesh entities is most simply done using 'tags'. Here is an example that creates, and sets, one double per mesh element:

https://github.com/SCOREC/core/blob/fe28bfd423a4e82dad073acb6a7432d7bdaf66b1/test/elmBalance.cc#L17-L26
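
The pattern in that example boils down to the following sketch (the function and tag name are illustrative; `createDoubleTag`/`setDoubleTag` are the APF calls it uses):

```cpp
#include <apf.h>
#include <apfMesh2.h>

// Sketch: attach one double per mesh element, as elmBalance does.
void tagElementWeights(apf::Mesh2* m, double w) {
  apf::MeshTag* tag = m->createDoubleTag("elmWeight", 1); // 1 double per entity
  apf::MeshIterator* it = m->begin(m->getDimension());
  apf::MeshEntity* e;
  while ((e = m->iterate(it)))
    m->setDoubleTag(e, tag, &w);
  m->end(it);
}
```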

Fields

The field APIs (e.g., 'createField', which you found) provide additional functionality. This is an area of code that I'm less familiar with; I'd suggest creating an issue to ask field-related questions. Below are a few utilities/examples that use fields.

https://github.com/SCOREC/core/blob/fe28bfd423a4e82dad073acb6a7432d7bdaf66b1/test/fieldReduce.cc

  • creates a field with one double per mesh vertex, then exercises functions that accumulate values on shared mesh vertices and perform max/min reductions on the shared vertices

https://github.com/SCOREC/core/blob/fe28bfd423a4e82dad073acb6a7432d7bdaf66b1/test/field_io.cc

  • creates a field with one vector (3 doubles) per mesh vertex, writes it to disk with the mesh, then reads the mesh back and gets the handle/pointer to the existing field
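
Creating a vertex field like the one in field_io looks roughly like this (the function and field name are illustrative; `createFieldOn`, `apf::VECTOR`, and `setVector` are the APF calls involved):

```cpp
#include <apf.h>
#include <apfMesh2.h>

// Sketch: one 3-vector per mesh vertex, initialized to zero.
void attachVelocity(apf::Mesh2* m) {
  apf::Field* f = apf::createFieldOn(m, "velocity", apf::VECTOR);
  apf::MeshIterator* it = m->begin(0);
  apf::MeshEntity* v;
  while ((v = m->iterate(it)))
    apf::setVector(f, v, 0, apf::Vector3(0, 0, 0)); // node 0 of the vertex
  m->end(it);
}
```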

cwsmith avatar Aug 29 '19 12:08 cwsmith

Thank you! Really helpful. Sorry I didn't add tags; I don't know how to do that. I fell down trying to use the construct API because it appears to require a constant element type in the fourth argument.

apf::construct(mesh, m.elements, m.numElms, m.elementType, outMap);

But I have a mixed-element mesh, and so I don't understand how to use this when elementType would, in my case, need to be a vector with a type per cell. So I went with constructing each element by hand using buildElement. What am I missing? How would I use this for mixed cell-type meshes?

Thanks, Andy

a-jp avatar Aug 29 '19 12:08 a-jp

Ahhh - good catch. Sorry for missing that limitation of construct(..).

We should be able to modify the current version to work with multiple entity types by extending the arguments to an array of connectivity arrays, one per user-specified element type (or something like that). I'm sure there are some complexities lurking beneath that will make it non-trivial.
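
One hypothetical shape for such an extension, purely illustrative (none of these names exist in apfConvert.h), would replace the single connectivity array with per-type blocks:

```cpp
// Hypothetical extension sketch for a mixed-type construct.
// One connectivity array per element type present in the mesh.
struct MixedConnectivity {
  int etype;             // e.g. apf::Mesh::TET, apf::Mesh::HEX, apf::Mesh::PRISM
  const apf::Gid* conn;  // connectivity for the elements of this type
  int nelem;             // number of elements of this type on this rank
};

// Hypothetical signature: stitch all blocks through one shared
// global-id-to-vertex map, as the single-type construct does today.
void constructMixed(apf::Mesh2* m,
                    const MixedConnectivity* blocks, int nblocks,
                    apf::GlobalToVert& globalToVert);
```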

I'm spending most of my time in the next 1.5 weeks preparing an SC workshop submission. After that I can put time into the extension. Likewise, we'd be happy to accept a code contribution, and help with it, if you were interested in working on it before I can.

cwsmith avatar Aug 29 '19 13:08 cwsmith

Ok, I can take a look. To help with that, can I ask: is there an equivalent pumi::construct? I haven't quite figured out whether I should be making an apf::Mesh2 or a pumi_mesh, or indeed if there is a difference. So should I be using pumi_mesh_createElem rather than the apf API apf::buildElement?

I ask because I think where I want to be is to interface with PUMI and so developing code for an apf interface via modifying the construct functions would need to still allow me to get a pumi_mesh object eventually (in memory as a distributed parallel mesh).

Not sure if that makes total sense, I guess I'm saying that I don't really want to go down the route of getting all this working for my bespoke mesh format via the apf interface only to find out that I should have done everything via pumi_*. So I'm kind of trying to check that now without knowing quite how to ask the question...

a-jp avatar Aug 30 '19 10:08 a-jp

The pumi_ functions are a lightweight wrapper around the APF (mesh and fields) and GMI (geometric model) libraries and provide a more consistent/unified set of APIs; most of the interface is defined in pumi/pumi.h. A few pieces of additional functionality have been implemented within this interface, i.e., they exist above the underlying APF and GMI libraries. One example is ghosting.

I'll work with @seegyoung to nail down the details on constructing/satisfying the pumi object:

https://github.com/SCOREC/core/blob/d9eeb2b028afd6182b4d3bc1063d3aae3a348760/pumi/pumi.h#L84-L102

from an existing apf::mesh and provide a hello world for calling some ghosting functions.

cwsmith avatar Aug 30 '19 11:08 cwsmith

@a-jp Did you work out the construction of the pumi instance?

cwsmith avatar Sep 12 '19 14:09 cwsmith

Sort of. I found out that pMesh is just a typedef to apf::Mesh2 inside pumi.h. Since I've got a Mesh2 object I just blindly started using the pumi_* api, but I've run into problems. In parallel, on two processors, if I use the following code on a simple 2D, 4 quad mesh, I get a segfault on the ghost construction line:

pumi_mesh_print(mesh, false); // true segfaults
pumi_mesh_verify(mesh, true);
pumi_mesh_createFullAdjacency(mesh);
pumi_ghost_createLayer(mesh, mesh->getDimension() - 1, mesh->getDimension(), 1, 1);

This is holding me up as I can't seem to make ghosts and so I'm kind of assuming I've incorrectly built the pMesh object. On the positive side, the previous three lines run, so it can't be that far off.... Any thoughts?

a-jp avatar Sep 12 '19 14:09 a-jp

Hi, Can you try without "pumi_mesh_createFullAdjacency(mesh);"?

After pumi_mesh_createFullAdjacency(mesh), any modification to the mesh such as adaptation, migration, or ghosting is not allowed, as it would break the "full adjacency".

In most cases a full adjacency is not necessary, as PUMI supports a "complete representation". I added that function for my internal OpenMP testing.

seegyoung avatar Sep 12 '19 14:09 seegyoung

Nah, that didn't work...

a-jp avatar Sep 12 '19 14:09 a-jp

I was wondering if there was something wrong with my call stack. As I said, I'm running on two processors, in 2D, with a 4 quad mesh, 2 cells on each processor. I make the mesh and push the elements in and set the vertex positions. After that I do in this order:

apf::alignMdsRemotes(mesh);
apf::deriveMdsModel(mesh);
mesh->acceptChanges();
apf::verify(mesh, true);

then:

pumi_mesh_print(mesh, false); // true segfaults
pumi_mesh_verify(mesh, true);
//pumi_mesh_createFullAdjacency(mesh);
pumi_ghost_createLayer(mesh, mesh->getDimension() - 1, mesh->getDimension(), 1, 1);

I've kind of cribbed that together from examples, could it be I'm just missing some other assemble call that is breaking the ghosting?

Cheers, Andy

a-jp avatar Sep 12 '19 15:09 a-jp

Please do not create a partitioned mesh from scratch. For a partitioned mesh there is a partition model between the geometric model and the mesh, and unless the partition model and all the remote copy links are valid, the mesh won't work. Please understand that pumi_mesh_verify cannot catch every invalid state of the mesh, especially if it is partitioned.

If you want your own partitioned mesh, please construct a serial mesh and then run one of the partitioning programs provided in test/*split.cc.

seegyoung avatar Sep 12 '19 16:09 seegyoung

@seegyoung I think the point of his question is that he already has a partitioned mesh stored in some non-pumi file format, so @a-jp is interested in loading in his externally partitioned mesh in parallel to the pumi data structures.

jacobmerson avatar Sep 12 '19 16:09 jacobmerson

To the best of my knowledge, the only officially supported way to load a parallel mesh is via the .smb format.

seegyoung avatar Sep 12 '19 16:09 seegyoung

@seegyoung we have the construct API for constructing parallel meshes from node and element data: https://github.com/SCOREC/core/issues/245#issuecomment-526165443 I believe this is what @a-jp is doing for his larger/real meshes.

cwsmith avatar Sep 12 '19 16:09 cwsmith

That's correct, I'm using the construct API. Could I ask for thoughts on if there was something wrong with my call stack? As I said, I'm running on two processors, in 2D, with a 4 quad mesh, 2 cells on each processor. I make the mesh and push the elements in and set the vertex positions: all via the construct API, in parallel. After that I do in this order:

apf::alignMdsRemotes(mesh);
apf::deriveMdsModel(mesh);
mesh->acceptChanges();
apf::verify(mesh, true);

then:

pumi_mesh_print(mesh, false); // true segfaults
pumi_mesh_verify(mesh, true);
//pumi_mesh_createFullAdjacency(mesh);
pumi_ghost_createLayer(mesh, mesh->getDimension() - 1, mesh->getDimension(), 1, 1);

I've kind of cribbed that together from examples, could it be I'm just missing some other call that is breaking the ghosting when the last function is called?

a-jp avatar Sep 12 '19 17:09 a-jp

It turns out we don't currently have a supported way to convert from an apf::Mesh to a PUMI instance. The 'safest' way forward would be to write your partitioned smb mesh (created by construct(...)) to file/disk then use PUMI APIs to read it: https://github.com/SCOREC/core/blob/d9eeb2b028afd6182b4d3bc1063d3aae3a348760/pumi/pumi.h#L205-L206 and create the ghosts.

cwsmith avatar Sep 13 '19 14:09 cwsmith

It turns out we don't currently have a supported way to convert from an apf::Mesh to a PUMI instance

That's a massive shame, especially given how much time I've thrown at this. What is required for this to work? There is, I assume, no construct for the PUMI API; is that what's missing? I'm really not a fan of writing partitioned part files, especially for large meshes, or when restarting on a different number of processors than was used for the previous run; it gets super messy quickly. That's my personal experience at least.

I've already extended the construct API to allow mixed cell types, and that seemed to be working: at least everything seemed to be in the right place when visualised. It was the ghost creation that caused the problem. I would be happy to pitch in if I could; can you tell me what's missing to go from apf to pumi at the technical level? I find it odd since, as I said, pMesh is just a typedef for Mesh2. Is it a missing ancillary data structure?

a-jp avatar Sep 13 '19 14:09 a-jp

@a-jp Take a look at the apfToPumi branch and the extended construct example: https://github.com/SCOREC/core/blob/551d71589bc48af0dc5d348751bda95f782ee51c/test/constructThenGhost.cc This example creates a distributed apf mesh with construct(...), creates a pumi instance, creates pumi ghosts, then synchronizes an element field. The image below shows the field before and after synchronization:

(image: ghostSync, the element field before and after ghost synchronization)

This has not been fully fleshed out and is still under review on our side. If you want to push forward before we finish, feel free to try it out.

cwsmith avatar Sep 18 '19 14:09 cwsmith

@cwsmith thanks! That's really interesting. Could you say what the new function is that does the conversion that was missing? Is it?

I'd be really happy to test it for you, as you know I had a working example that failed, which might now work. Can I ask, is there anything else that is required to be done by me prior to calling the new function?

So I know what to look out for: what are you still testing/working on? What's expected that may not work?

Thanks, Andy

a-jp avatar Sep 19 '19 14:09 a-jp

The new pumi_mesh_load(...) function you linked to simply sets the pumi instance's pointer for the mesh object: https://github.com/SCOREC/core/commit/7e80f949520b4614fbeadb333899c2b04908acf7#diff-01216a9e3ca94548cc616b1b3fb76333R232-R238

I am somewhat concerned that this is too simple so I hesitate to say that this is complete. I'm really not sure what may be broken/missing without staring at the code some more.

Once your apf mesh is distributed/partitioned to N ranks, where N is the size of MPI_COMM_WORLD, call pumi_mesh_load(...) and pass in your apf mesh. After that, try to use only the pumi_* APIs. If you can load your test mesh and do the same beforeSync/afterSync sanity check with ParaView (as done in constructThenGhost), that would be a good first step.
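
Putting that hand-off together with the ghosting call from earlier in the thread, the sequence would look roughly like this (a sketch only: the `pumi_mesh_load(apf::Mesh2*)` overload is the one added on the apfToPumi branch, and whether pumi_start() is needed here depends on how your program initialized PUMI):

```cpp
#include <pumi.h>

// Sketch: wrap an existing distributed apf mesh in the pumi instance,
// then create one layer of element ghosts bridged by faces.
void ghostFromApf(apf::Mesh2* m) {
  pMesh pm = pumi_mesh_load(m); // sets the pumi instance's mesh pointer
  int dim = pumi_mesh_getDim(pm);
  // bridge dim-1 (faces), ghost dim (elements), 1 layer, include copies
  pumi_ghost_createLayer(pm, dim - 1, dim, 1, 1);
}
```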

cwsmith avatar Sep 19 '19 15:09 cwsmith

Ok I'll give it a go...

a-jp avatar Sep 19 '19 15:09 a-jp

The apfToPumi branch was merged into develop. master will be synced with develop tonight if the nightly tests pass.

https://github.com/SCOREC/core/commit/812a6a196f748735bfee348dd47cdb895cb71fc2

cwsmith avatar Oct 30 '19 15:10 cwsmith