meshio icon indicating copy to clipboard operation
meshio copied to clipboard

Unification of point / cell sets among formats

Open tianyikillua opened this issue 5 years ago • 33 comments

Objectives

Unify the names (gmsh:physical, medit:ref, flac3d:zone...) and implementations of point / cell sets among formats.

Names

Currently these different names need to be hard-coded when the subsets are transferred among formats, like

if key in mesh.cell_data and "medit:ref" in mesh.cell_data[key]:
    labels = mesh.cell_data[key]["medit:ref"]
elif key in mesh.cell_data and "gmsh:physical" in mesh.cell_data[key]:
    # Translating gmsh data to medit is an important case, so treat it
    # explicitly here.
    labels = mesh.cell_data[key]["gmsh:physical"]
elif key in mesh.cell_data and "flac3d:zone" in mesh.cell_data[key]:
    labels = mesh.cell_data[key]["flac3d:zone"]
else:
    labels = numpy.ones(len(data), dtype=int)

Two solutions

  1. Either provide a list of these names like
CELL_SETS_NAMES = [
    "gmsh:physical", "medit:ref", "flac3d:zone", ...
]
  1. Adopt a unified name and object for point / cell sets, like
mesh.point_sets = ...
mesh.cell_sets = ...

Implementations

An example of mesh composed of 5 line elements and 6 nodes. We have two point sets: "Red" and "Green", and two cell sets: "Orange" and "Blue".

subsets

  1. Use a point_data and cell_data as indicator functions. Using the current terminology for MED, we have
mesh.point_data["point_tags"] = [0, 1, 1, 2, 1, 0]
mesh.point_tags = {
    1: ["Red"],
    2: ["Green"],
}

mesh.cell_data["line"]["cell_tags"] = [-1, -2, -1, 0, -2]
mesh.cell_tags = {
    -1: ["Orange"],
    -2: ["Blue"],
}

Note that with this implementation, we need two lists: the indicator function list that assigns a unique value to each point / cell, as well as a list that indicates the actual names corresponding to these unique values. This implementation is suited when we want to visualize point / cell sets in ParaView, since they are modeled as a point_data / cell_data.

This implementation is currently adopted by (to be completed)

  • gmsh
  • medit
  • med
  • flac3d
  1. Explicit enumerations of points / cell id's
mesh.point_sets = {
    "Red": [1, 2, 4],
    "Green": [3],
}
mesh.cell_sets = {
    "Orange": {"line": [0, 2]},
    "Blue": {"line": [1, 4]},
}

This implementation uses less memory and is more explicit. However it needs to be converted to the 1st method when visualizing them under ParaView. A universal conversion function can be provided in the Mesh class.

This method is already sketched (not yet exposed) in various formats

  • permas
  • abaqus

Question: can this 2nd method adopted universally for all mesh formats, considering their own specificities?

tianyikillua avatar Jan 05 '20 09:01 tianyikillua

Thank you for launching this. I think it is the most pressing issue. I'll close #290 in favour of this.

I think 'gmsh' needs to be split into MSH2 and MSH4.1.

Another format that I'm actively working on so that it will support #290 is XDMF.

gdmcbain avatar Jan 05 '20 11:01 gdmcbain

Question: can this 2nd method adopted universally for all mesh formats, considering their own specificities?

I think the answer to this is pretty much negative without a lot of cleverness? I envisage internally a Union of two representations, one isomorphic to MEDIT or MSH2, the other fulfilling #290. Conversion between the two, imperfect, as discussed previously, but independent of format as far as possible.

gdmcbain avatar Jan 05 '20 11:01 gdmcbain

I'm hoping that fixing this will clarify and resolve limitations in reading and writing of MSH4.1 such as #524 #536 #550 #614.

gdmcbain avatar Jan 06 '20 01:01 gdmcbain

I think the answer to this is pretty much negative without a lot of cleverness?

Maybe I need a quick catch up here, but what are the reasons why we can not represent sets in MSH4 by

mesh.point_sets = {
    "Red": [1, 2, 4],
    "Green": [3],
}
mesh.cell_sets = {
    "Orange": {"line": [0, 2], "quad": [1, 10, 11]},
    "Blue": {"line": [1, 4]},
}

tianyikillua avatar Jan 06 '20 20:01 tianyikillua

None, I think, except historical. This looks better, a better reflection of the Gmsh MSH4.1 intent, capable of fixing the outstanding issues. I think you're on the right track.

Oh, do you mean why not dump the cell_data cells->tags approach and use this cels_sets tags->sets of cells for everything? The former is a more obvious and direct reflection of the 'ref' in MEDIT and 'physical entity' in MSH2. As discussed, one can more or less convert between the two representations, possibly lossily unless with additional bookkeeping, so I think that if one were only working in formats like MEDIT and MSH2 that one wouldn't want to convert to a set-based tagging and back again.

gdmcbain avatar Jan 06 '20 20:01 gdmcbain

Thanks the for the nice write-up.

Small remark: The original example doesn't contain the case where one point/cell belongs to multiple sets (e.g., "iron" and "boundary").

nschloe avatar Jan 06 '20 20:01 nschloe

For points / cells that are "tagged" with multiple sets,

  1. The 1st point_data / cell_data method requires creating intersections of sets (as MED does)
  2. The 2nd "explicit enumerations of points / cell id's" method naturally covers the possibilities of having several sets that contain the same point / cell.

tianyikillua avatar Jan 06 '20 20:01 tianyikillua

Let me add that, coming from a PDE background, the material would probably be implemented as a function. As opposed to tags, the limitation here is that every point/cell has exactly one value.

I like this representation

mesh.cell_sets = {
    "Orange": {"line": [0, 2]},
    "Blue": {"line": [1, 4]},
}

for sets.

nschloe avatar Jan 06 '20 20:01 nschloe

Note also that gmsh calls "tag" what we'd rather refer to as a function, i.e.,

mesh.cell_data["line"]["cell_tags"] = [-1, -2, -1, 0, -2]

You probably realize that already.

nschloe avatar Jan 06 '20 20:01 nschloe

That's MSH2, not MSH4.

gdmcbain avatar Jan 06 '20 22:01 gdmcbain

In MSH2, each cell was assigned exactly one ‘physical entity’. If one defined two physical entities that both contained a particular cell, that cell was actually duplicated so that each only had to belong to a single physical entity. (Remember the comment 2018-02-05 from #175.)

But MSH4 did away with this. If I take the same Gmsh GEO snippet from that comment

Point (1) = {0, 0, 0, 1};
Point (2) = {1, 0, 0, 1};
Point (3) = {2, 0, 0, 1};

Line (1) = {1, 2};
Line (2) = {2, 3};

Physical Line("left", 1) = {1};
Physical Line("all", 2) = {1, 2};

and run it today through Gmsh 4.5.0-git-0c1b63c8d with

gmsh -1 twoseg.geo

I get

$MeshFormat
4.1 0 8
$EndMeshFormat
$PhysicalNames
2
1 1 "left"
1 2 "all"
$EndPhysicalNames
$Entities
3 2 0 0
1 0 0 0 0 
2 1 0 0 0 
3 2 0 0 0 
1 0 0 0 1 0 0 2 1 2 2 1 -2 
2 1 0 0 2 0 0 1 2 2 2 -3 
$EndEntities
$Nodes
5 3 1 3
0 1 0 1
1
0 0 0
0 2 0 1
2
1 0 0
0 3 0 1
3
2 0 0
1 1 0 0
1 2 0 0
$EndNodes
$Elements
2 2 1 2
1 1 1 1
1 1 2 
1 2 1 1
2 2 3 
$EndElements

This MSH4.1 is quite different to the MSH2.2 generated by Gmsh two years ago. (I can get the same today with -format msh2.) Note that the MSH4.1 has not duplicated the left element.

The problem is that meshio.read doesn't read this MSH4.1 correctly. It doesn't complain but

>>> mesh.cells
{'line': array([[0, 1],
       [1, 2]])}
>>> mesh.cell_data
{'line': {'gmsh:physical': array([1, 2])}}

The cells are right but the membership of line cell 1 in "gmsh:physical" 2 is lost. I suspect this is the same as one or more of the issues listed above yesterday.

gdmcbain avatar Jan 06 '20 23:01 gdmcbain

For points / cells that are "tagged" with multiple sets,

  1. The 1st point_data / cell_data method requires creating intersections of sets (as MED does)
  2. The 2nd "explicit enumerations of points / cell id's" method naturally covers the possibilities of having several sets that contain the same point / cell.

Yes.

I think the 1st representation is natural for MEDIT and MSH2, but it's definitely defective for MED and MSH4.1, maybe Exodus II, XDMF, …

There's not much benefit for MEDIT & MSH2 in using the second representation that I can think of. Possibly convenient for meshio in not having two representations, but then if read-writing MEDIT/MSH2, it would mean a pointless roundtrip conversion between first and second representations. Maybe not that big a negative, but meanwhile: would it be a reasonable step to just get all the meshio implementations of the second representation into line under the banner of ‘uniformation of point / cell sets among formats’. Then after that we can think about whether the first representation can be dropped from formats for which it's adequate.

gdmcbain avatar Jan 06 '20 23:01 gdmcbain

I wouldn't worry about MSH2.2 at all. So yeah, let's see if we can get representation 2 to work. Let's best start with those formats where it's native.

nschloe avatar Jan 06 '20 23:01 nschloe

I mostly mentioned MSH2.2 as our current implementation of MSH4.1 basically coerces the latter into the former.

Also though MSH2.2 is still the latest MSH version written by Gmsh 3.0.6 which is the one in the official Ubuntu package for Ubuntu 18.04, the latest Ubuntu LTS.

MSH2.2 was an important format for many years so there are a few meshes out there written in it; e.g. scikit-fem/docs/examples/box.msh.

I don't think we need to touch it here though, no.

gdmcbain avatar Jan 06 '20 23:01 gdmcbain

I don't think we need to touch it here though, no.

Actually in the same spirit of unification, it would be good to see all meshio implementations of the first representation unified ("gmsh:physical" and "medit:ref" spring to mind). That has always grated. But perhaps not if the first representation is to be abandoned (something I'm not sure about).

gdmcbain avatar Jan 07 '20 02:01 gdmcbain

Yes these two representations are both valid and can be inter-converted if needed. For points / cells belonging to multiple subsets, the 1st method involves introducing new cell tags corresponding to intersections of sets, while the 2nd method is more convenient.

Note that internally the MED format actually adopts the 1st method!

mesh.point_tags =
{2: ['Side'],
 3: ['Side', 'Top'],
 4: ['Top']}

mesh.cell_tags =
{-6: ['Top circle'],
 -7: ['Top', 'Top and down'],
 -8: ['Top and down'],
 -9: ['A', 'B'],
 -10: ['B'],
 -11: ['A', 'B', 'C'],
 -12: ['B', 'C'],
 -13: ['C']}

so the tag 3 (which is automatically created internally by MED) is assigned to nodes that belong both to "Side" and "Top"; the tag -11 is for cells that belong to the subsets "A", "B" and "C". From this the user can reconstruct the 2nd representation if he wants.

What I suggest is that if these additional tags for intersections of sets are already pre-computed and available in the mesh format, then the 1st representation can be used. Otherwise (for MSH4 as I understand), construct and expose these subsets using the 2nd method is more feasible, as the computation of set intersections could be tedious.

In practice,

  • When the 1st method is used, these tags should remain inside the point_data / cell_data structure, with a name to be defined: either specific for each format (in this case a list of these names could be provided), either a unique name
  • When the 2nd method is used, expose these subsets directly in the Mesh object by mesh.point_sets and mesh.cell_sets.

tianyikillua avatar Jan 08 '20 21:01 tianyikillua

either specific for each format (in this case a list of these names could be provided), either a unique name

I vote for a unique name. That makes is trivial for meshio to convert cell-tags between any formats using the first representation, e.g. MSH2.2 and MEDIT, which it currently does not attempt.

It also means that any downstream program wanting to open a mesh and find the cell-tags (e.g. signifying subdomains or parts of the boundary in a mixed boundary value problem) can do so via meshio without worrying about whether the input mesh was written in MSH2.2 or MEDIT or whatever. For a real code example, see: skfem.importers.meshio.from_meshio for subdomains and parts of boundaries.

gdmcbain avatar Jan 08 '20 22:01 gdmcbain

Yes I also vote for a unique name. Should some special treatment (underscore, add a "meshio" prefix, ...) be used for its name to reflect that it is a special point / cell data?

tianyikillua avatar Jan 09 '20 07:01 tianyikillua

Yes, I guess so. I don't foresee the particular choice being that important. I think either of your suggestions will work fine. Say "meshio:tag"?

gdmcbain avatar Jan 11 '20 07:01 gdmcbain

@nschloe @gdmcbain @keurfonluu I begin to work on an idea yesterday.

Since several formats (GMSH in particular) may define several tag possibilities (gmsh:physical and gmsh:geometrical) within a same mesh, using a unique name like meshio:tag only for one of them is not nice.

What I propose now, is to

  1. Keep the original point/cell tag inside point_data and cell_data, like gmsh:physical, medit:ref...
  2. For interoperationality among formats, when reading a mesh file (one of ) the tag information is also stored in a new attribute mesh.point_tags / mesh.cell_tags. This new attribute is used when writing to a new mesh file.

The name point_tags and cell_tags are in parallel with the point_sets and cell_sets. They reflect two different implementations of regions.

What do you think?

tianyikillua avatar Feb 13 '20 09:02 tianyikillua

Thanks for taking this up again.

I wouldn't be too worried about "gmsh:geometrical"; I don't believe that it is ever used in applications. Also it's only MSH 2.2, isn't it? In referring to Gmsh formats, the latter must be distinguished from MSH 4.1, they're incompatible and far from isomorphic.

One other thing with MSH 4.1 is that cells (or rather cellblocks, but cells through their membership of blocks) can have zero or more physical tags. This was supposed to be the case in MSH 2.2 too but Gmsh itself didn't write MSH 2.2 like this, duplicating or multiplying elements when they were specified according to the GEO file to have two or more physical tags. (This is.documented here a few hundred issues back.) I'm not sure whether there are other file formats that allow other than one physical tag per cell, probably. MEDIT has exactly one tag per cell. To handle this, should each cell in .cell_tags get a (possibly empty) list or set of tags?

Besides MSH 2.2, which formats have 'several tag possibilities'?

I particularly like the idea of starting to get the physical tags out of .cell_data. As @keurfonluu wrote earlier today or yesterday, this contains fields like temperature which are quite different in character, float rather than int.

Also, I'm not sure whether this is the right place to ask, but could you elaborate a little on .cell_sets? I've only seen them used in formats which I don't use so am not clear on their more applicability or semantics.

gdmcbain avatar Feb 13 '20 10:02 gdmcbain

One other thing with MSH 4.1 is that cells (or rather cellblocks, but cells through their membership of blocks) can have zero or more physical tags. This was supposed to be the case in MSH 2.2 too but Gmsh itself didn't write MSH 2.2 like this, duplicating or multiplying elements when they were specified according to the GEO file to have two or more physical tags. (This is.documented here a few hundred issues back.) I'm not sure whether there are other file formats that allow other than one physical tag per cell, probably. MEDIT has exactly one tag per cell. To handle this, should each cell in .cell_tags get a (possibly empty) list or set of tags?

How does GMSH treat exactly points / cells that are tagged by several sets? In your example (https://github.com/nschloe/meshio/issues/619#issuecomment-571355578)

$Elements
2 2 1 2
1 1 1 1
1 1 2 
1 2 1 1
2 2 3 
$EndElements

It appears to me that the 1st element is only tagged by 1, although in the GEO file it is also tagged by 2.

In MED, when cells / points belong to several subsets defined by the user, then internally intersections of subsets are created with a new tag (https://github.com/nschloe/meshio/issues/619#issuecomment-572268612), so meshio can easily read these new tags for each point / cell.

If in GMSH no new tags are created for represent intersections of original subsets, I suggest adopt first the point_sets and cell_sets approach, already operational in the ABAQUS reader.

In [3]: mesh
Out[3]:
<meshio mesh object>
  Number of points: 270
  Number of cells:
    hexahedron: 88
  Point sets: Nfix, Nsurface

In [4]: mesh.point_sets
Out[4]:
{'Nfix': array([  0,  45,  90, 135, 180, 225]),
 'Nsurface': array([  0,  45,  46,   1,  47,   2,  48,   3,  49,   4,  50,   5,  51,
          6,  52,   7,  53,   8,  54,   9,  55,  10,  56,  11,  57,  12,
         58,  13,  59,  14,  60,  15,  61,  16,  62,  17,  63,  18,  64,
         19,  65,  20,  66,  21,  67,  22,  68,  23,  69,  24,  70,  25,
         71,  26,  72,  27,  73,  28,  74,  29,  75,  30,  76,  31,  77,
         32,  78,  33,  79,  34,  80,  35,  81,  36...])}

In mesh.point_sets and mesh.cell_sets, we just enumerate the point / cell natural id (corresponding to mesh.points and each block inside mesh.cells) defining the subset. Using this representation, for your GMSH example, that would be (suppose only one block of two line elements is present in cells)

mesh.cell_sets = {
    "left": [np.array([0])],
    "all": [np.array([0, 1])]
}

the name of each tag can be directly put into this structure.

tianyikillua avatar Feb 13 '20 12:02 tianyikillua

It appears to me that the 1st element is only tagged by 1, although in the GEO file it is also tagged by 2.

The tags aren't given directly in the MSH 4.1 $Elements section. This one has two blocks of elements. The first block of elements belongs to 1-dimensional 'entity' 1, the second to 1-dimensional entity 2. Those element blocks belonging to entity (1, 1) have 2 physical tags, 1 and 2, corresponding to the names "left" and "all". The element blocks belonging to entity (1, 2) have only 1 physical tag, 2, "all".

gdmcbain avatar Feb 13 '20 14:02 gdmcbain

I like cell_sets and point_sets a lot since, in my view, these are the only actual tags: Each cell can have multiple, doesn't need to have any. This should be used for things like boundary, inlet etc.

As opposed to that, many formats define regions in terms an integer-valued function on cells. Each cell has exactly one value. meshio shouldn't care about that. "You give me an integer function? Okay, here's your cell_data." If a format has proper tags -> cell_sets.

For the interoperability ideas: point_tags, cell_tags don't need to copy over the array. Make it a private attribute _point_tag_name, _cell_tag_name, to a key in {point,cell}_sets.

nschloe avatar Feb 13 '20 14:02 nschloe

Thanks for the explanation of Mesh.cell_sets. It looks like it could cover MSH 4.1, albeit with some redunduncy, each element of a block always having the same tags.

gdmcbain avatar Feb 13 '20 14:02 gdmcbain

each element of a block always having the same tags

Sorry to ask, you probably have explained that somewhere before: Are those gmsh 4.1 blocks always of the same cell type?

nschloe avatar Feb 13 '20 14:02 nschloe

Yes. The four ints beginning each block of elements are

 entityDim(int) entityTag(int) elementType(int) numElementsInBlock(size_t)

so exactly one elementType.

gdmcbain avatar Feb 13 '20 14:02 gdmcbain

I'll try to first work on a version that covers all formats but msh4.1.

tianyikillua avatar Feb 13 '20 17:02 tianyikillua

A first draft is now in https://github.com/tianyikillua/meshio/commits/unification-point-cell-tags.

Already done

  • abaqus: point and cell sets
  • avsucd: cell tags
  • exodus: point sets
  • flac3d: cell tags
  • gmsh22: cell tags
  • med: point and cell tags
  • medit: point and cell tags
  • ugrid: cell tags

TODO

  • gmsh44: cell tags
  • permas: point and cell sets
  • ugrid: cell tags (WARNING after reviewing the code, it seems that when writing it does not support cells that contain multiple blocks of a same cell type due to ugrid_counts[key] = data.shape[0], also should in fact "triangle" be before "quad"?)

Repr of mesh also now prints point / cell tags.

In [4]: meshio.read("flac3d/flac3d_mesh_ex.f3grid")                             
Out[4]: 
<meshio mesh object>
  Number of points: 160
  Number of cells:
    hexahedron: 45
    pyramid: 9
    hexahedron: 18
    wedge: 9
    hexahedron: 6
    wedge: 3
    hexahedron: 6
    wedge: 3
    pyramid: 6
    tetra: 3
  Cell data: flac3d:zone
  Cell tags: 1, 2, 3, 4

In [5]: meshio.read("flac3d/flac3d_mesh_ex.f3grid").cell_tags                   
Out[5]: 
[array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2]),
 array([2, 2, 2, 2, 2, 2, 2, 2, 2]),
 array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]),
 array([4, 4, 4, 4, 4, 4, 4, 4, 4]),
 array([3, 3, 3, 3, 3, 3]),
 array([3, 3, 3]),
 array([3, 3, 3, 3, 3, 3]),
 array([3, 3, 3]),
 array([3, 3, 3, 3, 3, 3]),
 array([3, 3, 3])]

Now point / cell tags can be transferred among formats, through mesh.point_tags and mesh.cell_tags. Here the cell data gmsh:physical is transferred into med:family.

In [3]: meshio.read("msh/insulated-2.2.msh")                                    
Out[3]: 
<meshio mesh object>
  Number of points: 67
  Number of cells:
    line: 21
    triangle: 111
  Cell data: gmsh:physical, gmsh:geometrical
  Cell tags: 1, 2, 3

mesh = meshio.read("msh/insulated-2.2.msh")
mesh.write("test.med")
mesh = meshio.read("test.med")

In [14]: mesh                                                                   
Out[14]: 
<meshio mesh object>
  Number of points: 67
  Number of cells:
    line: 21
    triangle: 111
  Cell data: med:family
  Cell tags: 1, 2, 3

tianyikillua avatar Feb 13 '20 21:02 tianyikillua

@gdmcbain As you can see from the previous flac3d example, the 1st (hexa) block contains two tags. So each cell block is not necessarily associated with a unique tag.

tianyikillua avatar Feb 13 '20 21:02 tianyikillua