
ElmerGrid prints message about unused node; simulation fails to produce meaningful results

Open evantandersen opened this issue 5 months ago • 7 comments

I have a scripted pipeline that creates geometry using gmsh, then feeds it into a magnetostatics simulation in Elmer. It works well, except that some meshes fail to simulate properly. I'm unsure whether it's a bug in ElmerGrid or Gmsh, but the gmsh author has now looked at the meshes and said they look OK.

The only thing I've been able to narrow it down to is that when running ElmerGrid with -autoclean, it will print a message about removing a single unused node. If no such message is printed (i.e., all nodes are "used"?), the simulation will produce meaningful results. If it does print something about an unused node, the simulation will complete, but the results will make no sense.

It's pretty random when the meshes fail to work, and you can typically modify the geometry by very small amounts and get it to work. It's not an issue with geometry overlapping, because the dimension being changed is perpendicular to any intersections.

Here is a python gmsh script that can easily produce both a working and non-working mesh. By default, it will produce a mesh that fails to run. If line 36 is changed from 12.5 to 12.49, ElmerGrid will no longer complain about an unused node, and the simulation will produce meaningful results.

Not sure what's going on here. My current workaround is to detect the unused node message in ElmerGrid and manually tweak some dimensions by random small amounts until the problem goes away. It's kind of annoying when I kick off 20 simulations to do a parameter sweep and 3 of them end up failing due to this bug.
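The detection step of the workaround above could be sketched roughly as follows. This is a minimal sketch, not the script actually used in this issue: the regular expression is a hypothetical guess, since the exact wording of ElmerGrid's message is not quoted in this thread, and the helper names `mentions_unused_node` and `convert_and_check` are made up for illustration. The `14 2` arguments select Gmsh input and ElmerSolver output.

```python
import re
import subprocess

# Hypothetical pattern: the exact wording of ElmerGrid's message is not
# quoted in this thread, so adjust it to match the output of your build.
UNUSED_NODE_PATTERN = re.compile(r"unused\s+node", re.IGNORECASE)


def mentions_unused_node(output: str) -> bool:
    """Return True if captured ElmerGrid output matches the pattern."""
    return UNUSED_NODE_PATTERN.search(output) is not None


def convert_and_check(msh_path: str) -> bool:
    """Convert a Gmsh mesh with ElmerGrid and flag suspect meshes.

    Returns True when the output mentions an unused node.
    """
    result = subprocess.run(
        ["ElmerGrid", "14", "2", msh_path, "-autoclean"],
        capture_output=True,
        text=True,
        check=True,
    )
    return mentions_unused_node(result.stdout + result.stderr)
```

In a parameter sweep, a `True` return value from `convert_and_check` could be used to skip or regenerate a mesh before the solver is started.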

Here is the sif file I'm using for the simulation.

evantandersen avatar Jun 12 '25 06:06 evantandersen

Could you provide working and non-working gmsh files as well? I know that I could generate these myself, but it would not be the first time that there is a difference between versions etc.

raback avatar Jun 12 '25 13:06 raback

The meshes are a bit big, but when I turned the resolution down from what's in the script, I couldn't reproduce the problem as easily.

bug_demo.zip

evantandersen avatar Jun 12 '25 22:06 evantandersen

$ gmsh --version
4.12.1
$ ElmerGrid

Starting program Elmergrid, compiled on Apr 27 2025
...

I compiled Elmer from the origin/devel branch on github.

EDIT: I just pulled+built the latest (0735539a8a), still seems to have the problem.

evantandersen avatar Jun 12 '25 22:06 evantandersen

Ok, I think this is a bug with ElmerGrid. I just had the idea to convert the mesh from gmsh to Elmer, then back to gmsh, so I could open it in the gmsh gui. good.msh looks fine:

Image

bad.msh looks horrible, something has happened to the air volume mesh:

Image

evantandersen avatar Jun 13 '25 07:06 evantandersen

I added output so I can see the index of the problematic node. It is 56686 in the bad.msh.

If you look at line ~41327 in bad.msh, it looks like:

... -2.173134299265071 1.76905939338269 1.29108447599133 -6.039211372968333 9.338465187511156 2.556442069192868 3 5 0 1889 56686 20537 ...

If you try to find index 56686 in the element topology, you cannot find it, whereas 20537 appears ~23 times.

The same node set in good.msh appears at line ~41357 and looks like:

... -6.179517531687858 7.714838056993432 1.214582403730284 -5.79114284181015 12.0461114090319 1.132227254993022 3 5 0 1846 20552 20553 ...

Now, I don't know where the orphan node comes from, but it does not take part in any element. I don't know whether the connectivity problem is already in the mesh or whether the renumbering spoils it. Actually, there are exactly 56686 nodes, so this one is the very last one.
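The orphan check described above (a node index that never appears in the element topology) can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names are hypothetical, and the parser assumes each line of Elmer's mesh.elements has the layout "element-id body-id type-code n1 n2 ...". Boundary elements in mesh.boundary are not considered here.

```python
def find_orphan_nodes(num_nodes, elements):
    """Return the 1-based node indices that no element references.

    elements is an iterable of node-index sequences, one per element.
    """
    used = set()
    for node_ids in elements:
        used.update(node_ids)
    return [n for n in range(1, num_nodes + 1) if n not in used]


def parse_elmer_elements(text):
    """Parse mesh.elements text (assumed layout: id body type n1 n2 ...)."""
    elements = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) > 3:
            # Everything after the type code is a node index.
            elements.append([int(f) for f in fields[3:]])
    return elements
```

Called with the node count from mesh.header and the parsed contents of mesh.elements, `find_orphan_nodes` returns an empty list for a mesh in which every node is used.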

If you enforce Gmsh format 2, which is not as convoluted, does the same thing happen there?

raback avatar Jun 13 '25 09:06 raback

You can skip "-autoclean", manually subtract one node (56686->56685) in mesh.header, and remove the last row of mesh.nodes; this gives you exactly the same output. So autoclean is not the culprit.
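For reference, the manual edit described above could be scripted along these lines. This is a sketch under stated assumptions: it assumes the first line of mesh.header starts with the node count and that mesh.nodes holds one node per line, and the function name `drop_last_node` is hypothetical. It is only meaningful when the last node really is unreferenced.

```python
def drop_last_node(header_text, nodes_text):
    """Return (header, nodes) texts with the last node removed.

    Assumes the first header line begins with the node count and
    that mesh.nodes contains one node per non-empty line.
    """
    header_lines = header_text.splitlines()
    counts = header_lines[0].split()
    counts[0] = str(int(counts[0]) - 1)  # e.g. 56686 -> 56685
    header_lines[0] = " ".join(counts)

    node_lines = [line for line in nodes_text.splitlines() if line.strip()]
    node_lines = node_lines[:-1]  # drop the final row

    new_header = "\n".join(header_lines) + "\n"
    new_nodes = "\n".join(node_lines) + "\n"
    return new_header, new_nodes
```

The function works on strings rather than files, so the original mesh.header and mesh.nodes stay untouched until the caller decides to write the results back.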

raback avatar Jun 13 '25 09:06 raback

Ever since it was explained to me on the Discord, I haven't believed the problem is related to -autoclean; I've just been using it as a way to detect this issue.

Setting gmsh to export in version 2 does seem to prevent this problem. I've added it to my script setup:

gmsh.option.setNumber("Mesh.MshFileVersion", 2)

As I run more and more simulations over the next week I'll report back if I see this problem again.

Perhaps this is a gmsh bug after all?

evantandersen avatar Jun 13 '25 21:06 evantandersen