grass icon indicating copy to clipboard operation
grass copied to clipboard

Vlib: always write out topo files in update mode

Open metzm opened this issue 1 year ago • 15 comments

When a vector with topology is opened in update mode, but nothing is changed, topology info (topo, cidx, sidx) is not written out. This PR fixes this.

I am not sure if this is a bug or an enhancement.

metzm avatar Feb 26 '24 15:02 metzm

When a vector is opened with Vect_open_update2(), the topo, cidx, and sidx files are deleted and need to be created anew which happens with Vect_close(). That means with a crash before `Vect_close(), these files and with them all of the topology info are lost.

metzm avatar Feb 26 '24 17:02 metzm

When a vector is opened with Vect_open_update2(), the topo, cidx, and sidx files are deleted and need to be created anew which happens with Vect_close().

Thanks for the explanation. This PR then sounds like a bug fix to me. I think it is fair to say that expected behavior of "open-for-update, do-nothing, close" sequence is that it would not invalidate the data.

wenzeslaus avatar Feb 26 '24 17:02 wenzeslaus

With 8ef73e8 there is a better solution where a map opened in update mode survives with level 2, any changes would be discarded, i.e. if Vect_close() can not be called because of a crash earlier on, the old topo, sidx, cidx, fidx files would still exist.

metzm avatar Feb 26 '24 17:02 metzm

With 8ef73e8 there is a better solution where a map opened in update mode survives with level 2, any changes would be discarded, i.e. if Vect_close() can not be called because of a crash earlier on, the old topo, sidx, cidx, fidx files would still exist.

Works well! Thanks!

Once topology is built after any changes, Vect_close() or v.build is needed and there is no way to discard them.

HuidaeCho avatar Feb 26 '24 20:02 HuidaeCho

I am not convinced with the current proposed solution because in case of a crash this can lead to a modified coor file (modified geometries) and old topo, sidx, cidx files no longer matching the new coor file which can cause strange errors.

Therefore I prefer my first solution were even with no changes, the topo, sidx, cidx files are created anew. In case of a crash, there will be no topo, sidx, cidx files and v.build will be needed to recreate them from the coor file.

metzm avatar Feb 27 '24 08:02 metzm

I am not convinced with the current proposed solution because in case of a crash this can lead to a modified coor file (modified geometries) and old topo, sidx, cidx files no longer matching the new coor file which can cause strange errors.

Therefore I prefer my first solution were even with no changes, the topo, sidx, cidx files are created anew. In case of a crash, there will be no topo, sidx, cidx files and v.build will be needed to recreate them from the coor file.

Did a quick test.

  • open; fatal error (2nd solution preferred)
    • 1st solution: Level downgraded; Need v.build
    • 2nd solution: Level 2
  • open; close (tie)
    • 1st solution: Level 2
    • 2nd solution: Level 2
  • open; changes; fatal error (1st solution preferred)
    • 1st solution: Level downgraded; Need v.build; Changes preserved
    • 2nd solution: v.info doesn't work; Level 2 after v.build; Changes preserved
  • open; changes; close (1st solution preferred)
    • 1st solution: Level downgraded; Need v.build; Changes preserved
    • 2nd solution: v.info doesn't work; Level 2 after v.build; Changes preserved
  • open; changes; build; fatal error (1st solution preferred)
    • 1st solution: Level downgraded; Need v.build; Changes preserved
    • 2nd solution: v.info doesn't work; Level 2 after v.build; Changes preserved
  • open; changes; build; close (tie)
    • Everything is good!

In no case, changes were discarded. With open; changes; close breaking v.info with the 2nd solution, your 1st solution might be better. I think with either solution, a custom error handler that builds and closes the map would be needed. If that's missing, the 1st solution would be better (downgrade over breaking v.info).

Or breaking v.info with the 2nd solution might also be better instead of silently (hard to notice) downgrading level 1 with the 1st, causing issues later? At least, the error message mentions rebuild topology:

$ v.info roadsmajor_upper_half
WARNING: Size of 'coor' file differs from value saved in topology file
WARNING: Please rebuild topology for vector map
         <roadsmajor_upper_half@user1>
ERROR: Unable to open topology file for vector map
       <roadsmajor_upper_half@user1>

Having said that, there are only three vector modules that use Vect_open_update2?():

$ cd vector
$ find -type f -name "*.c" -exec grep -H "Vect_open_update2\?(" {} \; | cut -d/ -f2 | sort -u
v.clean
v.edit
v.patch

So I'm not sure...

HuidaeCho avatar Feb 27 '24 15:02 HuidaeCho

This may not be obvious since we have mostly tests of individual tools, but these examples can be rewritten as tests in Python without a need to create a separate binary like we have for 3D raster library. We have examples of using ctypes to test the C library functions for both grass.gunittest tests using ctypes (also without grass.pygrass) and pytest-based tests.

wenzeslaus avatar Feb 28 '24 05:02 wenzeslaus

Probably we could take a different route – modify functions that write coor file to set "dirty" bit (e.g. touch dirty or topo version -1) before write indicating topo et al. files being not up to date. Thus no "dirty" bit == no need to modify topo file. "Dirty" bit present == failure to open on level 2, rebuild required. I didn't look deep into code to see how easy this could be implemented.

marisn avatar Feb 28 '24 08:02 marisn

@marisn This does not solve the problem of a crash.

I prefer solution 1 where the support files are deleted when opening and always written out anew when closing. Writing out these files is fast and does not require rebuilding topology. With solution 2, there is the danger of ending up with non matching/wrong support files which can lead to difficult to debug problems.

metzm avatar Feb 28 '24 14:02 metzm

@marisn This does not solve the problem of a crash.

I prefer solution 1 where the support files are deleted when opening and always written out anew when closing. Writing out these files is fast and does not require rebuilding topology. With solution 2, there is the danger of ending up with non matching/wrong support files which can lead to difficult to debug problems.

@metzm Based on my test, I think solution 1 should be preferred because with either solution, we need an error handler to properly close the map and it'll be the developer's mistake if that doesn't happen and solution 1 will be less harmful in that case.

HuidaeCho avatar Feb 28 '24 15:02 HuidaeCho

There is a strange error in the tests with Running ./python/grass/pygrass/vector/testsuite/test_pygrass_vector_doctests.py... resulting in ERROR: Category index is not up to date

I guess that before the test vector is closed, topology must be built before https://github.com/OSGeo/grass/blob/main/python/grass/pygrass/vector/init.py#L621

The error in the test seems to be an error in the test: features are written to a new vector, but topology is not built with GV_BUILD_ALL before closing.

metzm avatar Feb 29 '24 21:02 metzm

I saw that same error between edits without topology building. I was wondering if it's always required to rebuild topology or if there are any cases where we don't need it when editing features. Will it be slow to rebuild it if there are a lot of features?

HuidaeCho avatar Feb 29 '24 21:02 HuidaeCho

[...] Will it be slow to rebuild it if there are a lot of features?

If topology is already built and you just want that the cidx is marked up to date, it should be enough to call Vect_build() without calling Vect_build_partial(GV_BUILD_NONE) first, and this should be very fast.

metzm avatar Mar 02 '24 13:03 metzm

In pygrass, here https://github.com/OSGeo/grass/blob/main/python/grass/pygrass/vector/abstract.py#L454-L470 and here https://github.com/OSGeo/grass/blob/main/python/grass/pygrass/vector/abstract.py#L454 the vector is first closed, then topology is built, then the vector is closed again. This seems wrong to me, topology should be built when the vector is still open, or the other way around, the vector should only be closed after topology has been built, not before.

metzm avatar Mar 02 '24 13:03 metzm

This seems wrong to me, topology should be built when the vector is still open, or the other way around, the vector should only be closed after topology has been built, not before.

Please create a PR for the documentation as neither Vect_build or Vect_close mention this important bit.

marisn avatar Mar 03 '24 07:03 marisn