qwat-data-model
qwat-data-model copied to clipboard
proposal: enforce check constraints on geometry validity
Today I found some errors on the subscribers geometries (not valid) and I remembered that I also had in the past invalid geometries on valves, pipes etc.
Although this happens very rarely, I wanted to ask you @3nids and others if you ever considered enforcing geometry validity on the geometry columns and if you have any thoughts on this topic.
It would seem that some of the geometries that I had problems at were NULL, so a NOT NULL check would maybe mitigate this issue.
I think that NULL is not an invalid geometry. Invalid geometry would rather be like a polygon with a ring that crosses itself. I have no idea about forcing validity on the column. Is there a common usage on this?
@vmora thoughts?
yes, NULL is not an invalid geometry. I just realized that probably some of my coworkers probably imported some pipes, valves without geometry (through DB manager). I'll make a pull request to make sure the pipes valves etc have geometry. I never thought this could happen but looking at the data in production it's clearly happening.
well, then it makes sense to have not null geom for nodes and pipes I suppose.
Also a possible issue that may break our future functions on pipes (calculate height of vertexes with interpolation) etc.: Sometimes, during the pipe editing it happens that two vertexes are created at the same location. I can't envision right now what it would break but I think that a check constraint to avoid this type of problem would be only beneficial.
We could also use ST_RemoveRepeatedPoints on the pipe geometry
yes that could be added to the trigger
Do you think that we should ensure geometries don't have duplicates?
I found some in my data and I'm keen on not having them ever again.
I tried in https://github.com/qwat/qwat-data-model/pull/115 but the UNIQUE constraint is using BBOX to check the geometries, which means that if two different geometries travelling somewhat the same path and having start end end points the same can have the same BBOX and they will colide with the UNIQUE constraint: e.g.
INSERT INTO qwat_od.distributor (name) VALUES ('Test Distributor');
INSERT INTO qwat_od.pipe (id, fk_function, fk_installmethod, fk_material, fk_distributor, fk_precision, fk_bedding, fk_protection, fk_status, fk_watertype, geometry)
VALUES (1, 4105, 4201, 5121, 1, 1103, 9701, 4301, 1301, 1502, ST_GeomFromEWKT('SRID=21781;LINESTRING(500000 500000 0,500010 500000 0,500010 500010 0)'));
INSERT INTO qwat_od.pipe (id, fk_function, fk_installmethod, fk_material, fk_distributor, fk_precision, fk_bedding, fk_protection, fk_status, fk_watertype, geometry)
VALUES (2, 4105, 4201, 5121, 1, 1103, 9701, 4301, 1301, 1502, ST_GeomFromEWKT('SRID=21781;LINESTRING(500000 500000 0,500005 500000 0,500010 500010 300)'));
ERROR: duplicate key value violates unique constraint "pipe_geometry_key"
DETAIL: Key (geometry)=(01020000A015550000030000000000000080841E410000000080841E4100000000000000000000000094841E410000000080841E41000000000000000000000000A8841E4100000000A8841E410000000000000000) already exists.
Also, the BBOX checking ignores the altitude of the constituent points. I can think of different ways of doing it, but I'm not sure for now which one has better performance. Any ideas that could save time?
Starting with Postgis 2.4 adding a unique constraint on geometry checks if the geometry is the same as the =
operator logic changed:
- now: https://postgis.net/docs/manual-2.4/ST_Geometry_EQ.html
- before: https://postgis.net/docs/manual-2.3/ST_Geometry_EQ.html
@qwat-psc IMHO it would be nice to add a UNIQUE constraint on the pipe geometry etc.
While revisiting this I noticed again that the editors are doing copy and moving but sometimes they forget to move the pipe or they do it twice so we end up with duplicate geometries. So, unique geometry should be enforced.
For very long pipes, while trying to add the geom unique constraint I stumbled upon:
ERROR: index row size 7400 exceeds maximum 2712 for index "geom_unique" HINT: Values larger than 1/3 of a buffer page cannot be indexed. Consider a function index of an MD5 hash of the value, or use full text indexing.
Of course, we do have some really long lines/pipes (more than 4 km..), if I split them etc. the constraint works. It would be nice not having to do this so any advice is welcome.
Hi @tudorbarascu ! interesting. Can you post here the constraint DDL definition? It seems you use a btree index named "geom_unique", that is not in QWAT core. It seems btree indexes can't handle data with more that 2713 bytes. So I suppose you do a btree index on the real geometry, which is especially expensive I think. I think a constraint checking first the spatial index identity based on the bbox comparison, then computing a real geometry equality test only if bboxes match would be faster and would avoid building a huge index.
Or another option could be to build an index of a computed hash. It seems this is what is done here : https://smathermather.com/2009/10/22/unique-constraint-on-geometry-continued/
Hi @haubourg For now it seems CREATE UNIQUE INDEX geom_unique ON qwat_od.pipe (md5(geometry::text));
works. The last two hours I was testing if there are any performance implications and they are none.
E.g. doing a INSERT.. by copying all the pipes and translating them with 1 meter over x, y so that they won't duplicate: INSERT INTO qwat_od.pipe (fk_function, fk_installmethod, fk_material, fk_distributor, fk_precision, fk_bedding, fk_status, fk_watertype, geometry) SELECT fk_function, fk_installmethod, fk_material, fk_distributor, fk_precision, fk_bedding, fk_status, fk_watertype, ST_Translate(geometry, 1,1) FROM qwat_od.pipe;
Has the same result with the unique index and without the index. I tested multiple times to be sure. E.g.
- 915237.952 ms (15:15.238) no index
- 917398.427 ms (15:17.398) with index
You can replicate the issue by yourself after you initialize the default database model (SRID 21781) by doing:
INSERT INTO qwat_od.distributor (name) SELECT 'Demo Distributor';
INSERT INTO qwat_od.pipe (fk_watertype, fk_status, fk_bedding, fk_precision, fk_distributor, fk_installmethod, fk_material, fk_function, geometry) SELECT 101, 101,101, 101, 1, 101,101, 101, ST_GeomFromText('LINESTRING Z (450716.893 402858.613 0,450736.653703328 402865.5671656 0,450755.09179489 402926.748105786 0,450778.55845688 402978.989365214 0,450811.177 403048.645 0,450857.060028534 403154.430600085 0,450896.729861896 403243.827407663 0,450929.136204643 403299.700412399 0,450992.831430042 403431.001973528 0,451140.894892592 403767.916192086 0,451199.002817518 403879.662201558 0,451281.694864527 404062.925657092 0,451297.896036392 404080.25378906 0,451311.2218256 404115.98535882 0,451317.153884642 404143.003271109 0,451316.063608865 404157.054051718 0,451311.648659503 404202.341226151 0,451312.983747179 404241.719272248 0,451312.266428931 404305.323903095 0,451309.22135715 404336.871691217 0,451303.036864302 404373.799627589 0,451293.17901277 404417.636176249 0,451279.965843243 404471.324685982 0,451274.310549352 404499.724195935 0,451216.882179033 404709.935051935 0,451148.717113255 404958.011192963 0,451091.786917496 405180.537379212 0,451088.653261181 405259.898602695 0,451082.16806402 405293.596532316 0,451070.993463073 405345.331940343 0,451065.964797175 405464.90074331 0,451068.259986435 405526.798381176 0,451083.570004394 405720.118977562 0,451122.246 405962.93 0,451128.578 406044.749 0,451130.319 406068.059 0,451138.6 406185.426 0,451138.708 406201.115 0,451138.545 406221.338 0,451138.968144558 406233.151250938 0,451135.924 406255.989 0,451122.456801029 406299.522036675 0,451099.548869088 406371.598212784 0,451070.494906625 406444.233118941 0,451029.413 406543.365 0,450992.831430042 406611.293403101 0,450952.044136585 406677.22354869 0,450911.256843127 406741.477504136 0,450876.512 406793.725 0,450877.197 406803.799 0,450807.891784366 406890.658426781 0,450782.748932235 406935.915560618 0,450757.606080103 407007.43300668 0,450740.285448635 407083.420293121 0,450733.718 407179.096 0,450710.114026078 407249.363117187 0,450701.733075367 407270.594858986 0,450701.733075367 407294.061520975 0,450693.352124657 407322.556753391 0,450678.825143426 407364.461506943 0,450645.860070631 407415.305941253 0,450622.95213869 407448.829744094 0,450606.748967316 407475.09005632 0,450544.729932059 407556.105913187 0,450516.234699644 407599.686856881 0,450489.41565737 407637.680500102 0,450464.272805239 407673.439223133 0,450444.648 407705.816 0,450432.983922587 407732.664608153 0,450407.282340408 407775.6868218 0,450393.314089224 407809.769354689 0,450384.933138514 407831.001096489 0,450383.256948372 407846.645537815 0,450380.463298135 407872.905850041 0,450382.698218325 407904.75346274 0,450387.435509431 407924.650085387 0,450388.285518798 407928.220124729 0,450397.225199556 407971.242338376 0,450401.136309888 407988.562969844 0,450417.339481261 408017.05820226 0,450432.42519254 408042.759784438 0,450460.920424955 408076.842317327 0,450474.888676139 408088.575648322 0,450493.885497749 408105.89627979 0,450493.326767702 408122.658181211 0,450458.656 408165.044 0,450414.545831024 408209.820068599 0,450381.860123254 408256.19466253 0,450365.098221833 408300.054971248 0,450337.096 408366.945 0,450337.382 408370.618 0,450334.37 408501.295 0,450335.244 408509.49 0,450377.110917851 408895.102471686 0,450411.933 408952.649 0,450537.827 408942.402 0,450709.08 408903.419 0,451002.888570895 408946.505636043 0,451138.101242356 408980.029438885 0,451238.672650881 409048.194504663 0,451368.298021868 409197.934157355 0,451449.872608783 409290.683345217 0,451515.802754371 409396.842054216 0,451549.326557213 409456.067439236 0,451568.209 409571.724 0,451597.377341286 409703.026120169 0,451615.815432849 409794.657847936 0,451630.901144127 409839.914981772 0,451659.396376543 409953.337181387 0,451688.450339006 410036.029228396 0,451737.75 410139.456 0,451766.672545636 410227.67363464 0,451773.377306204 410303.660921081 0,451781.199526867 410404.791059654 0,451785.669367246 410486.365646568 0,451785.846 410506.874 0,451785.928 410513.627 0,451785.46 410521.576 0,451777.009051512 410583.025944762 0,451775.33286137 410599.229116135 0,451771.142386015 410620.740222958 0,451765.275720518 410642.251329782 0,451757.732864878 410668.232276984 0,451742.367788576 410695.610049305 0,451729.516997486 410719.635441341 0,451716.945571421 410737.794167881 0,451693.758274455 410765.730670249 0,451674.202722798 410823.838595174 0,451669.174152371 410864.067158584 0,451658.278916448 410919.381433273 0,451649.618600714 410964.079837062 0,451639.282094838 411021.070301893 0,451631.459874174 411049.006804261 0,451608.272577209 411114.657584826 0,451585.923375315 411168.575034396 0,451541.783701573 411287.863899508 0,451528.526 411327.439 0,451481.999586505 411463.863864426 0,451436.183722622 411602.987646219 0,451394.27896907 411710.543180336 0,451356.844055896 411804.689193317 0,451307.23 411873.145 0,451305.401 411877.069 0,451282.291 411925.661 0,451249.707569315 411998.708202263 0,451214.786941355 412064.917712875 0,451174.558377945 412147.330394861 0,451100.806011693 412279.470051062 0,451021.745709991 412420.549388021 0,450974.506999999 412497.555999998 0,450931.719000013 412571.327000021 0,450904.068197114 412599.10187494 0,450575.416000018 412929.083000025 0,450089.932898469 413434.644874648 0,449948.383999991 413582.047999983 0,449889.203 413641.112000004 0,449871.933000013 413650.50500002 0,449826.285000021 413652.891000026 0,449819.735999995 413655.573999985 0,449746.035 413735.103999996 0,449653.499686486 413829.813690852 0,449595.4175332 413889.237576211 0,449556.850000002 413928.445000003 0,449507.944000021 413979.572000032 0,449421.592129349 414067.174364573 0,449416.714000013 414071.768000023 0,449401.955850306 414087.992322045 0,449393.537000016 414101.143000026 0,449363.653369595 414151.518788395 0,449325.966000005 414221.373000006 0,449459.957000002 414369.209000002 0,449508.420999995 414432.68399999 0,449552.250000022 414501.750000036 0,449592.510999986 414568.062999975 0,449628.232999981 414635.029999965 0,449662.010000003 414702.954000003 0,449676.526999999 414755.370999997 0,449730.071000005 415034.893000007 0,449720.380999994 415051.181999988 0,449544.690000039 415363.448000063 0,449539.563000013 415375.90000002 0,449374.405999986 415624.635999974 0,449348.672000027 415673.674000043 0,449312.383999982 415675.723999966 0,449113.222999986 416050.516999974 0,449081.422000018 416144.76200003 0,449085.372000017 416159.189000025 0,449086.759000015 416172.17800002 0,449083.761000002 416179.576999998 0,449073.293000019 416189.923000029 0,449052.684999997 416205.012999994 0,448999.068000006 416235.339000007 0,448985.775999987 416249.736999978 0,448973.823000018 416269.301000029 0,448957.038810524 416302.556088763 0,448913.657241295 416382.22478884 0,448899.452999994 416407.454999987 0,448858.59162243 416454.125814771 0,448823.037999991 416490.497999984 0,448801.631866799 416509.75492215 0,448781.58742635 416525.818411012 0,448757.981081849 416545.164438902 0,448729.206484409 416568.212053356 0,448668.723956782 416617.519980035 0,448617.181109913 416657.608860934 0,448568.431913281 416695.602504155 0,448501.803355132 416746.865986 0,448430.704956606 416805.951688509 0,448402.908136749 416830.256445569 0,448384.609727698 416849.811997227 0,448372.177984144 416868.669136325 0,448355.276400211 416892.834210874 0,448336.419261113 416936.415154568 0,448324.82561263 416968.542132291 0,448316.165296896 417000.808792527 0,448295.911332679 417073.723063708 0,448279.987526329 417131.132576074 0,448267.695465287 417177.646852517 0,448254.844674198 417223.742081425 0,448246.743088511 417247.767473461 0,448230.400234626 417298.891272795 0,448219.504998702 417324.173807438 0,448203.022462305 417365.380148431 0,448184.021685368 417409.608366973 0,448150.484974561 417485.553003319 0,448115.108907036 417563.039323631 0,448069.042858739 417664.197385935 0,448044.762453071 417715.359413529 0,448020.51608858 417761.610975131 0,447981.766414353 417824.49528982 0,447937.561981813 417886.291516139 0,447912.879 417920.117 0,447839.798370911 417990.494203741 0,447833.244083094 417996.76857568 0,447776.06707593 418040.887810684 0,447736.043070225 418071.595090668 0,447678.467015897 418102.94585636 0,447658.788846325 418112.678644638 0,447560.647432919 418152.270417325 0,447542.249443422 418157.257362095 0,447518.577934821 418164.084755854 0,447496.768114321 418168.025764528 0,447404.033685575 418189.275452185 0,447295.436391928 418214.672141451 0,447223.453089591 418233.204013118 0,447187.443556042 418244.806411792 0,447156.181950879 418262.622442483 0,447135.825750017 418278.499167776 0,447116.543550384 418295.949958738 0,447095.102043778 418317.740632958 0,447079.306 418348.369 0,447077.168221201 418349.297193206 0,447037.432 418349.527 0,446995.702000009 418347.092000012 0,446995.2647 418347.3372 0,446987.578999979 418351.580999965 0,446979.031735087 418472.690906974 0,446974.538 418521.62 0,446967.28 418610.548 0,446968.53700001 418621.255000014 0,446931.825000016 418856.188000023 0,446920.407000006 418902.112000007 0,446859.196000013 419021.59100002 0,446809.128000018 419100.752000024 0,446639.160000006 419339.622000007 0,446563.693000029 419428.086000044 0,446435.05499998 419541.40499997 0,446421.680137125 419555.792513109 0,446111.229000008 420054.275000012 0,446101.559 420054.19 0,446072.44 420065.261 0,446065.602 420069.093 0,446058.193000002 420078.361000002 0,445992.932999993 420224.426999986 0,445961.644 420279.432 0,445951.236 420302.709 0,445949.314 420322.934 0,445935.724 420335.984 0,445918.731 420339.657 0,445858.996000021 420381.004000028 0,445849.051000001 420377.536999985 0,445836.348000017 420382.489000019 0,445832.690999989 420387.045999966 0,445819.411000099 420418.375000126 0,445816.652000005 420427.197000005 0,445807.028000006 420476.067000004 0,445790.455999999 420519.840999987 0,445775.874771622 420562.146000001 0,445770.047000045 420585.194000056 0,445763.518 420597.273 0,445757.014254993 420611.954205959 0,445749.824093452 420653.87141483 0,445751.194000052 420677.314000062 0,445746.147000015 420708.327000001 0,445726.596000006 420761.842999994 0,445714.374000023 420780.066000024 0,445705.070999996 420790.657999986 0,445687.428 420832.799 0,445681.903000014 420843.619000019 0,445663.06700004 420896.026000051 0,445658.00700003 420923.540000022 0,445652.434999986 420951.617999964 0,445648.435000008 420969.494000011 0,445643.590543154 420980.778030052 0,445634.486 420994.592 0,445579.318000021 421078.479000025 0,445514.572999995 421185.432999985 0,445489.687000018 421215.368000024 0,445475.693937025 421228.864890517 0,445366.520410266 421318.407304589 0,445256.103819602 421424.801828417 0,445206.469548185 421467.254978436 0,445191.998005188 421476.798700062 0,445162.225527131 421498.04488692 0,445136.010262107 421520.215787622 0,445126.923016317 421536.299950589 0,445096.76658431 421571.065835928 0,445078.547212898 421587.516961217 0,445029.131917807 421630.853851461 0,444995.920760834 421660.575949854 0,444955.580685393 421700.024340883 0,444904.752 421757.181 0,444873.090008867 421797.689051001 0,444810.433552792 421840.179785301 0,444741.039035831 421888.727428687 0,444699.514785239 421912.901446548 0,444678.635984397 421927.902896932 0,444645.628767771 421957.626942008 0,444618.4499396 422039.94962393 0,444592.520073856 422113.093895333 0,444612.456877336 422180.658595848 0,444628.745149112 422253.093036671 0,444626.735545864 422315.560165379 0,444598.418397959 422390.712259008 0,444564.498223944 422415.81302143 0,444542.24416951 422440.536402818 0,444516.788452268 422456.857793858 0,444490.313971343 422469.167832576 0,444457.007 422495.356 0,444331.203 422853.257 0,444474.935824605 423166.008779649 0,444462.39826097 423307.10244314 0,444398.84 423489.792 0,444341.53865169 423583.827380805 0,444183.261 423646.361 0,443951.271 424425.617 0,443999.200312772 424471.344820865 0,444096.231 424585.518 0,444108.138382774 424799.951 0,444089.283899297 424938.232934677 0,444056.808944716 425053.66540845 0,443991.723116941 425156.990916615 0,443869.288 425314.543 0,443817.740363278 425281.061198306 0,443809.561 425290.744 0,443786.526088815 425332.889918977 0,443756.394679557 425395.269611075 0,443734.359104571 425464.510402291 0,443719.552949049 425534.123847871 0,443713.403312116 425578.053961077 0,443699.62260989 425635.232873504 0,443689.151700001 425663.1127 0,443667.019310081 425704.069010143 0,443662.888 425732.348 0,443642.343307869 425912.887000958 0,443631.726973653 425987.387095053 0,443602.391189761 426056.339045232 0,443472.858577352 426239.71301551 0,443361.501 426389.688 0,443287.971 426505.402 0,443024.948 426595.176 0,443017.840115769 426597.776370072 0,443005.965610162 426637.597431818 0,442966.550449297 426635.82180457 0,442793.71314914 426619.760124232 0,442711.277926894 426610.533461208 0,442643.224 426584.479 0,442372.829959053 426586.483046475 0,442284.908453482 426594.922015735 0,442008.02796248 426578.782160785 0,441789.226606732 426575.258494642 0,441582.285647648 426672.961197348 0,441518.608285305 426732.502240106 0,441451.350840415 426838.813995163 0,441398.544304046 427008.288429584 0,441376.365 427072.074 0,441342.323 427090.386 0,441307.617510005 427152.525729286 0,441260.518527205 427243.294401913 0,441204.832443026 427342.355882321 0,441173.934 427399.649 0,441156.402485282 427424.438316172 0,441138.662594588 427442.222538834 0,441123.834813 427454.795683177 0,441094.667372991 427470.622470161 0,440985.854 427524.806 0,440934.899250541 427542.578457554 0,440891.275406732 427559.455850548 0,440756.956 427630.594 0,440740.648 427631.578 0,440719.57 427629.766 0,440705.596 427627.898 0,440679.341 427625.199 0,440634.361 427620.919 0,440595.546 427616.93 0,440566.695 427614.398 0,440475.493 427606.055 0,440446.204 427604.082 0,440387.921 427614.619 0,440325.545 427625.727 0,440275.001 427633.642 0,440234.364 427637.518 0,440215.247 427638.893 0,439810.101 427664.962 0,439781.055 427646.537 0,439753.453 427629.044 0,439718.29 427608.171 0,439690.854 427591.828 0,439672.06 427580.471 0,439645.875 427564.702 0,439621.363 427550.513 0,439594.115 427545.142 0,439568.257 427543.266 0,439544.269 427541.999 0,439525.519 427545.326 0,439375.391958264 427555.939069774 0,439242.47653193 427555.444017825 0,439163.717743247 427571.268167445 0,439105.316329157 427579.0680283 0,439075.956429573 427569.40610979 0,439019.357915358 427541.291886079 0,438973.276321841 427525.968140873 0,438867.405570753 427465.983083824 0,438816.439347829 427453.851866051 0,438738.38463452 427471.026126891 0,438674.628121944 427489.538955424 0,438491.905130866 427561.381572201 0,438439.072226688 427572.97865293 0,438361.734424439 427622.162819625 0,438293.065902331 427648.109246923 0,438204.211566212 427680.981323588 0,437976.305939891 427647.597198736 0,437843.772994206 427742.326295916 0,437643.847311301 427903.748903019 0,437483.520598151 428088.857146892 0,437325.107281757 428239.118559838 0,437251.970682778 428372.01543876 0,437194.641469649 428488.893911926 0,437162.937543144 428548.005992533 0,437046.664939814 428594.341878818 0,436962.855279807 428605.11767417 0,436871.247730808 428521.621017217 0,436819.687047611 428432.558754772 0,436770.903768974 428262.06280714 0,436709.250723958 428115.361669817 0,436693.586130291 428077.383220628 0,436613.254999999 428015.800000001 0,436595.291000001 428005.901999999 0,436533.339637569 427977.493833773 0,436493.32 427972.291000001 0)', 21781);
ALTER TABLE qwat_od.pipe ADD CONSTRAINT geom_unique UNIQUE (geometry);
Also, there are at least one duplicate in the demo data model as I restored it just now and...:
qwat=# ALTER TABLE qwat_od.pipe ADD CONSTRAINT geom_unique UNIQUE (geometry);
ERROR: could not create unique index "geom_unique"
DETAIL: Key (geometry)=(01020000A01555000002000000CFD322CC48731F410448D3BC81550341000000000000000028409D0547731F41247EB27A7E5503410000000000000000) is duplicated.
qwat=# SELECT id, ST_asText(geometry) FROM qwat_od.pipe WHERE geometry = '01020000A01555000002000000CFD322CC48731F410448D3BC81550341000000000000000028409D0547731F41247EB27A7E5503410000000000000000';
id | st_astext
-------+---------------------------------------------------------------------------------------
32074 | LINESTRING Z (515282.199351606 158384.217199862 0,515281.755482676 158383.80991076 0)
32077 | LINESTRING Z (515282.199351606 158384.217199862 0,515281.755482676 158383.80991076 0)
(2 rows)
The more reason why we should enforce that geometries are unique.
For this kind of quality issue, we have the choice between a hard constraint or a soft one. Choosing which one is the most suited to a customer really depends on its data quality mean level. Those who already achieved a nice history cleanup will want a hard constraint. Those who are still struggling in digitization of old paper / CAD dataset sometimes prefer a view showing all the data errors. @qwat/qwat-psc what would be your prefered option ?
If we also had unique constraint on nodes geometry, we would have found sooner https://github.com/qwat/QWAT/issues/169 . I have 396 nodes that have duplicates. I recall the model didn't support having two nodes with the exact same geometry.
@tudorbarascu agreed, duplicate nodes shouldn't be allowed. It is also related to https://github.com/qwat/QWAT/issues/169 that we postponed because I think we need handle node merging in a clear API more than in nested specialized triggers.