PAGE-XML coordinates can have self-intersections

Open bertsky opened this issue 3 years ago • 1 comments

On this image, eynollah produces polygons that are invalid:

ERROR processor.ExtractPages - Page "PHYS_0002" ImageRegion "r91" Self-intersection[2151 3197]
ERROR processor.ExtractPages - Page "PHYS_0002" ImageRegion "r92" Self-intersection[1605 99]
The offending data is here:

    <ImageRegion id="r91">
      <Coords points="2631,3129 2630,3130 2628,3130 2627,3131 2625,3131 2623,3133 2622,3133 2621,3134 2620,3134 2619,3135 2618,3135 2616,3137 2613,3137 2612,3138 2610,3138 2608,3140 2607,3140 2606,3141 2605,3141 2604,3142 2602,3142 2601,3143 2598,3143 2597,3144 2594,3144 2593,3145 2592,3145 2590,3147 2589,3147 2588,3148 2587,3148 2586,3149 2585,3149 2584,3150 2583,3150 2582,3151 2581,3151 2579,3153 2575,3153 2574,3154 2569,3154 2568,3155 2565,3155 2564,3156 2558,3156 2557,3157 2555,3157 2554,3158 2552,3158 2550,3160 2546,3160 2545,3161 2543,3161 2542,3162 2540,3162 2539,3163 2535,3163 2534,3164 2530,3164 2529,3165 2525,3165 2524,3166 2521,3166 2520,3167 2519,3167 2518,3168 2515,3168 2514,3169 2510,3169 2509,3170 2508,3170 2507,3171 2501,3171 2500,3172 2489,3172 2488,3171 2479,3171 2478,3172 2476,3172 2475,3173 2472,3173 2471,3174 2470,3174 2468,3176 2462,3176 2461,3177 2459,3177 2458,3178 2452,3178 2451,3179 2437,3179 2436,3180 2422,3180 2421,3181 2414,3181 2413,3182 2406,3182 2405,3183 2401,3183 2400,3184 2380,3184 2379,3185 2365,3185 2364,3186 2357,3186 2356,3187 2311,3187 2310,3188 2270,3188 2269,3189 2266,3189 2265,3190 2263,3190 2262,3191 2258,3191 2257,3192 2236,3192 2235,3191 2224,3191 2223,3192 2221,3192 2220,3193 2218,3193 2217,3194 2216,3193 2205,3193 2204,3194 2198,3194 2197,3195 2193,3195 2192,3196 2186,3196 2185,3195 2177,3195 2176,3196 2169,3196 2168,3197 2151,3197 2630,3197 2631,3196 2632,3197 2690,3197 2690,3166 2682,3166 2681,3165 2668,3165 2665,3162 2664,3162 2662,3160 2657,3160 2655,3158 2655,3157 2653,3155 2653,3153 2651,3153 2650,3152 2650,3140 2649,3139 2649,3138 2648,3138 2647,3137 2646,3137 2645,3136 2645,3129 2631,3129"/>
    </ImageRegion>
    <ImageRegion id="r92">
      <Coords points="1628,90 1628,91 1633,91 1634,92 1634,93 1632,95 1631,95 1630,96 1625,96 1624,97 1623,97 1622,98 1609,98 1608,99 1605,99 1608,99 1609,98 1627,98 1628,97 1629,97 1630,98 1648,98 1649,99 1673,99 1674,98 1723,98 1724,99 1744,99 1745,98 1756,98 1757,99 1773,99 1774,98 1784,98 1785,99 1936,99 1937,100 1953,100 1954,99 1986,99 1987,100 2003,100 2004,99 2035,99 2036,100 2054,100 2055,99 2102,99 2103,100 2114,100 2115,99 2141,99 2142,98 2169,98 2170,97 2181,97 2182,98 2256,98 2257,97 2262,97 2263,96 2269,96 2270,97 2278,97 2279,96 2317,96 2318,95 2351,95 2352,94 2422,94 2423,93 2443,93 2444,92 2463,92 2464,91 2475,91 2476,90 2485,90 2486,91 2496,91 2497,90 2504,90 1628,90"/>
    </ImageRegion>
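
For anyone who wants to check such `points` strings without running the whole workflow, here is a minimal sketch in plain Python. It is not what eynollah or the ExtractPages processor actually do; the function names (`parse_points`, `spike_vertices`, `is_simple`) are made up for illustration. It assumes integer pixel coordinates and treats the outline as a closed ring. In both regions above the outline runs along a line and then doubles back over itself (a zero-width "spike"), so the sketch looks for that case explicitly in addition to ordinary edge crossings.

```python
def parse_points(points):
    """Parse a PAGE-XML Coords/@points string into a list of (x, y) int tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def _ring(pts):
    """Drop repeated consecutive points and a closing point that repeats the first."""
    ring = []
    for p in pts:
        if not ring or ring[-1] != p:
            ring.append(p)
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring.pop()
    return ring


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a); 0 means a, b, c are collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _in_box(a, b, p):
    """True if p lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def _touch(p1, p2, p3, p4):
    """True if the closed segments p1-p2 and p3-p4 share at least one point."""
    d1, d2 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    d3, d4 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    if d1 != d2 and d3 != d4:
        return True
    return ((d1 == 0 and _in_box(p3, p4, p1)) or (d2 == 0 and _in_box(p3, p4, p2))
            or (d3 == 0 and _in_box(p1, p2, p3)) or (d4 == 0 and _in_box(p1, p2, p4)))


def _doubles_back(a, b, c):
    """True if edges a-b and b-c are collinear and leave b in the same direction."""
    dot = (a[0] - b[0]) * (c[0] - b[0]) + (a[1] - b[1]) * (c[1] - b[1])
    return _orient(a, b, c) == 0 and dot > 0


def spike_vertices(points):
    """Vertices at which the outline reverses onto the edge it just came along."""
    ring = _ring(parse_points(points))
    n = len(ring)
    if n < 3:
        return []
    return [ring[i] for i in range(n)
            if _doubles_back(ring[i - 1], ring[i], ring[(i + 1) % n])]


def is_simple(points):
    """True if the ring has at least 3 points, no spikes, and no two
    non-neighbouring edges that share a point (simple O(n^2) check)."""
    ring = _ring(parse_points(points))
    n = len(ring)
    if n < 3 or spike_vertices(points):
        return False
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edge are neighbours in a closed ring
            if _touch(ring[i], ring[(i + 1) % n], ring[j], ring[(j + 1) % n]):
                return False
    return True


# Shortened stand-in for r92 (most points dropped by hand, so NOT the real region).
r92_short = ("1628,90 1628,91 1609,98 1608,99 1605,99 1608,99 "
             "1609,98 1627,98 2504,90 1628,90")
print(spike_vertices(r92_short), is_simple(r92_short))
```

On this shortened ring, `spike_vertices` returns the vertex 1605,99, which is the same point named in the r92 log line above, and `is_simple` returns `False`. The pairwise edge check is quadratic, which should be fine for outlines of a few hundred points like these; a geometry library would be the more robust choice for anything larger.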

bertsky, Feb 25 '21 08:02

Could you please check whether this is still the case with the example image and the current version, @vahidrezanezhad?

cneud, Apr 12 '22 14:04

I could not reproduce this error anymore. I tried both main regions and the -fl option; in either case the highest region number was r90 and no error occurred. I would appreciate it if you could check again, @bertsky.

vahidrezanezhad, May 08 '23 17:05

Indeed, I cannot reproduce this myself anymore. I believe the last version where I saw it was 0.0.11, so I assume this has been fixed at some point.

bertsky, May 09 '23 18:05