YOLaT-VectorGraphicsRecognition

SVG preprocessing does not extract fill colors and replaces all stroke colors with 'black'

Open · L483 opened this issue 5 months ago • 4 comments

The "Recognizing Vector Graphics without Rasterization" paper says:

For a point p, the attributes x of the corresponding node include the coordinates of the point, the RGB color value c and stroke width w

The "Hierarchically Recognizing Vector Graphics and A New Chart-Based Vector Graphics Dataset" paper says:

For a point p, the attributes x of the corresponding node include the coordinates of the point (px, py), the RGB color of fill cf and stroke cs, stroke width w and the primitive type t

The node features of the super node consist of the spatial features, chromatic features (the RGB color of the fill and stroke), stroke width, and primitive type.
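For illustration, that definition would give each node a feature vector roughly like the following sketch (the numbers and the one-hot encoding of the primitive type t are invented for the example):

import numpy as np

# illustrative values only; t shown as a one-hot over e.g. line / circle / arc
px, py = 0.31, 0.62            # normalized point coordinates
cf = [1.0, 1.0, 0.0]           # fill RGB
cs = [0.0, 0.0, 0.0]           # stroke RGB
w = 1.0                        # stroke width
t = [0, 1, 0]                  # primitive type
x = np.concatenate(([px, py], cf, cs, [w], t))   # 2 + 3 + 3 + 1 + 3 = 12 features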

However, when parsing an SVG file, the following code is eventually called:

p = SVGParser(filepath)
type_dict = split_cross(p.get_all_shape())   # shape parameters only, no colors
width, height = p.get_image_size()
paths = shape2Path(type_dict)                # convert shapes to Bezier curves
node_dict = graph_builder.bezierPath2Graph(paths,
    {'width': width,
     'height': height,
     'stroke': 'black',          # hard-coded stroke color
     'stroke-width': 6}          # hard-coded stroke width
)

split_cross extracts only the shape-defining parameters of lines, circles, and arcs, and splits shapes that cross each other (circles crossed by lines, lines crossed by other lines). It does not return any color information. shape2Path then converts everything it receives from split_cross into Bézier curves, again without any color information. Finally, bezierPath2Graph builds the graph; it is called with the hard-coded value 'black' for stroke.
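For reference, that information is available in the SVG itself and could be collected per element before the graph is built. A rough sketch with xml.etree.ElementTree (not the repository's SVGParser; the element selection and the defaults are my assumptions, and style="..."/CSS colors are ignored):

import xml.etree.ElementTree as ET

def collect_stroke_attrs(filepath):
    # Walk the SVG and keep the presentation attributes the current pipeline
    # throws away. Inline style="..." and CSS rules are ignored in this sketch.
    per_element = []
    for el in ET.parse(filepath).getroot().iter():
        tag = el.tag.split('}')[-1]   # strip the XML namespace
        if tag in ('line', 'circle', 'ellipse', 'rect', 'path', 'polyline', 'polygon'):
            per_element.append({
                'tag': tag,
                'stroke': el.get('stroke', 'black'),
                'fill': el.get('fill', 'none'),
                'stroke-width': el.get('stroke-width', '1'),
            })
    return per_element

Inside bezierPath2Graph itself, however, only the hard-coded attrs dict from above is used: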

    def bezierPath2Graph(self, path, attrs):
        edges = []
        edge_attrs = []
        edges_control = []
        poss = []
        colors = []
        stroke_widths = []
        is_control = []

        width = float(attrs['width'])
        height = float(attrs['height'])
        def _buildNode(point):
            pos = [point.real / width, point.imag / height]
            
            if attrs['stroke'] in self.colors:
                color = self.colors[attrs['stroke']]
            else:
                print('unsuported stroke color!')
                raise SystemExit

            stroke_width = (float(attrs['stroke-width']) - 3) / 3.0
            #print(pos, color, stroke_width)
            return pos, color, stroke_width       

        idx = 0
        for element in path:
            pos_start, color, stroke_width = _buildNode(element.start)
            poss.append(pos_start)
            colors.append(color)
            stroke_widths.append(stroke_width)
            is_control.append(0)

            pos_c0, color, stroke_width = _buildNode(element.control1)
            idx_control1 = idx + 1
            poss.append(pos_c0)
            colors.append(color)
            stroke_widths.append(stroke_width)
            is_control.append(1)

            pos_c1, color, stroke_width = _buildNode(element.control2)
            idx_control2 = idx + 2
            poss.append(pos_c1)
            colors.append(color)
            stroke_widths.append(stroke_width)
            is_control.append(1)

            pos_end, color, stroke_width = _buildNode(element.end)
            idx_end = idx + 3
            poss.append(pos_end)
            colors.append(color)
            stroke_widths.append(stroke_width)
            is_control.append(0)

Here, _buildNode is called for every component of every Bézier curve. Inside it, the color of every node is set to the value of attrs['stroke'], which is always the hard-coded value 'black'. It is also odd that the code does not accept any stroke colors other than black, red, green, and blue.

So every parsed element receives the same single hard-coded color, and the accepted options are restricted to those four colors anyway. If everything carries the same color value, the model cannot extract any usable information from this feature.

I have already verified this behavior by setting the colors of individual elements in an SVG to 'yellow', to rgb(225, 114, 46), and to #27a081 (arbitrary values), and dumping the resulting node_dict into a human-readable file. All saved colors are [0.0, 0.0, 0.0].
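A dump along these lines is enough to see it (the 'colors' key is my guess at where bezierPath2Graph's list ends up in node_dict; the exact name may differ):

import json

# dump the per-node colors the parser stored (key name is an assumption)
colors = [list(map(float, c)) for c in node_dict.get('colors', [])]
with open('node_dict_dump.json', 'w') as f:
    json.dump(colors, f, indent=2)
# every entry is [0.0, 0.0, 0.0], no matter which stroke colors the SVG uses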

As far as I can see, the code is entirely unable to parse hexadecimal colors or 'rgb(r, g, b)' colors.
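Supporting them would not be hard. A rough sketch of a color parser that accepts named colors, '#rrggbb' hex values and 'rgb(r, g, b)' notation, normalized to [0, 1] (the named-color table here is an assumption about what self.colors is meant to provide):

import re

NAMED_COLORS = {
    'black': (0, 0, 0), 'red': (255, 0, 0), 'green': (0, 128, 0),
    'blue': (0, 0, 255), 'yellow': (255, 255, 0),
}

def parse_color(value):
    value = value.strip().lower()
    if value in NAMED_COLORS:
        r, g, b = NAMED_COLORS[value]
    elif value.startswith('#') and len(value) == 7:
        r, g, b = (int(value[i:i + 2], 16) for i in (1, 3, 5))
    else:
        m = re.fullmatch(r'rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)', value)
        if m is None:
            raise ValueError('unsupported color: %r' % value)
        r, g, b = map(int, m.groups())
    return [r / 255.0, g / 255.0, b / 255.0]

With something like this, parse_color('#27a081') would give roughly [0.153, 0.627, 0.506] instead of aborting.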

And I could not find the corresponding code to these lines from "Hierarchically Recognizing Vector Graphics and A New Chart-Based Vector Graphics Dataset":

For a point p, the attributes x of the corresponding node include the coordinates of the point (px, py), the RGB color of fill cf and stroke cs, stroke width w and the primitive type t: x = concat(px, py, cf, cs, w, t), p ∈ P

There is no code that builds nodes with the fill color, and no code that builds nodes and stores their primitive type. Does this repository not contain the 'YOLaT++' model code?

L483 · Jul 06 '25

The same happens with 'stroke-width': every node receives the same value, since attrs['stroke-width'] is the hard-coded 6 and (6 - 3) / 3 = 1.0 for every node.

L483 · Jul 08 '25

Furthermore, the network as currently trained will not learn from colors or stroke width at all, because, as far as I can tell, these attributes never reach the samples it trains on.

train.py imports SESYDFloorplan from Datasets.graph_dict3. Among other things, this class is responsible for retrieving individual samples from the dataset. Its __getitem__ loads the preprocessed graph structure, applies further processing, and returns it in a Data object called data.

data.x holds the node features. According to "Recognizing Vector Graphics without Rasterization", a node is defined as:

For a point p, the attributes x of the corresponding node include the coordinates of the point, the RGB color value c and stroke width w: x = concat(px, py, c, w), p ∈ P

data.x is set as follows: data = Data(x = feats, pos = pos)

Tracking feats shows it is set twice: during initialization and during conversion to a tensor.

feats = np.concatenate((
    np.zeros((pos.shape[0], 3)),   # three columns of zeros where the RGB color should be
    pos),                          # normalized point coordinates
    axis = 1)

...

feats = torch.tensor(feats, dtype=torch.float32)

Tracking pos is a little more involved, but in the end it holds the coordinates of all non-control points, as the paper says. So feats consists of three columns of zeros followed by the two coordinate columns of pos.

I was able to verify this behavior by printing the contents of data.x within the training loop.
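For example, with a check along these lines inside the loop:

print(data.x[:5])                       # rows look like [0., 0., 0., px, py]
print(torch.all(data.x[:, :3] == 0))    # tensor(True): the would-be color columns are all zero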

In svg.py and svg3.py, similar code for feats can be found, with the difference that feats does receive the color values there.

feats = np.concatenate((
    graph_dict['attr']['color'], 
    #graph_dict['attr']['stroke_width'], 
    graph_dict['pos']['spatial']), 
    axis = 1)

Still, stroke_width is not passed to feats here either, though it may have been at some point during development, as the commented-out line suggests.
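Re-enabling it would presumably just mean adding the column back, along these lines (assuming graph_dict['attr']['stroke_width'] holds one value per node):

import numpy as np

# sketch: append the per-node stroke width as one extra feature column
stroke_w = np.asarray(graph_dict['attr']['stroke_width'], dtype=np.float32).reshape(-1, 1)
feats = np.concatenate((
    graph_dict['attr']['color'],
    stroke_w,
    graph_dict['pos']['spatial']),
    axis=1)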

L483 · Jul 22 '25

I have created a new branch for YOLaT++. I have updated bezier_parser.py and svg_parser.py; other code will be updated later.

shuguang-52 · Aug 01 '25

Do I interpret your message correctly that a feature-complete version of YOLaT is now available on that branch, and that "other code will be updated later" refers solely to the problems mentioned above?

L483 · Aug 07 '25