
Offset error detection in frontmatter

Open homerduc opened this issue 6 months ago • 4 comments

Check for existing issues

  • [x] Completed

Environment

  • OS : Windows 11, WSL : Ubuntu 24.04.1 LTS
  • Install method : sudo snap install vale
  • Vale version v3.11.2

Describe the bug / provide steps to reproduce it

[!IMPORTANT] See the EDIT at the end of the issue! The bug does not only affect frontmatter!

Hello, I stumbled across a bug that I reproduced by doing the following:

  • In a new folder, create a test.md file, a styles folder, a .vale.ini file
    • styles contains a vocab named test-vocab with CASTLE in accept.txt and an empty reject.txt
    • .vale.ini is the following :
    StylesPath = styles
    MinAlertLevel = suggestion
    Vocab = test-vocab
    
    [*.{md}]
    BasedOnStyles = Vale
    
    • my test.md is the following :
    ---
    title: castle
    tags: ['castle', 'gitlab','sonarqube']
    weight: 8
    ---
    
    ## Intro
    
    Hello guys ! I love a CASTLE.
    

When running vale ., I expect an error on line 2 (castle in the title), but the error is reported on line 3 (in the tags). The output is:

    3:9 error Use 'CASTLE' instead of 'castle'. Vale.Terms

I can confirm that the error reported on line 3 actually comes from line 2: when I change the title from castle to CASTLE, or add IgnoredScopes = text.frontmatter.title to .vale.ini, the error disappears. Also, when I delete 'castle' from the tags, the error is correctly reported on the title.

In summary, when an error is detected in the frontmatter and the same term is repeated later in the frontmatter, the error is reported at the wrong location.

I hope I described the issue well. If you want to experiment, you can add more lines with the same error to the frontmatter and analyze the behavior, but this was my base case.
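To make the expected positions concrete, here is a small sketch, independent of Vale, that lists where the lowercase term occurs in the sample frontmatter. The names `Position`, `findAll`, and `sample` are made up for this illustration. It shows that the reported 3:9 matches the occurrence in the tags, while the title occurrence sits at 2:8.

```go
package main

import (
	"fmt"
	"strings"
)

// Position is a 1-based line and column, printed as "line:col".
type Position struct {
	Line, Col int
}

// findAll returns the position of every occurrence of term in src.
// Columns are counted in bytes, which is adequate for this ASCII-only sample.
func findAll(src, term string) []Position {
	var out []Position
	for i, line := range strings.Split(src, "\n") {
		offset := 0
		for {
			j := strings.Index(line[offset:], term)
			if j < 0 {
				break
			}
			out = append(out, Position{Line: i + 1, Col: offset + j + 1})
			offset += j + len(term)
		}
	}
	return out
}

// sample is the frontmatter of test.md from the report.
const sample = "---\ntitle: castle\ntags: ['castle', 'gitlab','sonarqube']\nweight: 8\n---\n"

func main() {
	for _, p := range findAll(sample, "castle") {
		fmt.Printf("%d:%d\n", p.Line, p.Col) // prints "2:8" then "3:9"
	}
}
```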

EDIT

I managed to reproduce a similar bug with the same files but the following Markdown body:

> Hello guys ! I love a castle.
> It's castle !

Here the error is reported on the second castle, and once the second castle is corrected, the error on the first one is displayed correctly.

homerduc avatar May 21 '25 12:05 homerduc

Summary

When Vale analyzes a Markdown blockquote (> ...), it treats the entire block as a single unit. As a result, errors detected within a blockquote are mislocated: the reported position in the original Markdown file does not correspond to the actual problematic token.

Technical Analysis

  • Vale uses the yuin/goldmark parser to convert Markdown to HTML before linting.
  • Blockquotes are converted into a single <blockquote><p>...</p></blockquote> element.
  • The error location is computed based on the transformed HTML, not the original Markdown.
  • The prepMarkdown function attempts to normalize certain Markdown constructs (code fences, link references, ordered lists), but it does not account for blockquotes.
  • Because of this, the original source position is lost or imprecise, especially when multiple problematic tokens exist within the same blockquote.
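As an illustration of what "accounting for blockquotes" could involve, the sketch below maps an offset in the flattened paragraph text back to a line and column in the Markdown source. It is not Vale's code: `quoteLine`, `stripQuote`, `flatten`, and `sourcePos` are hypothetical names, and it assumes the flattened text is simply the marker-stripped lines joined by newlines, handling only the plain ">" and "> " markers.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteLine is one source line of a blockquote with its marker stripped,
// plus the number of bytes that were stripped.
type quoteLine struct {
	text   string
	prefix int
}

// stripQuote removes the ">" or "> " marker from each source line.
// Nested and lazy blockquotes are deliberately out of scope.
func stripQuote(src string) []quoteLine {
	var lines []quoteLine
	for _, raw := range strings.Split(src, "\n") {
		text := strings.TrimPrefix(raw, ">")
		text = strings.TrimPrefix(text, " ")
		lines = append(lines, quoteLine{text: text, prefix: len(raw) - len(text)})
	}
	return lines
}

// flatten joins the stripped lines the way this sketch assumes the
// paragraph text looks after conversion.
func flatten(lines []quoteLine) string {
	texts := make([]string, len(lines))
	for i, l := range lines {
		texts[i] = l.text
	}
	return strings.Join(texts, "\n")
}

// sourcePos maps a byte offset in the flattened text back to a 1-based
// line and column in the original source.
func sourcePos(lines []quoteLine, offset int) (line, col int, ok bool) {
	for i, l := range lines {
		if offset < 0 {
			break
		}
		if offset < len(l.text) {
			return i + 1, l.prefix + offset + 1, true
		}
		offset -= len(l.text) + 1 // +1 for the joining newline
	}
	return 0, 0, false
}

func main() {
	src := "> Hello guys ! I love a castle.\n> It's castle !"
	lines := stripQuote(src)
	flat := flatten(lines)
	for start := 0; ; {
		i := strings.Index(flat[start:], "castle")
		if i < 0 {
			break
		}
		line, col, _ := sourcePos(lines, start+i)
		fmt.Printf("%d:%d\n", line, col) // prints "1:25" then "2:8"
		start += i + len("castle")
	}
}
```

The point of the design is that every stripped line remembers how much was removed, so a position found in the transformed text can always be translated back without searching the source again.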

jxdm-dmoreau avatar May 21 '25 15:05 jxdm-dmoreau

The frontmatter issue should be fixed in the next release.

jdkato avatar May 22 '25 01:05 jdkato

Hello @jdkato! Thanks a lot for the quick fix :) Please let us know if you find a fix for the blockquote bug; we're really curious to understand what the issue is.

homerduc avatar May 22 '25 07:05 homerduc

I passed the code to GitHub Copilot to try to fix it, and it managed to fix the issue. However, I don't know whether this is an acceptable fix or whether it will cause other problems. For now the tests seem to give the same results, so it looks good. Anyway, here is what it gave me; I hope it helps (if it doesn't, sorry for the length haha):

Problem

When linting a blockquote that spans multiple lines, Vale was only reporting the first occurrence of a repeated error, and subsequent identical errors on different lines within the same blockquote were not being flagged. This was especially problematic when the same issue appeared on several lines, but only one alert was shown.

Root Cause

The issue was in the assignLoc function in file.go, which is responsible for determining the line and column of each alert match within a block of text. The function iterated over each line in the block context, but the logic for associating a match with its correct line was too restrictive or incorrect for multi-line blocks. As a result, multiple matches could be mapped to the same line or to an incorrect line, causing the deduplication logic (which uses line and column as part of its key) to suppress valid alerts.
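The suppression described above can be sketched in isolation. This is a hypothetical illustration, not Vale's deduplication code: the `alert` type, the `dedupe` function, and the choice of key are assumptions made for the example. It shows how two distinct matches collapse into one alert once they are assigned the same line and column.

```go
package main

import "fmt"

// alert is a hypothetical, stripped-down stand-in for a lint alert.
type alert struct {
	Check string
	Match string
	Line  int
	Col   int
}

// dedupe keeps the first alert seen for each (check, line, column) key.
func dedupe(alerts []alert) []alert {
	type key struct {
		check     string
		line, col int
	}
	seen := make(map[key]bool)
	var out []alert
	for _, a := range alerts {
		k := key{a.Check, a.Line, a.Col}
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, a)
	}
	return out
}

func main() {
	// Two matches that were both assigned the position of the second "castle".
	mislocated := []alert{
		{"Vale.Terms", "castle", 2, 8},
		{"Vale.Terms", "castle", 2, 8},
	}
	// The same two matches with distinct positions.
	located := []alert{
		{"Vale.Terms", "castle", 1, 25},
		{"Vale.Terms", "castle", 2, 8},
	}
	fmt.Println(len(dedupe(mislocated)), len(dedupe(located))) // prints "1 2"
}
```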

The Fix

The fix was to update the assignLoc function so that it more accurately determines the line number for each match within the block context. Specifically, the function now checks each line in the block context to see if it contains the match string (a.Match). When it finds a line containing the match, it calculates the position and returns the correct line number (idx + 1) and span for that occurrence.

Before

The function could return the same line number for multiple matches, especially if blk.Line was incorrect or if the logic defaulted to a fallback.

After

Now, for each line in the block context:

If the line contains the match, it calculates the position and returns the current line index as the alert's line number. This ensures that each occurrence of the match, even if repeated on different lines, is assigned a unique line number and position.

Code Snippet

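// assignLoc returns the line number and column span for alert a by scanning
// ctx line by line for a.Match.
func (f *File) assignLoc(ctx string, blk nlp.Block, pad int, a Alert) (int, []int) {
    // loc shares its backing array with a.Span, so the writes below also
    // modify a.Span.
    loc := a.Span
    for idx, l := range strings.SplitAfter(ctx, "\n") {
        // A negative span is never matched; the function then falls
        // through to the default return.
        if loc[0] < 0 || loc[1] < 0 {
            continue
        }
        // Stop at the first line that contains the matched text.
        if strings.Contains(l, a.Match) {
            length := nlp.StrLen(l)
            pos, substring := initialPosition(l, blk.Text, a)

            loc[0] = pos + pad
            loc[1] = pos + nlp.StrLen(substring) - 1

            // Clamp the end of the span to the line's extent and keep it
            // positive.
            extent := length + pad
            if loc[1] > extent {
                loc[1] = extent
            } else if loc[1] <= 0 {
                loc[1] = 1
            }

            return idx + 1, loc
        }
    }
    // No line contains the match: fall back to the block's own line.
    return blk.Line + 1, a.Span
}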
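One thing that may be worth checking in the snippet above: the loop returns at the first line of ctx that contains a.Match, so two alerts with the same match text would resolve to the same line unless the caller narrows ctx for each one. The sketch below isolates that scan using plain strings. It is not Vale code; `firstLineContaining` and `lineOfOccurrence` are hypothetical helpers, and the second one only illustrates one possible way to tell repeated matches apart, by occurrence count.

```go
package main

import (
	"fmt"
	"strings"
)

// firstLineContaining mirrors the scan in the snippet: it returns the
// 1-based number of the first line in ctx that contains match, or 0.
func firstLineContaining(ctx, match string) int {
	for idx, l := range strings.SplitAfter(ctx, "\n") {
		if strings.Contains(l, match) {
			return idx + 1
		}
	}
	return 0
}

// lineOfOccurrence returns the 1-based line holding the n-th occurrence
// (n starts at 1) of match in ctx, or 0 if there are fewer occurrences.
func lineOfOccurrence(ctx, match string, n int) int {
	if match == "" || n < 1 {
		return 0
	}
	for idx, l := range strings.SplitAfter(ctx, "\n") {
		c := strings.Count(l, match)
		if n <= c {
			return idx + 1
		}
		n -= c
	}
	return 0
}

func main() {
	ctx := "> Hello guys ! I love a castle.\n> It's castle !\n"

	// The plain scan gives line 1 no matter which "castle" is meant.
	fmt.Println(firstLineContaining(ctx, "castle")) // prints "1"

	// Counting occurrences separates the two matches.
	fmt.Println(lineOfOccurrence(ctx, "castle", 1)) // prints "1"
	fmt.Println(lineOfOccurrence(ctx, "castle", 2)) // prints "2"
}
```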

Result

With this change, Vale now correctly reports all occurrences of the same error on different lines within a multi-line blockquote. Each alert is assigned a unique line and column, so the deduplication logic no longer suppresses valid alerts.

Summary:

This fix ensures that each match in a multi-line block is mapped to its actual line, allowing Vale to report all relevant alerts, even for repeated errors in blockquotes or similar structures.

homerduc avatar May 22 '25 12:05 homerduc