musttag feat: implement `verbose`

This PR contains the following three changes:

Support output path for non-tag field
Support output filename and line number where the non-tag field is located
Support output multiple non-tag fields at the same time, instead of the first one

For example, we have a file foo.go:

package musttag

import "encoding/json"

type NestedB struct {
	NoTagFieldB string
}

type NestedA struct {
	NestedAField NestedB `json:"NestedAField"`
	NoTagFieldA  string
}

type Foo struct {
	FieldA NestedA `json:"FieldA"`
}

func main() {
	json.Marshal(Foo{})
}

Previously, this would only output a single line:

/path/to/foo.go:19:15: the given struct should be annotated with the `json` tag

When the structure is deeply nested, it will be very difficult to find all non-tag fields.

So we can try to output more information like:

/path/to/foo.go:19:15: the given struct should be annotated with the `json` tag: .FieldA.NestedAField.NoTagFieldB (path/to/foo.go:6:2)
/path/to/foo.go:19:15: the given struct should be annotated with the `json` tag: .FieldA.NoTagFieldA (path/to/foo.go:11:2)

Each line has two file locations. The first is the same as the original, which is the location of json.Marshal, and the second is the location where the non-tag fields are defined.

However, repeated structures are not checked or outputted twice, for example, if Foo was changed to:

type Foo struct {
	FieldA NestedA `json:"FieldA"`
	FieldB NestedB `json:"FieldB"`
}

The output will be unchanged, although .FieldB.NoTagFieldB is a non-tag field, the NestedB structure has already been checked in .FieldA.NestedAField, so it will be skipped.

Apr 06 '24 21:04 22dm

I'm not sure we need this 🤔

Early versions of musttag used to produce detailed reports similar to what you have implemented in this PR. It turned out to be quite noisy and the implementation was complicated. In my experience, most structs passed to Marshal-like functions are small to medium in size, so it's usually obvious which fields need to be annotated. The current report message is short and precise, like the reports produced by other golangci linters. I like it that way.

On the other hand, I don't mind implementing it as an option, e.g.:

musttag:
  settings:
    verbose: true

It would be useful at least in musttag's own tests to check exactly what fields were reported. @22dm would that work for you?

@ldez maybe you have some thoughts?

Apr 08 '24 08:04 tmzane

The original problem was the difficulty of knowing where the Marshall/Decode was.

I think a more detailed report can be useful in some cases but reporting several issues on the same line is a problem because golangci-lint will only keep one of them.

The verbose option is interesting, maybe it can be a mix of previous reports and the new report:

by default, it can report the Marshall/Decode.
with verbose, it can create one report by field (as before) in addition to the Marshall/Decode report.

Apr 08 '24 11:04 ldez

Hello, I've added the verbose config and now it won't report multiple errors on the same line.

> ./musttag ./...
/path/to/foo.go:21:15: the given struct should be annotated with the `json` tag

> ./musttag -verbose ./...
/path/to/foo.go:21:15: the given struct should be annotated with the `json` tag: .FieldA.NestedAField.NoTagFieldB (/path/to/foo.go:6:2)

Apr 09 '24 21:04 22dm

What happens if there are multiple missing tags?

If it reports only one field, then multiple runs will be required, it doesn't seem like a good behavior.

Apr 10 '24 15:04 ldez

What happens if there are multiple missing tags?

If it reports only one field, then multiple runs will be required, it doesn't seem like a good behavior.

Currently, each Marshal call is analyzed independently, so a struct type may be parsed multiple times to check for different tags. This could be prevented by introducing some kind of cache to parse each type only once.

I see two possible implementations of verbose:

Append the list of untagged field names to the default report:

type T struct {
    Foo struct {
        Bar string
        Baz string
    }
}

var t T
json.Marshal(t) // musttag: the fields of the given struct should be annotated with the `json` tag: Foo.Bar, Foo.Baz
yaml.Marshal(t) // musttag: the fields of the given struct should be annotated with the `yaml` tag: Foo.Bar, Foo.Baz

In addition to the default report, report untagged fields separately:

type T struct {
    Foo struct {
        Bar string // musttag: the field should be annotated with the `json`, `yaml` tag(s)
        Baz string // musttag: the field should be annotated with the `json`, `yaml` tag(s)
    }
}

var t T
json.Marshal(t) // musttag: the given struct should be annotated with the `json` tag
yaml.Marshal(t) // musttag: the given struct should be annotated with the `yaml` tag

The second one is tricky: to report a field with multiple tags just once we would have to change the implementation to analyze all Marshal calls first saving untagged fields into something like map[field_location][]missing_tags, then report them all once.

Apr 10 '24 17:04 tmzane

Hello, I have updated it to support the first implementation.

For the second implementation, it should not be difficult. Currently, we already have map[*types.Var]string as reports. We only need to merge the reports for output after all executions are completed, but it may not be good to output two copies (both marshal func and struct). Maybe we should introduce other settings to switch between these two behaviors.

Regarding "introducing some kind of cache to parse each type only once", I found that this may be more difficult than imagined, especially to obtain a unique identifier of a type.

For example, in the two funcs below, the Foo struct is different, but when .String() is called, we both get main.Foo, in the test file it is easy to observe this.

package main

func f1() {
	type Foo struct {
		...
	}
}

func f2() {
	type Foo struct {
		...
	}
}

In addition, due to the existence of ifaceWhitelist, we must cache each Func.Name separately.

Apr 15 '24 19:04 22dm