bids-validator
bids-validator copied to clipboard
Interaction of BIDS and HED for errors
@effigies @rwblair @happy5214 We are close to releasing a major version change 3.15.0 --> 4.0.0 of the hed-validator. The new validator is streamlined in that much of the legacy code has been removed. The new validator validates all features in the current HED specification.
Before we move into the release phase, we would like your advice on the issue format. Here is my understanding from looking at the code.
The HED validator currently wraps its internal errors in a BidsHedIssue class with properties: code, file, evidence, and hedIssue.
When the BIDS HED functions call the HED validator, they get BidsHedIssue objects. BIDS then calls hedOldToNewLookup to translate the old error code to the new error codes and construct the issue objects in BIDS. We could return the new BIDS issue format directly in the new release or we could leave as is. The error messages seem to be coming out okay in the current version.
Here are the things that we permit for potential recording:
https://github.com/bids-standard/bids-validator/blob/49b87424732dff84bec8f06db4c50404f0649cbb/src/types/issues.ts#L8-L19
- code: Primary code like
HED_ERRORorHED_WARNING - subCode: Secondary code. Initially used to report the specific fields JSON schema errors applied to, they can be used for other things, and HED is welcome to make recommendations. I believe we're using your
hedIssuefor HED errors/warnings. - severity: WARNING/ERROR
- location: This is intended to be the file that the error is found in. For JSON sidecars with inheritance, we point to the specific file that the bad value was loaded from. For absent fields, we can only point to the data file.
- issueMessage: Additional text that might not be knowable from the code/subCode, but is available to the function that generates the issue.
- suggestion: Additional text to make a specific suggestion to the user.
- affects: For errors that arise in a file but potentially affect multiple files, the files that are affected. For example, an error in a /bold.json would affect a lot of
*_bold.nii.gzfiles. - rule: A place to record the schema rule (e.g.,
rules.files.anat.parametric) that resulted in the issue. - line: For issues arising in text files, the line the error was detected on. JSON schema can give this, for example.
- character: Ibid.
We consider the (code, subCode, location) tuple to be unique.
We don't currently populate all of these, but we hope they'll allow for increasingly precise and helpful error messages as we continue to develop the validator. HED is welcome to use any and all of them, though we would probably want to keep code to HED_ERROR and HED_WARNING.
@happy5214 and I had a detailed discussion today about this. We are keeping HED_ERROR, HED_WARNING, and HED_INTERNAL_ERROR error as the codes. The other errors are either not used or are handled as HED_ERROR or HED_WARNING.
We map subCode into the HedCode in the HED validator --- which corresponds to the error codes specified in the HED specification. The issueMessage currently is mapped from the HED issue message, which already has all of the context information for the error included in the string. We are going to keep that as is and the location, which maps to the file name.
@happy5214 will be making some PRs on the BIDS side to simplify things. We have to release HED validator 4.0.0 before we can complete the change, but we have a path forward on this.
According to the plan established today, file will remain as an IssueFile-type object (that type is apparently only used in a productive manner by the hed.ts module; I don't know if you have other plans for it), and its path mapped to location as is the current case. evidence will be renamed to message (and mapped to issueMessage) in v4.0.0, while the BIDS and HED-spec codes will be named bidsCode and hedCode (and mapped to code and subcode). hedCode would be a getter mapping to issueObj.hedIssue.hedCode. INTERNAL_ERROR is not in the HED spec (though it will be used unofficially as a hedCode value). Do you have a strong feeling one way or another about it being a separate explicit issue code?
How stable is your Issue type? @VisLab had concerns about tying our code too closely to that type if it was going to change significantly in future releases (specifically BIDS 2). But your mentions of several fields currently being unused makes me feel that this is meant to be forward-compatible.
Speaking just for me, as I have not discussed this with @nellh or @rwblair...
filewill remain as anIssueFile-type object (that type is apparently only used in a productive manner by thehed.tsmodule; I don't know if you have other plans for it),
Ross and Nell might have memories as to why that's there, but I don't see any reason we would modify it if you're using it. I suspect the reason it's there is that we considered we might want to access the file tree through .parent, but as you say, we aren't using it after all.
and its path mapped to
locationas is the current case.evidencewill be renamed tomessage(and mapped toissueMessage) in v4.0.0, while the BIDS and HED-spec codes will be namedbidsCodeandhedCode(and mapped tocodeandsubcode).hedCodewould be a getter mapping toissueObj.hedIssue.hedCode.
All sounds good to me.
INTERNAL_ERRORis not in the HED spec (though it will be used unofficially as ahedCodevalue). Do you have a strong feeling one way or another about it being a separate explicit issue code?
Feels like it might make sense to have another code, since it is a different kind of thing. I could imagine that INTERNAL_ERROR-like codes could get a different section of the report with more detailed tracebacks in the message. But we could just as easily look for (code, subcode) value pairs as code values to achieve that, so I'd be comfortable with whatever you'd decide.
How stable is your
Issuetype? @VisLab had concerns about tying our code too closely to that type if it was going to change significantly in future releases (specifically BIDS 2). But your mentions of several fields currently being unused makes me feel that this is meant to be forward-compatible.
I'd say we're aiming for forward compatibility, and I can't see us adding a new, non-optional field. I don't foresee BIDS 2.0 impacting issues, as they are intended to model facts about files, which is a pretty generic goal.
#173 does use the Issue interface, but we've run into a small hurdle. Our code uses a string for errors that occur over more than one line in a TSV file (which happens from time to time when merging rows that belong to the same event). This seems to conflict with Issue's numeric type for line. Do you have any advice on that?
#173 does use the
Issueinterface, but we've run into a small hurdle. Our code uses a string for errors that occur over more than one line in a TSV file (which happens from time to time when merging rows that belong to the same event). This seems to conflict withIssue's numeric type forline. Do you have any advice on that?
Actually, I just realized this is a problem even for single-line issues, as all parameters in hed-validator Issue objects are stringified before being stored in the object (except for string bounds, which are number arrays). We will fix that in an upcoming release.
I think I'd be okay with changing Issue.line to be number | string. It's not unused, so we can't just change it to string without complication, but if HED has need for it to be string sometimes, that seems fine.
Should character also be number | string? Or, if these are ranges, should we do number | [number, number] and HED can use arrays instead of converting to strings?
Should
characteralso benumber | string? Or, if these are ranges, should we donumber | [number, number]and HED can use arrays instead of converting to strings?
We generally use string for character.
character (which I'm assuming is the index) is not used by hed-validator in Issue, largely because the character indices are not carried through parsing. line is not guaranteed to be a range per se (unless something in the BIDS standard requires the TSV file to have sorted onsets), but I'd prefer a number array option over a string if possible. We can have BidsHedIssue re-parse the line string on our end into a list of numbers before returning.
character(which I'm assuming is the index)
Just a note: In 4.0.0, the character is now a string with the actual problematic character. The reason for the change was that from a user perspective the positions are really not that useful -- since they are often positions in an assembled string combined from information in the sidecar and tsv. A user wants to find the actual character somewhere in the sidecar or tsv file. That being said -- @happy5214 is correct that the BidsHedIssue which maps HED issues to BIDS issues can be adjusted as needed.
character(which I'm assuming is the index)Just a note: In 4.0.0, the
characteris now a string with the actual problematic character. The reason for the change was that from a user perspective the positions are really not that useful -- since they are often positions in an assembled string combined from information in the sidecar and tsv. A user wants to find the actual character somewhere in the sidecar or tsv file. That being said -- @happy5214 is correct that theBidsHedIssuewhich maps HED issues to BIDS issues can be adjusted as needed.
I meant character in BIDS. I think it's always been the actual character in HED (index was the character index parameter name for HED Issue purposes).
character is currently unused in the BIDS validator, but yes, the idea was character index, I think. I agree it's not terribly useful. For TSV, I think row+column makes more sense than line+index. For JSON, if the JSON parser emits line+index, then it's helpful. The builtin JSON parser doesn't, hence it being unused.
Given that it's unused, we could adopt HED conventions. (cc @rwblair @nellh for impact considerations)