
[ENH] provide RECOMMENDATION of the "Last, First" form for the `Authors` names

Open yarikoptic opened this issue 1 month ago • 5 comments

ATM there is no consistency in how `Authors` names are formatted, e.g. across OpenNeuro datasets:

ds006267/dataset_description.json:
  Authors=['Katherine M. Cole', 'Shau-Ming Wei', 'Pedro E. Martinez', 'Tuong-Vi Nguyen', 'Michael D. Gregory', 'J. Shane Kippenhan', 'Philip D. Kohn', 'Steven J. Soldin', 'Lynnette K. Nieman', 'Jack A. Yanovski', 'Peter J. Schmidt', 'Karen F. Berman']
ds006269/dataset_description.json:
  Authors=['Lucy Pritchard', 'Ingrid Buller-Peralta', 'Sally M Till', 'Peter C Kind', 'Alfredo Gonzalez-Sulser']
ds006303/dataset_description.json:
  Authors=['Linke, Julia', 'Naim, Reut', 'Haller, Simone', 'Khosravi, Parmis', 'Scheinberg, Beck', 'Byrne, Meghan', 'Harrewijn, Anita', 'Leibenluft, Ellen', 'Brotman, Melissa', 'Winkler, Anderson', 'Pine, Daniel']

and that is why some are left ambiguous, like

ds003834/dataset_description.json:
  Authors=['Matteo Visconti di Oleggio Castello', 'James V. Haxby', 'M. Ida Gobbini']

where for Matteo I believe there is a composite last name of "Visconti di Oleggio Castello" per e.g.

❯ curl --silent https://raw.githubusercontent.com/bids-standard/pybids/refs/heads/main/.zenodo.json | grep Matteo
	  "name": "Visconti di Oleggio Castello, Matteo",

but for the other 2 authors, only the last word is the family name.
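To illustrate why the "Last, First" form removes the ambiguity, here is a minimal sketch. `split_author` is a hypothetical helper, not part of the validator or of any BIDS tool; it only assumes that a single comma separates the family name from the given names.

```python
from typing import Optional, Tuple


def split_author(name: str) -> Optional[Tuple[str, str]]:
    """Return (family, given) for a 'Last, First' name, else None.

    Hypothetical helper for illustration only: a name without a comma
    cannot be split reliably, so None is returned instead of guessing.
    """
    if "," not in name:
        return None
    # Split on the first comma only; everything before it is the family name.
    family, given = name.split(",", 1)
    return family.strip(), given.strip()


print(split_author("Visconti di Oleggio Castello, Matteo"))
print(split_author("Matteo Visconti di Oleggio Castello"))
```

With the comma present, the composite family name comes back intact; without it, the helper declines to guess where the family name starts.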

TODOs

  • [ ] validation: add a check for validator to WARN about using First Last, in particular if any of the names has more than 2 components? @effigies do you see an easy way to do that?
  • [ ] anywhere else in the text to add information about this?
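A rough sketch of what the first TODO item could look like outside of the schema, in plain Python. The pattern and the `ambiguous_authors` function are assumptions made for illustration, not existing validator code: a name is flagged when it has no comma and three or more whitespace-separated components.

```python
import re

# Assumed heuristic: no comma anywhere, and at least three
# whitespace-separated components (one token plus two or more others).
AMBIGUOUS_NAME = re.compile(r"^[^,\s]+(?:\s+[^,\s]+){2,}$")


def ambiguous_authors(authors):
    """Return the entries that would deserve a WARN under this heuristic."""
    return [name for name in authors if AMBIGUOUS_NAME.match(name.strip())]


authors = [
    "Matteo Visconti di Oleggio Castello",
    "Lucy Pritchard",
    "Linke, Julia",
    "Visconti di Oleggio Castello, Matteo",
]
print(ambiguous_authors(authors))
```

Note that this heuristic would also flag names with a middle initial such as 'James V. Haxby', so it can only justify a warning, not an error.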

yarikoptic avatar Nov 12 '25 18:11 yarikoptic

  • add a check for validator to WARN about using First Last, in particular if any of the names has more than 2 components? @effigies do you see an easy way to do that?

Regex?

effigies avatar Nov 12 '25 18:11 effigies

Codecov Report

All modified and coverable lines are covered by tests. Project coverage is 82.83%. Comparing base (373da35) to head (53e86b7).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2255   +/-   ##
=======================================
  Coverage   82.83%   82.83%           
=======================================
  Files          20       20           
  Lines        1672     1672           
=======================================
  Hits         1385     1385           
  Misses        287      287           

codecov[bot] avatar Nov 12 '25 18:11 codecov[bot]

I could support a warning for inconsistent comma usage: if any entry in Authors uses a comma, either all should or none should. I can't see a way to do this in the current schema; I think we would need to add the equivalent of any(...) in Python or some(...) in JS, and we could add all(...) while we are at it. We don't currently have the idea of lambdas in the expression language. Without them, I imagine that instead of passing a function as an argument, the predicate would be a standalone expression-language statement. It would then be applied to the context, with something added to the scope to represent the current element of the list.
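For reference, the consistency rule described above is easy to state in plain Python with `any` and `all`. This is only a sketch of the intended logic, not a proposal for expression-language syntax; `inconsistent_comma_usage` is a hypothetical name.

```python
def inconsistent_comma_usage(authors):
    """True when some, but not all, entries in the list contain a comma."""
    has_comma = ["," in name for name in authors]
    # Consistent lists are all-comma or no-comma; an empty list is consistent.
    return any(has_comma) and not all(has_comma)


print(inconsistent_comma_usage(["Linke, Julia", "Naim, Reut"]))
print(inconsistent_comma_usage(["Linke, Julia", "Lucy Pritchard"]))
```

The first call reports a consistent list, the second a mixed one that would trigger the warning.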

Would anyone heed this warning given the current noise in the output?

rwblair avatar Nov 13 '25 15:11 rwblair

  • add a check for validator to WARN about using First Last, in particular if any of the names has more than 2 components? @effigies do you see an easy way to do that?

Regex?

I must have been too tired! ;) The question now is "how". I thought the most logical option would be to add "format", which I pushed, but that might be too restrictive, leading to ERRORs right away?

Otherwise, we need some custom rule which would use matches, and I guess that is what @rwblair refers to: we do not have any way to map it across the values of a metadata field?

yarikoptic avatar Nov 13 '25 19:11 yarikoptic

Ha -- so we are not testing against the "known to be ok" https://github.com/bids-standard/bids-examples/ which I assume I have broken here? @effigies WDYT -- wouldn't it be worth testing against some "release" (known to be good) of bids-examples, thus preventing "regressions" (previously valid becomes invalid) in the specification?

yarikoptic avatar Nov 13 '25 21:11 yarikoptic