[Bug]: Exporting a file with missing required attributes does not throw error.
What happened?
The following code snippet throws a warning, but allows the user to export a file that breaks relative to the schema. According to the schema, the source_script_file_name property is required when source_script is filled out.
See here for an a small discussion in MatNWB: https://github.com/NeurodataWithoutBorders/matnwb/pull/667
from pynwb import NWBHDF5IO, NWBFile, TimeSeries
from datetime import datetime
from dateutil import tz
session_start_time = datetime(2018, 4, 25, 2, 30, 3, tzinfo=tz.gettz("US/Pacific"))
nwbfile = NWBFile(
session_description="Mouse exploring an open field", # required
identifier="test", # required
session_start_time=session_start_time, # required
source_script="create_nwbfile.py", # optional
)
io = NWBHDF5IO("temp.nwb", mode="w")
io.write(nwbfile)
io.close()
Question: Should this throw an error instead?
Steps to Reproduce
See above
Traceback
Operating System
macOS
Python Executable
Python
Python Version
3.11
Package Versions
No response
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
- [x] Have you checked the Contributing document?
- [x] Have you ensured this bug was not already reported?
I agree that according to the current schema, the example code you shared should throw an error instead of a warning to avoid the creation of invalid files, since file_name is a required attribute of source_script.
However I am also wondering if file_name should be made to be an optional attribute. The definition in the schema is a little unclear to me, it seems source_script could be the name of the file or a link to the source code (See also: https://github.com/NeurodataWithoutBorders/pynwb/issues/1451#issuecomment-1095019968). I think in either case the file_name attribute may not always be necessary.
@bendichter @rly do you know how these fields are being used and if file_name should be required?
I agree with Ben's comment in #1451. I think source_script should be a link to a commit/tag/release in a public repo or a name of the package with a version string, and source_script_file_name should be the entry point / the file that was run within that repo. Now that we have was_generated_by which allows for multiple names and versions/identifiers, does it make sense to keep source_script and source_script_file_name?
Regardless, I think it would be OK to have source_script_file_name be optional. Most conversion packages have only one entry point, and it seems unnecessary to have to specify that.