bids-specification
                                
                                 bids-specification copied to clipboard
                                
                                    bids-specification copied to clipboard
                            
                            
                            
                        Removing leading period from YAML extensions?
I was wondering whether there is a specific reason for having the extensions recorded with a leading period in the YAML schema data. Maybe the leading period should be inserted by path constructors based on the procedural information given by the “extension” designation, similarly to how suffixes aren't recorded with a leading underscore? The “extension” designation is not arbitrary as it stands anyway, since it already defines the position of the string in the path... so I don't think this would make it more fragile in any way.
We included the leading dot because that's how extensions are defined in the common principles and how they're used in pybids.
Here's the description in common principles:
- File extension - a portion of the the file name after the left-most period (.) preceded by any other alphanumeric. For example,
.gitignoredoes not have a file extension, but the file extension oftest.nii.gzis.nii.gz. Note that the left-most period is included in the file extension.
Yeah, we went through a period of normalizing different tooling. Reverting now would be pretty annoying to any users of libraries (notably pybids) that started out not including the initial period and went through a FutureWarning/DeprecationWarning process to migrate users and downstream projects.
@tsalo @effigies maybe this could still be put a pin in for the next major overhaul that's likely to also break backward-compatibility?
I've come across a solid reason to keep the leading period in #1036. We have some extensions (None, /) that do not have leading periods, so automatically assuming there is one would make these edge cases for us to work around.
hmmmm... thanks for the update in any case :)
I mean those have special meanings, and can be easily represented with simple logic — but, are you sure that's not actually a sign of using the concept of “extensions” incorrectly?
What are the edge cases for whether the absence of an extension needs to be noted as the presence of the extension None?
Also, / as an extension sounds like a workaround for not declaring whether a path is actually a directory.
We actually have the / issue in the DANDI extensions for BIDS (.ngff files are in fact directories), and I wanted to ask once I make more progress on the validator and the render tests whether we could encode direcotry-ness in the schema explicitly — / might do the trick, but it sounds like it's overloading the concept of an extension.
I've come across a solid reason to keep the leading period in #1036. We have some extensions (
None,/) that do not have leading periods, so automatically assuming there is one would make these edge cases for us to work around.
Well, under assumption that "there MUST NOT be an empty extension" (e.g. just filename.) and the fact that trailing / is a convention, I think it does not really mandate including . within extension, albeit indeed them require more of "special handling in code" while appending to filename.  And that is why I agree with observation of @TheChymera that
Also, / as an extension sounds like a workaround for not declaring whether a path is actually a directory.
BUT because it is a matter of the fact that some extension might be a directory or not, and now that there is https://github.com/bids-standard/bids-specification/blob/HEAD/src/schema/objects/extensions.yaml -- I think "extension" notion is not really a "file extension" but rather a "thing" of its own mapping its value to the one in the extensions.yaml where someone might make directory: True; name: mefd (or alike) for a "convention" of .mefd/ to make it "explicit".  The question is either it is "worth it" - i.e. should we offload it to tooling code to judge that extension implies directory via .endswith('/') instead of dereferencing into "extensions table" and checking for .directory attribute?
Having trailing / in extension itself also makes it a bit harder for tooling since comparing output of e.g. split_ext - the value should first get rid of trailing /. So making it "simpler" for some usecases might make it harder for others.
We included the leading dot because that's how extensions are defined in the common principles and how they're used in pybids.
well, had to rerun tutorial: https://github.com/bids-standard/pybids/pull/832  since git grep disagreed, but indeed confirming that pybids does that too.
But if you look at result records like
{'acquisition': 'fullbrain',
 'datatype': 'func',
 'extension': '.tsv.gz',
 'run': 1,
 'session': '1',
 'subject': '01',
 'suffix': 'physio',
 'task': 'rest'}
you will see that suffix doesn't come with leading _, as it could have to be consistent with extension to include the leading ..  Overall, although consistent with tooling (like os.path.split_ext), inclusion of leading period (.) IMHO should have not been done. But since it was done, now all BIDS tooling should also follow the  trend, so I am doomed to adjust BIDSFile of heudiconv (WiP: https://github.com/nipy/heudiconv/pull/544; attn @pvelasco) to do the same for the sake of consistent (across tooling) inconsistency (has leading . whenever e.g. prefix doesn't have leading _, similarly to any other entity).
Having said all that, I don't think there is anything to be done at this point regarding the initial issue/title - i.e. the "leading period".  Not yet 100% sure about trailing /.  E.g. -- what pybids would return in its extension for .ds/ or future .ngff etc?  will it include trailing / ?
Here's an update on how the None special value in fact makes the periods useful, but again, I don't think this speaks for the periods, rather against None.
If the list of permitted extensions includes None, even if automatically determining that it means "" and not in fact "None", iterative regex generation would e.g. create ^README\.(|md|rst|txt)$, which will fail to validate README.
Nevertheless even if ^README(|\.md|\.rst|\.txt)$ is considered preferable to ^README(|\.(|md|rst|txt))$ the leading periods can still be added when needed instead of hard-coded in the YAML :)
For the time being I'm going with ^README(|\.md|\.rst|\.txt)$ however :( https://github.com/TheChymera/bids-specification/commit/27bfb551e0aba53d5a0b4350b26052fe9a5fa35b
If we're seriously considering reverting this change, it would be worth reviewing the original discussion that led to the push for consistency in the current direction: https://github.com/bids-standard/bids-specification/pull/152
If we're seriously considering reverting this change, it would be worth reviewing the original discussion that led to the push for consistency in the current direction: #152
I personally think that fortunately or unfortunately we are past that point now since it might be too intrusive to change within 1. series of BIDS, and even for 2. might be a bit too much.  So I would vote to keep leading . in extensions.  The question which remains and I think, and which we must address is what to do about trailing /?!
https://github.com/bids-standard/bids-specification/blob/master/src/02-common-principles.md#common-principles
- doesn't mention possibility for /to be a part of a "File extension"
- we also call it "File extension" there (not just extension)
So, overall we have not defined a notion of "extension" but rather of "File extension".  Should we define "Directory extension"?  should we generalize into "File or directory name extension" and include / as indicator of being a directory? Will we ever allow for the same extension to be used for a file or directory?  (sorry for questions and not suggestions ATM)
If we're seriously considering reverting this change
At this point I have temporarily downgraded this to low-priority in my head, but I still think it's a good idea :(
One idea would be:
- None→- ""or- falsevalue.
- /→- directory_extensionsparameter
e.g.
README:
  required: true
  directory_extensions:
  - ""
  - .md
  - .rst
  extensions:
  - ""
  - .md
  - .txt
Given how JSON is parsed, ""/false actually evaluates as False (which is correct as it means no extension), whereas None which gets parsed to "None" evaluates as True. I see how encoding everything in the string is a quick shortcut to cover everything and worry about it later (which is what I'm doing atm as well). But I really think this will encourage us to write broken logic with if str.endswith("/") or other such detection methods which will become convoluted and are fragile.