bagitspec
Proposed changes for 1.0 (updated source repo)
This replaces #17 and reflects the move from the old loc-rdc organization to the primary LibraryOfCongress organization. The main change from #17 is that the fetch.txt section is restored, following discussion with @jkunze, @dbrunton, and @johnscancella.
Don't forget to update when merging:
<date day="6" month="December" year="2016"/>
The encoding requirement in the bagit.txt section is unclear
The "bagit.txt" tag file MUST consist of exactly two lines in this order:
BagIt-Version: M.N
Tag-File-Character-Encoding: UTF-8
M.N identifies the BagIt major (M) and minor (N) version numbers, and UTF-8 identifies the character set encoding used by the tag files. The bag declaration MUST be encoded in UTF-8, and MUST NOT contain a byte-order mark (BOM) [RFC3629].
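To make the quoted rule concrete, here is a minimal Python sketch of a check for the bag declaration. The function name `parse_bag_declaration` is hypothetical and does not come from the spec or any BagIt library. The sketch assumes the strict parts of the text (exactly two lines, UTF-8 bytes, no BOM) and deliberately returns whatever encoding label appears on the second line, since whether labels other than UTF-8 are allowed is the open question here.

```python
import re

# Hypothetical helper for illustration; not part of the spec or any BagIt library.
_VERSION_RE = re.compile(r"BagIt-Version: ([0-9]+)\.([0-9]+)")
# Assumption: the encoding label is any run of non-whitespace characters.
_ENCODING_RE = re.compile(r"Tag-File-Character-Encoding: (\S+)")
_UTF8_BOM = b"\xef\xbb\xbf"


def parse_bag_declaration(raw: bytes):
    """Return (major, minor, encoding_label) or raise ValueError."""
    # The declaration itself MUST be UTF-8 and MUST NOT contain a BOM.
    if raw.startswith(_UTF8_BOM):
        raise ValueError("bagit.txt must not start with a byte-order mark")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("bagit.txt is not valid UTF-8") from exc

    # Exactly two lines, in this order.
    lines = text.splitlines()
    if len(lines) != 2:
        raise ValueError(f"expected exactly 2 lines, found {len(lines)}")

    version = _VERSION_RE.fullmatch(lines[0])
    if version is None:
        raise ValueError("first line must be 'BagIt-Version: M.N'")
    encoding = _ENCODING_RE.fullmatch(lines[1])
    if encoding is None:
        raise ValueError("second line must be 'Tag-File-Character-Encoding: <label>'")

    # The label is returned unchanged; the caller decides whether anything
    # other than UTF-8 is acceptable.
    return int(version.group(1)), int(version.group(2)), encoding.group(1)
```

Under the strict reading, a caller would additionally reject any returned label other than `UTF-8`. Under the permissive reading, the caller would need some registry to decide which labels are valid.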
The quoted text can be read as saying that a bag MUST always use UTF-8 as its Tag-File-Character-Encoding, which would make the considerations in 2.3. Text Tag File Format moot.
On a second reading, though, do I understand correctly that you still want to allow any encoding (without saying where the encoding names are defined), and that the bagit.txt file itself is the only file that must be UTF-8? (Why not ASCII?)
I would not mind voting for fixed UTF-8 in BagIt 1.0. This has become the norm for most formats, such as XML and JSON. If we allow other character encodings, we must say which registry we refer to; otherwise arbitrary encoding strings like "code page 865" would be allowed.
Edit: See https://github.com/LibraryOfCongress/bagit-spec/pull/14