All file original_name values report encoding as ASCII-8BIT
Descriptive summary
This affects Hyrax versions 2.9.5 through 3.4.1 (and the main branch).
From my investigations in heliotrope, it seemed to happen when Ruby was updated from 2.5.x to 2.7.x, so that might be more relevant to the problem. Hyrax did not update in that commit, but some underlying gems did (faraday?).
I'm still scratching my head about where in the stack this happens; my best guess is some interaction between Faraday and the Ruby version. However it occurs, it affects how Hyrax code interacts with this original_name value, especially when the value is used to set a metadata field in Fedora, such as conditionally here (I'll write another ticket for that). That path eventually hits an "encode" in rdf here, which will bow out with an Encoding::UndefinedConversionError if the string contains any bytes outside the ASCII range, because its encoding is reported as ASCII-8BIT.
See this pithy comment explaining this in a former similar issue in rdf.
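The failure mode can be reproduced in plain Ruby without Hyrax, ActiveFedora, or rdf loaded. This is only a minimal sketch: `utf8_transcode_error` is a hypothetical helper written for this illustration, and the `.b` calls simulate a name that comes back tagged as ASCII-8BIT; it does not show what rdf itself does internally.

```ruby
# Hypothetical helper for illustration only (not from rdf or ActiveFedora).
# Returns the exception class raised when transcoding +name+ to UTF-8,
# or nil if the transcode succeeds.
def utf8_transcode_error(name)
  name.encode(Encoding::UTF_8)
  nil
rescue Encoding::UndefinedConversionError => e
  e.class
end

# String#b returns a copy of the string tagged as ASCII-8BIT (binary).
ascii_name  = "blah.txt".b
binary_name = "r\u00E9sum\u00E9.txt".b

# An ASCII-only name transcodes without complaint, so the helper returns nil.
p utf8_transcode_error(ascii_name)

# A name with bytes above 0x7F cannot be transcoded from ASCII-8BIT.
p utf8_transcode_error(binary_name)

# Relabeling the same bytes as UTF-8 (no transcoding) gives a valid string.
p binary_name.dup.force_encoding(Encoding::UTF_8).valid_encoding?
```

This is why the problem only shows up for file names with non-ASCII characters: a name like "blah.txt" passes through the encode call unharmed even when it is mislabeled.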
Would it be acceptable to force the encoding to UTF-8 here in AF, if no better solution can be found? This seems to be the code that provides the value. Or maybe that method should be wrapped/overridden in Hyrax so that the encoding can be forced in Hyrax itself.
FileSet.first.files.first.method(:original_name).source_location
=> ["/Users/conorom/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/active-fedora-13.2.7/lib/active_fedora/file/attributes.rb", 12]
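One possible shape for the "wrap/override" option is sketched below. Everything here is an assumption for discussion: `Utf8OriginalName` is a hypothetical module name, and `StandInFile` is a stand-in class so the snippet runs on its own; in practice the module would presumably be prepended to whichever class ends up providing original_name.

```ruby
# Hypothetical wrapper, not existing Hyrax or ActiveFedora code.
# Relabels a binary-tagged original_name as UTF-8 when the bytes are valid UTF-8.
module Utf8OriginalName
  def original_name
    value = super
    return value unless value.is_a?(String) && value.encoding == Encoding::ASCII_8BIT

    # force_encoding changes only the encoding label, not the bytes.
    utf8 = value.dup.force_encoding(Encoding::UTF_8)
    # Leave the value untouched if the bytes are not valid UTF-8.
    utf8.valid_encoding? ? utf8 : value
  end
end

# Stand-in for the real file class, returning a binary-tagged name
# the way the saved file does in the report above.
class StandInFile
  def initialize(name)
    @name = name
  end

  def original_name
    @name.b
  end
end

StandInFile.prepend(Utf8OriginalName)

p StandInFile.new("r\u00E9sum\u00E9.txt").original_name.encoding
```

The validity check is a design choice rather than a requirement: it avoids labeling arbitrary bytes as UTF-8, at the cost of leaving genuinely non-UTF-8 names as ASCII-8BIT.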
Rationale
It didn't use to be this way, which I just verified by downgrading heliotrope to Ruby 2.5.8.
Inspiration for this testing was drawn from another encoding issue, #1089
Expected behavior (when I downgrade to Hyrax 2.9.5 on Ruby 2.5.8)
f = ActiveFedora::File.new
f.content = 'asdf' # needed to save the file
f.original_name = 'blah.txt'
f.original_name.encoding # => #<Encoding:UTF-8>
f.save # => true
f.original_name # => "blah.txt"
f.original_name.encoding # => #<Encoding:UTF-8>
Actual behavior (main branch on Ruby 2.7.4)
f = ActiveFedora::File.new
f.content = 'asdf' # needed to save the file
f.original_name = 'blah.txt'
f.original_name.encoding # => #<Encoding:UTF-8>
f.save # => true
f.original_name # => "blah.txt"
f.original_name.encoding # => #<Encoding:ASCII-8BIT>
Steps to reproduce the behavior
Steps outlined above. Or just call FileSet.first.original_file.original_name.encoding on your newer/older Hyrax installs.
It's always Encoding:ASCII-8BIT now
Related work
#5671