Onda.jl
use `FilePathsBase.Path` in `Onda.load`?
`Onda.load` could automatically apply the `Path` constructor to the `file_path` field of a signal to load, to support seamless loading of S3 objects whose paths have been serialized as strings. Onda would not need to depend on AWS, AWSS3, etc., just on FilePathsBase, thanks to the magic of type piracy.
julia> using FilePathsBase
julia> typeof(Path("s3://hello"))
PosixPath
julia> using AWSS3
julia> typeof(Path("s3://hello"))
S3Path{AWS.AWSConfig}
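A rough sketch of what that could look like on the loading side. `resolve_file_path` is a hypothetical helper name, not part of Onda, and the file name is made up; only `Path` and `AbstractPath` come from FilePathsBase.

```julia
using FilePathsBase

# Hypothetical helper (not part of Onda): turn a serialized string into a path
# object using whatever path types are registered at call time, and leave any
# other value (for example an existing path object) untouched.
resolve_file_path(p::AbstractString) = Path(p)
resolve_file_path(p) = p

# With only FilePathsBase loaded this is a local path; with AWSS3 loaded, an
# "s3://..." string would resolve to an S3Path instead, as in the REPL session above.
local_path = resolve_file_path("recordings/samples.lpcm")
```

The fallback method is what would keep this from affecting callers who already pass something other than a string.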
Alternatively, we could fix serialization of `S3Path`s in AWSS3 (which would need to add a dependency on ArrowTypes once https://github.com/JuliaData/Arrow.jl/issues/209 is resolved) and then use those in signals to load data from S3 without any changes needed to Onda. Alternatively, if we don't want to add FilePathsBase here, and AWSS3 doesn't want to add ArrowTypes, we could just pirate methods of `toarrow` on `S3Path`s in the code that calls `Onda.load` and `Onda.store` (i.e. in neither package).
I'm fine with adding `FilePathsBase.Path` as you mentioned, as long as it's a pure value-add feature that non-FilePaths users don't have to think about, i.e. the actual requirements here don't change.
I'm assuming the `Path` constructor has an `identity` fallback of some sort that would facilitate this?
> I'm assuming the `Path` constructor has an `identity` fallback of some sort that would facilitate this?
I don't think so... FilePathsBase has a global constant `PATH_TYPES = Type[]`, and packages like AWSS3 populate this with the `register` function. FilePathsBase itself does `register(Sys.iswindows() ? WindowsPath : PosixPath)`. So by default, there is only one path type in `PATH_TYPES`, and it is not generic. Then when you do `Path(str)`, it `tryparse`s each path type in turn and chooses the first one that parses successfully.
So... I think the answer is then we shouldn't do this.