feat: add path/name attribute to daft.File
Is your feature request related to a problem?
Most of the time when I am working with daft.File in a UDF, I'm not passing in the original path with it. It's not entirely clear how to get the filename or original filepath directly from the object.
This issue asks for one of two things: 1) making this an explicit property, or 2) adding this usage to the example in the docstring to make it obvious (because I'm not sure how to do it...).
Describe the solution you'd like
import daft
from daft.functions import file

df = daft.from_glob_path("/Users/me/Downloads/*.pdf")
df = df.with_column("file", file(daft.col("path")))

@daft.func
def get_name(x: daft.File) -> str:
    # Proposed attribute: the file's name
    return x.name

@daft.func
def get_path(x: daft.File) -> str:
    # Proposed attribute: the original (resolved) path
    return x.path

df = df.with_column("name", get_name(daft.col("file")))
df.show(format="fancy", max_width=100)
Describe alternatives you've considered
I've also considered having str(daft.File) simply return the resolved path.
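Another interim option would be to derive the name from the original path string (the "path" column above) and pass that along separately. Here is a minimal sketch of that using only the standard library; `name_from_path` is a hypothetical helper, not part of Daft, and it assumes POSIX-style paths or URLs:

```python
import posixpath
from urllib.parse import urlparse


def name_from_path(path: str) -> str:
    """Return the last path component of a local path or URL (hypothetical helper)."""
    parsed = urlparse(path)
    # For URLs such as s3://bucket/dir/report.pdf the object key is in .path;
    # for plain local paths urlparse also leaves the whole string in .path.
    return posixpath.basename(parsed.path.rstrip("/"))


print(name_from_path("/Users/me/Downloads/report.pdf"))
print(name_from_path("s3://bucket/dir/report.pdf"))
```

This works on the path string rather than on the daft.File itself, so it does not remove the need for the attribute requested above.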
Additional Context
No response
Would you like to implement a fix?
No
So this is actually a limitation with our file approach and is something I've been considering changing for other reasons as well.
Currently daft.File can be backed by either a (url + io config) or an in-memory bytes object, a design decision I've actually come to regret.
daft.File(b"hello")
daft.File("path/to/file.text")
When it's backed by bytes, there is no name or path.
For reasons beyond just this, I've been considering limiting the daft.File to only be backed by a url+io_config, and not a bytes object. It:
- makes these kinds of methods possible
- greatly simplifies the implementation
- allows for better expression support, such as cast and download(functions.file('...'))
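To illustrate the first point, here is a rough sketch of a file reference that is always backed by a URL. `UrlBackedFile` and its fields are hypothetical and are not Daft's actual implementation; the point is only that `name` and `path` are always defined once the bytes-backed variant is gone:

```python
import posixpath
from dataclasses import dataclass
from typing import Any, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class UrlBackedFile:
    """Hypothetical stand-in for a file that is always backed by a URL."""

    url: str
    io_config: Optional[Any] = None  # placeholder for whatever IO settings are needed

    @property
    def path(self) -> str:
        # Always available, because there is no bytes-only variant.
        return self.url

    @property
    def name(self) -> str:
        # Last component of the URL's path part.
        return posixpath.basename(urlparse(self.url).path)

    def __str__(self) -> str:
        return self.url


f = UrlBackedFile("path/to/file.text")
print(f.name, f.path)
```

With a single backing, str() on the object can also just return the path, which covers the alternative raised in the issue.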
cc @everettVT
At this point I don't think I've ever built a daft.File from bytes. It certainly feels like a separate workflow or primitive altogether. Why not just leave it as a bytes type, you know?
I'd be fully in favor of this. Perhaps it's worth mentioning in the Slack group, or we could include it as a bullet point in an RFC for our incoming updates for daft.VideoFile and AudioFile.