argh
argh copied to clipboard
Supporting non-UTF-8 paths
I love argh
for its simplicity, compile performance and leanness, and I find myself migrating more and more projects to it.
Just now it became evident that despite supporting PathBuf
in struct fields, it will always assume valid UTF-8 in the arguments it parses. This might be an assumption that holds in Fuchsia, but may make it impossible to use for general purpose linux tools that consume paths in arguments.
I wonder if there is any chance to allow parsing PathBuf
fields without making assumptions about their encoding?
Thank you
PS: This question was triggered by this post on reddit/r/rust.
Perhaps os_str_bytes
could be of use.
Thank you for the kind words!
Like Freaky said, you can likely use std::ffi::OsString
instead of a PathBuf
. Is that sufficient for your use case?
Fuchsia does indeed have everything as UTF-8, but I'm 100% willing to add more general support.
That sounds like a bug. PathBuf
is implemented as a newtype around OsString
, so why does using it detour through String
?
argh
is fundamentally &str
-driven: https://github.com/google/argh/blob/e59f2c8c7869619253902e0b42545787b3ac912a/src/lib.rs#L188
I mentioned os_str_bytes
because it looks useful for argument parsing, though I suspect the simplified approach argh
takes makes it less important.
Thanks so much, @benbrittain, for your openness towards this improvement! With that added correctness, I think this will put argh
right into the sweet spot between low-enough compile times, convenience, and low/no binary overhead.
Right now I am actually implementing a tool supporting both structopt
and argh
CLIs, and it's clear that the declaration of argh
is easier to remember and less cluttered, so that opinionatedness does hit a sweet spot for me indeed.
My
- work with
OsStr|OsString
instances internally, and usestd::env::args_os()
instead ofstd::env::args()
- when placing an
OsStr
into into a field that requires UTF-8 encoding, try to convert or fail gracefully - when wanting to call
FromArgValue
orfrom_str_fn(…)
, try to convert or fail gracefully - introduce a new
FromArgValueOS
orfrom_os_str_fn(…)
(working titles :D) for those who want to support non-UTF-8 fully
Is there anything fundamental I am missing? And how much effort do you think it is?
I am asking because maybe we sketch out the implementation a little and attract contributors :).
Hi @benbrittain, I think if I would hear your ideas about my initial thoughts stated above, I would consider giving the implementation a shot. That way I hope to avoid going off in the wrong direction.
Thanks a lot!
@Byron Sorry about the slow response! That sounds all sounds well thought out and I'm happy to work with you in moving argh in this direction! My only concern is with increasing binary overhead, so I want to ensure that we avoid that. I added #39 yesterday so that we actually track that
That sounds great, thank you! Regarding binary overhead, I would think that there is close to none, but then again I don't really know how Argh
is working. As long as the code generator won't have to support two versions of everything, i.e. OsStr
+ str
, I am optimistic supporting non-utf8 won't be detrimental.
Let's see how it goes :).