rust-tokenizers
rust-tokenizers copied to clipboard
Use AsRef<Path> instead of &str.
Master PR: #76 Blocked by: #79
Motivation
- Not every FS path is a valid utf-8 string.
-
AsRef<Path>
is strictly more acceptable with same level of correctness: you can use&str
,Path
andPathBuf
.
Implementation
- [x] Use
AsRef<Path>
instead of&str
. - [x] Remove unnecessary type casts and unwraps from tests.
@veta666, this one is also on you.
Thank you @npatsakula for this PR.
I would be in favor of actually moving from
&str
toPath
instead ofAsRef<Path>
for 2 reasons:1. As you mention, not every filesystem path is a valid utf-8 string. If we decide to take the _path_ of correctness, it makes sense to expect a `Path`. The user can still convert the `&str` it may use to `Path` before passing it to the tokenizers and vocabs. 2. Using `Option` with generics with `AsRef` leads to awkward `None` arguments, requiring the user to pass `None::<Path>` instead of simply `None`. I feel this is a high price to pay for allowing (possibly incorrects) `&str` as path representation.
What do you think? The next few pull requests will introduce breaking changes anyway, so I think using
Path
makes sense here.I believe this would be ready to merge before #79 (I am not fully bought in the idea of
snafu
) so it may make sense to merge this one first and revert the error handling commits on this branch.
- Yes, but
&str
is valid subset of filesystem paths (the reverse is not true). This mean that we obtain richer interface for no cost. - We can use pre-turbofished interface for beginner rust programmers (link).
Now I replaced all AsRef<Path>
with Path
and broke all comment documentation + you can see how noisy it looks like.
Hello @npatsakula ,
You are right I got confused into assuming that Path
was a subset of &str
, it does make sense to keep a richer interface. As the library now leverages the specialized new_with_special_token_mapping_path
, the issue with the awkward None
turbofish becomes even less of an issue.
I apologize - but you convinced me that your original choice was, indeed, the right one. I would be happy to go with AsRef<Path>
(after causing some extra work on your end). If you're happy to do so I'd go back to your original implementation with AsRef<Path>
but without the error handling rework for now.
I'm happy to help our revert the changes if you'd like me to do so, please let me know.
Hello @guillaume-be! I reverted to AsRef<Path>
(doc-comments too).
Hello @guillaume-be! I reverted to
AsRef<Path>
(doc-comments too).
Thanks a lot! looks good to merge - sorry again for the unneeded iterations