axel
axel copied to clipboard
netrc: Implement .netrc parsing
This implementation parses the $HOME/.netrc (or a file specified by the
user) through the --netrc[=file]
or -R[file]
options.
It uses a simple FSM in order to parse the file and find the credentials corresponding to the FTP host, or uses the "default" entry otherwise, if any, according to the specification found on [1]. Tokens not applicable to axel usage were not considered.
[1] https://docs.oracle.com/cd/E19455-01/806-0633/6j9vn6q5f/index.html
Closes: #116
Signed-off-by: David Polverari [email protected]
PS: On this implementation, I had to use the // fall through
comment in order to placate GCC's -Wimplicit-fallthrough
, as the project Makefile WARN_CFLAGS are considering a lot of warnings as errors and in this case it was the desired outcome.
As GCC considers this specific comment as an "explicit fallthrough", the error doesn't show, but I don't know whether other compilers (like clang) will present the same behavior.
Thanks for the review! I will try to address the points you made as soon as I can.
* have you thought about using `mmap`? that could simplify the design.
About this point, I was thinking about using mmap
, but using it precludes the use of strtok*
, because mmap
won't null-terminate the mmaped file and strtok
{,_r} will try to change the string, something unacceptable for a mmaped file, and something that will not work on read-only mappings, which would be the right way to mmap .netrc
anyway.
I could either malloc
(file_size + spare bytes), load file contents on this mem and null-terminate it, in order to work on that copy with strtok*, but in the unlikely event that someone has an absurdly big .netrc file, this approach might fail. Performance-wise, I don't think it would be a burden in most cases.
Or maybe I can just write a tokenizer function by iterating through the mmaped bytes and filling a char []
with each found token, while keeping track of the beginning of the next token by means of local static pointer.
For me, the current fgets
approach and the malloc
one seem the most KISS. What are your views about it?
Edit: I'm implemented a custom tokenizer to use along with mmap. I will rewrite the PR and will send as soon as I can.
On 18/Oct/2019 20:51, davidpolverari wrote: <...>
Or maybe I can just write a tokenizer function by iterating through the mmaped bytes and filling a
char []
with each found token, while keeping track of the beginning of the next token by means of local static pointer.For me, the current
fgets
approach and themalloc
one seem the most KISS. What are your views about it?
It would be worth knowing how it compares in practice, I suspect it's less overhead in the end, but for now the stdio API is fine.
It doesn't need to copy anything, and since the format is so simple, you can get away with something as simple as strspn + strncmp, altough we would need a limiting version of strspn, but that's about it.
I sent a new commit for the current PR. As soon as the code is ready for merging, I'll rebase everything.
I already addressed most of the points you made on my local repo. As soon as I implement the remaining ones (netrc_t
, strspn
-like interace, and --no-netrc
) I will send the PR.
On 22/Oct/2019 18:15, davidpolverari wrote:
I was afraid of using strlcpy because it is not standard (neither ISO C nor POSIX), and thus could cause portability problems. But if it is ok for you, I'll use it.
We provide strlcpy and strlcat for systems that don't have it on libc and don't have libbsd.
It's OK to use non-standard functions if they're generic.
I didn't forget about this issue. I was busy because of work, but will resume as soon as possible.
I just sent another WIP commit with most of the proposed changes, including using strspn
and strcspn
-like functions (which I named memspn
and memcspn
as they don't deal with null-terminated strings).
Later on I will try to refactor next_token()
into something like a memtok_r()
, and will try to get rid of the global variables by explicitly passing them as arguments to the auxiliary static functions, besides implementing the remaining proposed changes.
I believe that with those latest PRs most of the requested things were changed. I have doubts now as to where we should keep the netrc_t
pointer, at what point we should init the structure, etc. I think it would be easier if we could decouple the username and password from conn_t
, and had a struct used globally for it, but it would imply touching on a lot of places.
After that, I will be able to implement the negative (--no-netrc
) option.
Well, with this last PR I think the changes are almost all done, the only missing piece being the negative option (and removing the debugging printf
I put). There are other things that I thought of doing at a later point in time, as moving the memspn
, memcspn
and memtok
functions elsewhere on axel code, but I think that can be postponed.
Any thoughts about it? When everything is on right shape I can rebase it on master and push it again in a more organized way.
Looks much better now. I made a couple of comments, the only important task now is to complete the get_creds implementation.
What else do you think it is missing on get_creds()
implementation?
@davidpolverari please take a look at https://github.com/ismaell/axel/commit/888ec1e6f8356d7ac3a768be77089a72bad12b7b.
That fixes both the null deref at the mmap function, the parsing (which now aborts on unkonwn tokens), and removes two globals.
What I'm missing there is a cache, so that each machine is only parsed once.
Your parser implementation is way more elegant than mine :smile:. Anyways, I had to make minor changes to make it work and to simplify (IMHO) the program flow. Besides, we can't use memcpy
because the pointers returned by memtok
aren't NULL-terminated, thus we need strlcpy
to NULL-terminate it the copy for us.
As the parser now aborts on unknown tokens, we may need to include the other tokens used by the format which are unused on axel, and turn them in no-ops, otherwise a compliant file might get ignored.
Regarding the caching (memoization), I think we could create a function to wrap netrc_parse
and populate/verify/return the previous results contained in an array of few elements of a struct (host, username, password) inside netrc_t
, acting like a circular buffer. With few elements, we wouldn't need any fancy algorithm to do it.
But don't you think it is better to implement it as a separate PR later?
I failed to mention I made changes at https://github.com/ismaell/axel netrc branch
I cherry-picked your commits here, as they would be the same changes I'd have to make.
@davidpolverari hey, 'sup? are you still around?
Hi! Yes, was just a little bit busy with other stuff.