
Add early redirect loop detection

Open ismaell opened this issue 2 years ago • 17 comments

It might be useful to detect a redirect loop properly: instead of just guessing based on the number of tries, actually check the URLs visited. That would allow axel to fail early and would permit a larger default redirect limit.

ismaell avatar Feb 27 '22 01:02 ismaell

Hey, I'm new to opensource and would like to contribute to this. Can you please elaborate on what needs to be done? Thanks.

acharyasourav avatar Jun 30 '22 11:06 acharyasourav

Hi @Sugarlust, thanks for the interest.

All the code related to connection handling is within src/conn.c.

Since HTTP itself is stateless, any repeated request is enough to detect a loop, and since axel isn't very smart, for starters just keeping track of the URLs is enough. An O(N) implementation is fine.

ismaell avatar Jun 30 '22 22:06 ismaell

Check the first do-while loop in the conn_info function.

ismaell avatar Jun 30 '22 22:06 ismaell

Thanks @ismaell, I'm going to look into that and reach out to you if I need some help.

acharyasourav avatar Jul 01 '22 10:07 acharyasourav

hey @ismaell I skimmed through the source code and it's pretty confusing for me, but I'm definitely trying. Can you please tell me how one would keep track of the URLs? As I understand it, if the same URL repeats, then a redirect loop should be declared, right? How should I check the URLs, and what should happen next?

acharyasourav avatar Jul 01 '22 12:07 acharyasourav

If a URL is repeated, then it's a closed loop, and axel can abort early, without reaching max_redirect.

Here's the code you need to deal with: https://github.com/axel-download-accelerator/axel/blob/3af1542e8143d8fa2f50eecf721a9fdbe23ef128/src/conn.c#L401-L435

Everything you need to know is in there.

ismaell avatar Jul 01 '22 22:07 ismaell

Hi, @ismaell! Looks like this issue still hasn't been resolved. I'm also new to open source and would like to give it a try.

Since you're okay with an O(n) implementation, could I simply store the URLs in an array and then do a linear scan to check whether the current URL has already been visited?

Is there a naming convention you'd like me to follow when creating a new branch?

cm-jones avatar Sep 06 '22 20:09 cm-jones

Will this do the trick?

https://github.com/codymichaeljones/axel/commit/dfb2f7c11d303f892058c241322240a905e86b42

cm-jones avatar Sep 06 '22 21:09 cm-jones

It's on the right track!

Changes needed:

  1. Don't remove the original max_redirect check. The loop check needs to be an additional restriction: there may be no explicit loop yet while the number of redirects is still excessive, and we want to handle both cases.
  2. You can't use conn->http->headers to store the URLs (maybe use malloc(MAX_STRING) for each; don't worry about the memory usage, we can improve that later).
  3. The message should be different, something like "Redirect loop detected"...
  4. Check your indentation.

ismaell avatar Sep 09 '22 13:09 ismaell

Is there a naming convention you'd like me to follow when creating a new branch?

We don't have any, follow your heart. Normally I don't do merge commits, so branch names leave no trace in the history anyway.

The only hard rule is that your PR must apply cleanly to the development branch, but since it's a slow-moving small project, that's easy.

ismaell avatar Sep 09 '22 13:09 ismaell

  1. I'm not sure I fully understand this point. I was using the conn_url() function to get the current URL as a string before I store it inside the urls[] array. How else can I get the current URL? I think I understand your point about malloc'ing enough memory to hold each URL.

cm-jones avatar Sep 14 '22 18:09 cm-jones

  1. I'm not sure I fully understand this point. I was using the conn_url() function to get the current URL as a string before I store it inside the urls[] array. How else can I get the current URL? I think I understand your point about malloc'ing enough memory to hold each URL.

Using conn_url is correct, but it takes a buffer (1st and 2nd arguments) and copies the string there. You need to allocate that buffer yourself; you can't use conn->http->headers.

ismaell avatar Sep 14 '22 18:09 ismaell

I see. Thanks for clarifying.

https://github.com/codymichaeljones/axel/commit/fb764c3b07ea1c0529db3c6179dc444f1209c5cd

This is my attempt to fix everything you mentioned other than the indentation. I don't see where my indentation is wrong. Could you be more specific, please?

cm-jones avatar Sep 14 '22 18:09 cm-jones

I left some comments there. With respect to the indentation, the project uses tabs (8-char wide).

ismaell avatar Sep 14 '22 20:09 ismaell

Thanks for all your comments and for walking me through this. I believe the indentation should be fixed now as well as the other issues you commented on in my last commit. How are things looking now?

https://github.com/codymichaeljones/axel/commit/d5c772a9b5e84555308abbd37aa31e3d65f9ea98

Also, is it necessary for me to link to my commits in these comments or is there a better way to demonstrate my progress to you?

cm-jones avatar Sep 14 '22 23:09 cm-jones

It looks good. I will give it a second look in the afternoon. Kudos!

ismaell avatar Sep 15 '22 11:09 ismaell

Hi, @ismaell! Should I go ahead and create a PR or wait for you to review my latest commit more carefully?

cm-jones avatar Sep 20 '22 18:09 cm-jones

@codymichaeljones yes, create the PR, please.

ismaell avatar Sep 22 '22 15:09 ismaell

Excellent work @codymichaeljones, please check out the changes I made (minor). If you're inclined to do so, it can be further improved. E.g. memory usage could be improved.

Ideas:

  • Preallocate a single buffer for all strings to improve memory usage and behaviour.
  • Maybe just save hashes of the URLs instead of the strings for huge savings.

ismaell avatar Sep 27 '22 21:09 ismaell

Thanks for your patience and for walking me through my first open source contribution. I had fun and would love to keep contributing to this project.

If I were to begin working on the ideas you mentioned, how should I incorporate the changes you've made in the upstream repo in my forked repo? Also, should I continue working in the same branch?

cm-jones avatar Sep 27 '22 23:09 cm-jones

Well, you're using Git in a sensible way AFAICT. Assuming a workflow using only topic branches, a few tricks you can use to quickly sync without switching branches:

  • git fetch will retrieve remote branches
  • git push . origin/BRANCH:BRANCH will sync a local branch with its remote-tracking branch (fast-forward only)

BTW, fast-forwarding means moving the tip of a branch forward by adding new commits that are already connected to the existing ones, so no merge operation is needed.

And when necessary you can rebase your topic branches:

  • git rebase origin/master BRANCH to rebase the work on your topic branches

A topic branch that has been merged to master can simply be deleted, but git may complain, as in this case, if it can't match the tip of the branch you're trying to delete to a commit on another branch (because if it isn't merged, the changes would be lost). If you're sure, you can force the deletion.

ismaell avatar Sep 28 '22 13:09 ismaell
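The fetch-and-fast-forward trick above can be tried in a throwaway repository like this (paths, repo names, and commit messages are invented for the demo; requires git >= 2.28 for init -b):

```shell
# Demo: sync local master to origin/master without leaving a topic branch.
set -e
tmp=$(mktemp -d)
git init -q -b master "$tmp/upstream"
git -C "$tmp/upstream" -c user.email=a@b -c user.name=demo \
	commit -q --allow-empty -m 'initial commit'
git clone -q "$tmp/upstream" "$tmp/work"
# Upstream moves forward while we work on a topic branch.
git -C "$tmp/upstream" -c user.email=a@b -c user.name=demo \
	commit -q --allow-empty -m 'new upstream work'
git -C "$tmp/work" checkout -q -b topic
git -C "$tmp/work" fetch -q origin
# The trick: push within the same repo, fast-forwarding local master.
git -C "$tmp/work" push -q . origin/master:master
git -C "$tmp/work" log --oneline -1 master
```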

In this specific case, since all your changes were merged but in a modified way, you can just reset (discard changes) your branch to the contents of master and continue to work there with: git reset --hard origin/master

ismaell avatar Sep 28 '22 13:09 ismaell

However, in the case your changes were partially merged, or you have new changes already in your branch, you can do an interactive rebase to edit your branch and continue the work from there with: git rebase -i origin/master

ismaell avatar Sep 28 '22 13:09 ismaell

Interactive rebase is one of the most useful tools.

ismaell avatar Sep 28 '22 13:09 ismaell

Since the basic functionality is merged, I'm going to close this ticket, and the improvements can refer to (the newly created) #392.

ismaell avatar Oct 28 '22 21:10 ismaell