cpp20-http-client

How to follow URLs when using send_async?

Open melroy89 opened this issue 7 months ago • 9 comments

My implementation currently follows your example code.

The question now is how to keep the redirect handling working while using your send_async call, for example send_async<256>(). It should still keep following those 30x status responses.

Using:

std::string url("<some initial URL..>"); // Declared outside the loop so a redirect can override it
while (true) {
    // Do request...
    const auto response = http_client::get(url).send();

    // Follow 30x status codes
    if (response.get_status_code() == http_client::StatusCode::MovedPermanently ||
        response.get_status_code() == http_client::StatusCode::Found) 
    {
        if (auto const new_url = response.get_header_value("location")) {
            url = *new_url; // Override URL
            continue;
        }
    } else { ... }
    break;
}

Dec 06 '23 16:12 melroy89

Hmm, what exactly do you mean by following 30x status codes? Are you making a request to a website that redirects to another URL, which redirects to another URL, and so on, thirty times? At what point do you want to do the SEND request?

Dec 06 '23 18:12 avocadoboi

Sorry if I wasn't clear. 30x means the status codes 301, 302, 303, etc. (where the x represents a digit).

So, as stated in your own code and documentation. But it's true in general: websites often do not return a 200 OK status code directly, but in some cases (for various reasons, incl. TLS) return 301 or 302 instead. In those cases you want to follow the redirect to the actual URL, like in the example I posted above. After all, a 301 or 302 indicates a redirect, not the final URL.

As you can see in your own example, you loop over this code and only break out of the while loop once the response is no longer a 301 or 302. This works great, but my question was: how do I get the same functionality with send_async?
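To make the "30x" idea concrete, here is a minimal sketch of a check for the status codes that conventionally point to a new URL through the Location header. It uses plain integers rather than the library's StatusCode enum, and is_redirect is a hypothetical helper name, not something the library provides.

```cpp
// Hypothetical helper (not part of cpp20-http-client): true for the status
// codes that conventionally carry a Location header with the new URL.
constexpr bool is_redirect(int status_code) {
    switch (status_code) {
        case 301: // Moved Permanently
        case 302: // Found
        case 303: // See Other
        case 307: // Temporary Redirect
        case 308: // Permanent Redirect
            return true;
        default:
            return false; // e.g. 200 OK or 304 Not Modified
    }
}
```

Note that not every 3xx code is a redirect to follow: 304 Not Modified, for example, has no new URL to go to.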

Dec 06 '23 20:12 melroy89

Ahh got it, well here is an example of how you could do it:

  using namespace std::chrono_literals; // needed for the 20ms literal below

  auto response_future = http_client::get("youtube.com").send_async();

  while (true) {
      std::cout << "(Waiting for response...)\n";

      if (response_future.wait_for(20ms) == std::future_status::ready) 
      {
          auto const response = response_future.get();
          if (response.get_status_code() == http_client::StatusCode::MovedPermanently ||
              response.get_status_code() == http_client::StatusCode::Found)
          {
              if (auto const new_url = response.get_header_value("location")) {
                  std::cout << "Redirecting to " << *new_url << '\n';
                  response_future = http_client::get(*new_url).send_async();
                  continue;
              }
              else {
                  std::cout << "Got MovedPermanently or Found but no new URL, exiting.\n";
              }
          }
          // Other status codes...
          else {
              std::cout << "Response received!\n";
          }
          break;
      }
  }

In this example I used wait_for only because the loop does nothing else that takes time. Is that what you were looking for?
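One thing the loop above does not guard against is a server that redirects forever. A rough sketch of the same idea with a hop limit follows. To keep it runnable without a network, fake_send_async and FakeResponse are stand-ins invented for this illustration (they play the role of http_client::get(url).send_async() and the library's response type), and get_following_redirects is a hypothetical name.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Stand-in for a real HTTP response; only the fields the loop needs.
struct FakeResponse {
    int status_code;
    std::optional<std::string> location; // value of the Location header, if any
    std::string body;
};

// Stand-in for http_client::get(url).send_async(): looks the URL up in an
// in-memory table on another thread instead of touching the network.
inline std::future<FakeResponse> fake_send_async(std::string url) {
    static const std::map<std::string, FakeResponse> routes{
        {"a.example", {301, "b.example", ""}},
        {"b.example", {302, "c.example", ""}},
        {"c.example", {200, std::nullopt, "hello"}},
        {"loop.example", {302, "loop.example", ""}},
    };
    return std::async(std::launch::async, [url = std::move(url)] {
        if (auto const it = routes.find(url); it != routes.end()) {
            return it->second;
        }
        return FakeResponse{404, std::nullopt, ""};
    });
}

// Follows 301/302 responses one hop at a time and gives up after
// max_redirects hops, so a redirect loop cannot spin forever.
inline FakeResponse get_following_redirects(std::string url, int max_redirects = 5) {
    assert(max_redirects >= 0);
    auto response = fake_send_async(url).get(); // get() blocks until ready
    for (int hops = 0; hops < max_redirects; ++hops) {
        bool const redirected =
            response.status_code == 301 || response.status_code == 302;
        if (!redirected || !response.location) {
            break; // final response, or a redirect without a target
        }
        url = *response.location;
        response = fake_send_async(url).get();
    }
    return response;
}
```

If the limit is reached, the caller simply gets the last redirect response back and can decide how to report it.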

Dec 06 '23 20:12 avocadoboi

Yes... Indeed.. that makes a lot of sense! Maybe add this to your examples.

Now my last question, which was my whole point actually: I want to use this async code to execute multiple async calls (e.g. calling youtube.com 10 times in your example, concurrently). I guess that is the whole idea of async / futures after all. So I would like to keep the redirect handling together with the async code you showed above (which is great!), but combine it with multiple concurrent calls, without using C++ threads, thus using just a single process.

I expect this code flow (correct me if I'm wrong OR if you have a better idea/solution):

1. Prepare/call the GET requests, 10x youtube.com (or another URL, YouTube was just an example)
   - Of course in a non-blocking way, we are trying to use async here
   - Idea: Maybe we need to prepare the calls first and then call an async method to process all the requests concurrently.
2. Wait until all those calls have finished executing (if needed, execute an additional request so the 301 and 302 redirects keep working)
3. Process all requests once they are all finished. Iterate over each request and retrieve some information, e.g. printing response.get_body_string() for each request.
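The flow described above could be sketched roughly as follows. This is only an illustration under loud assumptions: fake_send and FakeResponse are invented stand-ins for a blocking request and the library's response type, start_all and collect_bodies are hypothetical names, and std::async with std::launch::async does run each task on its own thread behind the scenes, so this is not a thread-free solution.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct FakeResponse {
    int status_code;
    std::optional<std::string> location; // Location header, if any
    std::string body;
};

// Stand-in for a blocking request such as http_client::get(url).send().
inline FakeResponse fake_send(std::string const& url) {
    static const std::map<std::string, FakeResponse> routes{
        {"start.example", {301, "final.example", ""}},
        {"final.example", {200, std::nullopt, "final body"}},
    };
    if (auto const it = routes.find(url); it != routes.end()) {
        return it->second;
    }
    return FakeResponse{404, std::nullopt, ""};
}

// Step 1: start every request. Each task follows its own redirects (at most
// five hops), so no request waits for another one's redirect chain.
inline std::vector<std::future<FakeResponse>> start_all(std::string const& url, int count) {
    assert(count >= 0);
    std::vector<std::future<FakeResponse>> futures;
    futures.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        futures.push_back(std::async(std::launch::async, [url] {
            auto response = fake_send(url);
            for (int hops = 0; hops < 5 && response.location &&
                 (response.status_code == 301 || response.status_code == 302); ++hops) {
                response = fake_send(*response.location);
            }
            return response;
        }));
    }
    return futures;
}

// Steps 2 and 3: wait for every request, then hand back the bodies in order.
inline std::vector<std::string> collect_bodies(std::vector<std::future<FakeResponse>>& futures) {
    std::vector<std::string> bodies;
    bodies.reserve(futures.size());
    for (auto& future : futures) {
        bodies.push_back(future.get().body); // get() blocks until this one is done
    }
    return bodies;
}
```

The design choice here is to move the redirect loop inside each background task, so the code that collects the results never has to issue follow-up requests itself.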

I think this is a very good use case for async in practice. If you have a good answer, maybe consider adding this to your examples, since I think the current examples don't really leverage the real benefit of async (vs. non-async) when you're just executing a single request.

Dec 07 '23 00:12 melroy89

~~OK, I ditched the while loop~~ I think it's good practice to add response_future.wait_for(20ms) == std::future_status::ready back again to avoid undefined behaviour? I came up with this (does it look good, or can it be improved? Granted, this only supports 1 redirect):

std::vector<std::future<Response>> futures;
futures.reserve(repeat_requests_count);

// Loop over the nr of repeats
for (int i = 0; i < repeat_requests_count; ++i) {
    std::future<Response> response_future;
    response_future = get(url)
         .send_async();
    // Push the future into the vector store
    futures.emplace_back(std::move(response_future));
}

for (auto& future : futures) {
    while (future.wait_for(10ms) != std::future_status::ready) {
      // Wait for the response to become ready
    }

    try {
        auto response = future.get();

        if (response.get_status_code() == StatusCode::MovedPermanently ||
            response.get_status_code() == StatusCode::Found) {
            if (auto const new_url = response.get_header_value("location")) {
                auto new_response_future = get(*new_url)
                    .add_header({.name="User-Agent", .value="RamBam/1.0"})
                    .send_async();
                auto const new_response = new_response_future.get();
                process_request(new_response);
            } else {
                std::cerr << "Error: Got 301 or 302, but no new URL." << std::endl;
                process_request(response);
            }
        } else {
            process_request(response);
        }
    } catch (const std::exception& e) {
        std::cerr << "Error: Unable to fetch URL with error:" << e.what() << std::endl;
    }
}

P.S. I don't like the wait in the for loop, since I don't know which response comes back first, yet I now busy-wait until the first (then second, third, etc.) response is ready.
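One possible way around waiting on the futures in a fixed order is to poll all of them and handle whichever ones are ready. A minimal sketch, using only the standard library; drain_ready is a hypothetical helper name, and it assumes the futures are not deferred (a deferred future never reports ready from wait_for).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: passes the result of every future that has already
// finished to on_ready and removes that future from pending.
// Returns how many futures were processed in this pass.
template <typename T, typename Callback>
std::size_t drain_ready(std::vector<std::future<T>>& pending, Callback&& on_ready) {
    std::size_t processed = 0;
    for (auto it = pending.begin(); it != pending.end();) {
        assert(it->valid());
        // A zero timeout only checks the state; it does not block.
        if (it->wait_for(std::chrono::seconds{0}) == std::future_status::ready) {
            on_ready(it->get());
            it = pending.erase(it);
            ++processed;
        } else {
            ++it;
        }
    }
    return processed;
}
```

A caller would invoke drain_ready repeatedly until pending is empty, doing other work or sleeping briefly between passes so the loop does not spin at full speed.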

Note: process_request() is a function that processes the response further.

Dec 07 '23 02:12 melroy89

Hm, I don't think there is any good solution to this without using threads unless the work that you are doing in the main program while fetching the content is iterative, or the same thing over and over. Ideally (in the general case, it depends on the use case as I mentioned) you would want something to run in the background that does the redirects for you while you are doing some work in the main thread. In the code you suggested, you let the first request run asynchronously but then wait for the redirects immediately after sending them, effectively doing synchronous redirect requests afterwards. If the initial request finished before that point, you would be wasting time.

I actually think I want to implement automatic redirects in the library (as an option) since it is probably the most commonly wanted behaviour. Then you would just need to do something like get("youtube.com").handle_redirects().send_async() and use it like any other asynchronous request.

Sorry for the late response, university is quite distracting.

Dec 12 '23 08:12 avocadoboi

> I actually think I want to implement automatic redirects in the library (as an option) since it is probably the most commonly wanted behaviour.

YES please! That would help me a lot. Libcurl also implemented this in their library (see: CURLOPT_FOLLOWLOCATION); you basically want to do the same here. Instead of handle_redirects() you could also just add an argument to the get() method (and the other methods: post, put, delete, ..)? Like: get(..., follow_redirect=true..), you get the idea hopefully. But I leave the design decision to you.

EDIT: You are correct, my implementation now blocks when a redirect takes place. In fact, it is really bad... lolz

Dec 12 '23 17:12 melroy89

I will try to implement it then :). Sadly C++ has no support for named arguments, so I try to avoid boolean parameters where the meaning is not obvious from reading the function call. However follow_redirects() is a better name. The value returned from get() is a "builder" with default parameters which you can modify through (chained) member function calls before sending the request. I thought it would fit in with that pattern.
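To illustrate why a chained member function reads better than a boolean parameter, here is a purely hypothetical sketch of the builder pattern described above. RequestSketch, get_sketch and the max_redirects field are invented for this example and are not the library's API; the real follow_redirects() may look quite different once implemented.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Purely illustrative builder; none of these names come from the library.
struct RequestSketch {
    std::string url;
    int max_redirects = 0; // 0 means "do not follow redirects" (the default)

    // Returns the builder itself so calls can be chained. The name at the
    // call site says what is being enabled, unlike a bare `true` argument.
    RequestSketch& follow_redirects(int limit = 5) {
        assert(limit >= 0);
        max_redirects = limit;
        return *this;
    }
};

// Stand-in for get(): creates a builder with default parameters.
inline RequestSketch get_sketch(std::string url) {
    return RequestSketch{std::move(url)};
}
```

With this shape, get_sketch("youtube.com").follow_redirects() is self-explanatory at the call site, whereas a call like get_sketch("youtube.com", true) would force the reader to look up what the boolean means.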

Dec 12 '23 21:12 avocadoboi

Yeah, too bad about the named arguments indeed. I was hinting more at a default value, but I understand. I agree with the follow_redirects() name as well.

Dec 12 '23 21:12 melroy89
