ibrowse
set max sessions across subdomains
Hi, is it possible to set max sessions for a domain and have the limit apply across all of its subdomains? For example, if I set the following:
ibrowse:set_max_sessions("hotmail.com", 443, 100)
I would want a maximum of 100 connections for hotmail.com and all of its subdomains (m.hotmail.com, bay01.hotmail.com, etc.).
Is this possible today?
Hi Varnit
Unfortunately not, but it is easy enough to do. I will look into it this weekend, as I will be refactoring ibrowse a bit to integrate other pull requests.
W: http://chandrusoft.wordpress.com
(Replying by email to Varnit's message of 17 Dec 2014, 23:09.)
OK, thanks! Let me know if you need help with anything.
@cmullaparthi I assume this wasn't done that weekend?
I'm afraid not :-) I take it this is important for you?
@cmullaparthi Kind of important. Forgive my poor English; let me tell a story.
I got a bunch of URLs from my boss like these:
http://www.example0.com/foo/bar
http://test.example0.com/foo/bar
http://foo.example0.com/foo/bar
http://bar.example0.com/foo/bar
http://www.example1.com/foo/bar
http://test.example2.com/foo/bar
http://foo.example3.com/foo/bar
http://bar.example1.com/foo/bar
...
Then I got a configuration file from my boss like this:
example0.com --> concurrent: 1
bar.example1.com --> concurrent: 2
bar.example2.com --> concurrent: 3
When I request the URLs above, I need to limit their concurrency according to this configuration.
And in my boss's opinion, an entry such as example0.com in the configuration file of course means both *.example0.com and example0.com itself.
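A minimal sketch of reading that configuration into Erlang terms, assuming one entry per line in exactly the `domain --> concurrent: N` form shown above. The module and function names are hypothetical; this is not part of ibrowse.

```erlang
-module(limits_config).
-export([parse/1]).

%% Turn lines like "example0.com --> concurrent: 1" into {Domain, Limit}
%% tuples. Blank lines are skipped; any other shape raises an error
%% (function_clause or badarg) so typos in the file are caught early.
parse(Lines) ->
    [parse_line(Tokens) || Line <- Lines,
                           Tokens <- [string:tokens(Line, " \t")],
                           Tokens =/= []].

parse_line([Domain, "-->", "concurrent:", N]) ->
    {Domain, list_to_integer(N)}.
```

Each resulting entry is meant to cover the domain itself plus all of its subdomains, per the rule above; matching a host against the entries is a separate step.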
I can't tell my boss that ibrowse doesn't support that kind of configuration, so I have to handle it in my application.
The other thing is that the URLs my boss gives me change dynamically. So I can't tell my boss: "Give me all your URLs and let me generate an appropriate configuration file for you." I think my boss would reply: "No, programmer, I won't. I'll add URLs to the list whenever I want. This is easy, handle it."
So after my program has started running, my boss may come to my desk, hand me another URL and say: "Add it to the list", and I will do just that.
For now, here is my solution:
- I get a URL that needs to be handled: http://test.example0.com/foo/bar
- I extract the host from the URL: test.example0.com
- I match the host test.example0.com against the configuration file using ends_with
- It matches example0.com --> concurrent: 1
- I call :ibrowse.set_max_sessions("test.example0.com", 80, 1)
- I think it's done
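The steps above could be sketched in Erlang roughly as follows. This is an illustration under assumptions, not code from the thread: the module and function names are hypothetical, the configuration is assumed to be a list of `{Domain, Limit}` tuples, and the ibrowse application is assumed to be started before `configure_host/3` is called. It swaps the plain `ends_with` check for a match on a `.` boundary and prefers the most specific entry. Like the original steps, it sets the limit per host, so test.example0.com and www.example0.com each get their own cap of 1 rather than sharing one.

```erlang
-module(subdomain_limits).
-export([covers/2, find_limit/2, configure_host/3]).

%% True when Host is Domain itself or a subdomain of it. A plain
%% ends_with check is too loose: "notexample0.com" ends with
%% "example0.com" but is a different domain, so require a "." boundary.
covers(Domain, Host) ->
    Host =:= Domain orelse lists:suffix("." ++ Domain, Host).

%% Pick the most specific (longest) configured domain covering Host.
find_limit(Host, Config) ->
    case [{length(D), D, L} || {D, L} <- Config, covers(D, Host)] of
        [] ->
            not_found;
        Matches ->
            {_, Domain, Limit} = lists:max(Matches),
            {ok, Domain, Limit}
    end.

%% Apply the matched limit to this exact host and port, as in the steps
%% above. Assumes the ibrowse application has already been started.
configure_host(Host, Port, Config) ->
    case find_limit(Host, Config) of
        {ok, _Domain, Limit} -> ibrowse:set_max_sessions(Host, Port, Limit);
        not_found            -> ok
    end.
```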
Then, whenever I get URLs like:
http://test.example0.com/foo/bar1
http://test.example0.com/foo/bar2
http://test.example0.com/foo/bar3
http://test.example0.com/foo/bar4
the steps above run again, because I'm lazy and didn't write code to store the configuration and check whether a domain has already been configured.
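Remembering what has already been configured takes only a few lines. One way to sketch it, assuming the caller keeps a `sets` value in its own state (a gen_server, for instance) and passes in a zero-argument fun that does the actual configuration; the module name is hypothetical:

```erlang
-module(configured_hosts).
-export([ensure/3]).

%% Run Configure() only the first time Key (e.g. {Host, Port}) is seen.
%% Seen is a set owned by the caller; the updated set is returned.
ensure(Key, Seen, Configure) ->
    case sets:is_element(Key, Seen) of
        true ->
            Seen;
        false ->
            Configure(),
            sets:add_element(Key, Seen)
    end.
```

For example, `configured_hosts:ensure({"test.example0.com", 80}, Seen0, fun() -> ibrowse:set_max_sessions("test.example0.com", 80, 1) end)` calls ibrowse once for that host and port, and later calls with the returned set skip it.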
Well, end of story.
I'm not quite sure it's the right solution, but it seems to work.
BUT: if ibrowse had this feature, I would happily remove the subdomain-matching code I've written right away.
I loved this story :-)
There are a couple of complications with this:
- One or more of your subdomains may be unreachable because there are lots of requests to another subdomain
- Load balancing will be a more expensive operation because it has to make sure that the limit is enforced while routing requests correctly to each subdomain.
Are you happy with both these limitations? If so I will go ahead and implement it.
@cmullaparthi Thanks for your reply ~
One or more of your subdomains may be unreachable because there are lots of requests to another subdomain
- If a subdomain is unreachable because of the server's bandwidth or capacity, that's fine, since we limit max_sessions on the root domain for a reason.
- If it is unreachable because ibrowse returns a retry_later message, that is also reasonable; it's exactly what we want.
Load balancing will be a more expensive operation because it has to make sure that the limit is enforced while routing requests correctly to each subdomain.
Expensive is a relative word.
Yesterday I refactored my code to get better limiting: I use poolboy to set up an ibrowse pool for every root domain. Every time I get a URL, I check whether a pool for its root domain exists; if it does, I use it, otherwise I create a new pool for that root domain.
If what you are going to implement is no more expensive than my approach, I think it's worth a try.
Thank you.
Okay, good. No, the solution will be cheaper than using an external pooling mechanism. I'll create a branch with the proposed changes so you can try.
@cmullaparthi Thanks, you are so nice!
I've pushed some changes to the issue_124 branch. See 3fc7e78aad6ab4b882da4268d17871d1fbc1cc5f
Usage:
$ erl -pa ebin
Erlang/OTP 18 [erts-7.3] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.3 (abort with ^G)
1> application:ensure_all_started(ibrowse).
{ok,[ibrowse]}
2>
f(),
ibrowse:set_max_sessions("google.com", 80, 1), %% Set the LB config for the root domain
Res_1 = ibrowse:send_req("http://www.google.com", [], get, [],
[{use_subdomain_lb_config, {"google.com", 80}}]), %% New option
io:format("Res_1: ~p~n", [Res_1]),
ibrowse:show_dest_status(),
Res_2 = ibrowse:send_req("http://m.google.com", [], get, [],
[{use_subdomain_lb_config, {"google.com", 80}}]), %% New option
io:format("Res_2: ~p~n", [Res_2]),
ibrowse:show_dest_status().
- Result of the first request: succeeds as expected
Res_1: {ok,"302",
[{"Cache-Control","private"},
{"Content-Type","text/html; charset=UTF-8"},
{"Location",
"http://www.google.co.uk/?gfe_rd=cr&ei=GBZpV-W9IYHS8AeEya-oAg"},
{"Content-Length","261"},
{"Date","Tue, 21 Jun 2016 10:25:28 GMT"}],
"<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>302 Moved</H1>\nThe document has moved\n<A HREF=\"http://www.google.co.uk/?gfe_rd=cr&ei=GBZpV-W9IYHS8AeEya-oAg\">here</A>.\r\n</BODY></HTML>\r\n"}
- Internal ibrowse LB status. 1 connection to www.google.com and the same load balancer PID for all subdomains.
Server:port | ETS | Num conns | LB Pid
================================================================================
www.google.com:80 | 20500 | 1 | <0.41.0>
google.com:80 | 16403 | 0 | <0.41.0>
- Result of the second request: it fails because we set max_sessions to 1, that single session is already taken by the connection to www.google.com, and this request to m.google.com counts against the same limit
Res_2: {error,retry_later}
- And the internal LB status. 1 connection to www.google.com and the same load balancer PID for all subdomains.
Server:port | ETS | Num conns | LB Pid
================================================================================
www.google.com:80 | 20500 | 1 | <0.41.0>
google.com:80 | 16403 | 0 | <0.41.0>
m.google.com:80 | 32791 | 0 | <0.41.0>
The same test succeeds if you set max_sessions to 2.
$ erl -pa ebin
Erlang/OTP 18 [erts-7.3] [source] [64-bit] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.3 (abort with ^G)
1> application:ensure_all_started(ibrowse).
{ok,[ibrowse]}
2>
f(),
ibrowse:set_max_sessions("google.com", 80, 2),
Res_1 = ibrowse:send_req("http://www.google.com", [], get, [],
[{use_subdomain_lb_config, {"google.com", 80}}]), %% New option
io:format("Res_1: ~p~n", [Res_1]),
ibrowse:show_dest_status(),
Res_2 = ibrowse:send_req("http://m.google.com", [], get, [],
[{use_subdomain_lb_config, {"google.com", 80}}]), %% New option
io:format("Res_2: ~p~n", [Res_2]),
ibrowse:show_dest_status().
Res_1: {ok,"302",
[{"Cache-Control","private"},
{"Content-Type","text/html; charset=UTF-8"},
{"Location",
"http://www.google.co.uk/?gfe_rd=cr&ei=dBlpV-mXDpPS8AfI1IFY"},
{"Content-Length","259"},
{"Date","Tue, 21 Jun 2016 10:39:48 GMT"}],
"<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>302 Moved</H1>\nThe document has moved\n<A HREF=\"http://www.google.co.uk/?gfe_rd=cr&ei=dBlpV-mXDpPS8AfI1IFY\">here</A>.\r\n</BODY></HTML>\r\n"}
Server:port | ETS | Num conns | LB Pid
================================================================================
www.google.com:80 | 20500 | 1 | <0.41.0>
google.com:80 | 16403 | 0 | <0.41.0>
Res_2: {ok,"302",
[{"Location","http://www.google.com/mobile/other/"},
{"Cache-Control","private"},
{"Content-Type","text/html; charset=UTF-8"},
{"X-Content-Type-Options","nosniff"},
{"Date","Tue, 21 Jun 2016 10:39:48 GMT"},
{"Server","sffe"},
{"Content-Length","232"},
{"X-XSS-Protection","1; mode=block"}],
"<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>302 Moved</TITLE></HEAD><BODY>\n<H1>302 Moved</H1>\nThe document has moved\n<A HREF=\"http://www.google.com/mobile/other/\">here</A>.\r\n</BODY></HTML>\r\n"}
Server:port | ETS | Num conns | LB Pid
================================================================================
www.google.com:80 | 20500 | 1 | <0.41.0>
google.com:80 | 16403 | 0 | <0.41.0>
m.google.com:80 | 32791 | 1 | <0.41.0>
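When the shared limit is used up, the new option surfaces as `{error, retry_later}`, as in the first session. A caller that would rather wait than fail could wrap the request in a small retry loop. This is a sketch, not part of ibrowse; the module name and the doubling backoff are assumptions.

```erlang
-module(lb_retry).
-export([send_with_retry/3]).

%% Example (assumes ibrowse is started):
%%   lb_retry:send_with_retry(
%%     fun() -> ibrowse:send_req("http://m.google.com", [], get, [],
%%                               [{use_subdomain_lb_config, {"google.com", 80}}]) end,
%%     5, 100).

%% Call Send() and retry while it returns {error, retry_later}, i.e. while
%% the shared max_sessions limit is used up. Sleeps BackoffMs between
%% attempts, doubling it each time; any other result is returned as-is.
%% The final attempt's result is returned whatever it is.
send_with_retry(Send, Attempts, BackoffMs) when Attempts > 1 ->
    case Send() of
        {error, retry_later} ->
            timer:sleep(BackoffMs),
            send_with_retry(Send, Attempts - 1, BackoffMs * 2);
        Other ->
            Other
    end;
send_with_retry(Send, _Attempts, _BackoffMs) ->
    Send().
```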
@cmullaparthi Awesome! Trying...
When I use this feature, it seems... well, a little tricky?
- Got a limit like this: "example.com" -> 2
- Received a URL like this: http://test.example.com
- Got the root domain of http://test.example.com, which is example.com
- Sent the request with the option:
ibrowse:send_req("http://test.example.com", [], get, [],
[{use_subdomain_lb_config, {"example.com", 80}}])
Then I suddenly realized something. My boss had said: "The server example.com is weak, we won't send more than 2 requests to it at the same time".
What my boss meant was, in effect: "I don't know what a port means, and I don't care about 443 or 80 or even 8080; they are just web pages. Go get them, but never more than 2 requests at the same time".
So now I think it may be better to meet these demands in my application instead of in ibrowse. What do you think, @cmullaparthi?
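If the cap has to hold regardless of port, one way to do it in the application (rather than in ibrowse) is a shared counter per root domain. A minimal sketch, assuming the caller already knows the root domain and that the ETS table is created by a long-lived process, since the table is deleted when its owner exits. Module and function names are hypothetical.

```erlang
-module(domain_limiter).
-export([new/0, with_slot/4]).

%% One counter per root domain, independent of port: a limit on
%% "example.com" covers :80, :443 and :8080 alike.
new() ->
    ets:new(domain_limiter, [set, public, {write_concurrency, true}]).

%% Run Fun() if fewer than Limit calls for Domain are in flight;
%% otherwise undo the increment and return {error, retry_later}
%% without calling Fun. The slot is released even if Fun() raises.
with_slot(Tab, Domain, Limit, Fun) ->
    case ets:update_counter(Tab, Domain, {2, 1}, {Domain, 0}) of
        N when N =< Limit ->
            try Fun()
            after ets:update_counter(Tab, Domain, {2, -1})
            end;
        _ ->
            ets:update_counter(Tab, Domain, {2, -1}),
            {error, retry_later}
    end.
```

A request would then look like `domain_limiter:with_slot(Tab, "example.com", 2, fun() -> ibrowse:send_req("http://test.example.com", [], get) end)`. ibrowse's own per-host session limits still apply underneath. The counter can briefly read above Limit while an over-limit attempt is being undone, but such an attempt never runs Fun.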
Yeah, it's not particularly elegant, but I feel that is the nature of the problem. If you know you will always shape traffic by the first-level subdomain (e.g. example.com for test.example.com), I suppose your code could be simpler using this feature, something like:
-include_lib("ibrowse/include/ibrowse.hrl"). %% defines the #url{} record

invoke_ibrowse(Url, Headers, Payload, Method, Options) ->
    #url{host = Host, port = Port} = ibrowse_lib:parse_url(Url),
    %% Keep the last two labels: "test.example.com" -> "example.com"
    Host_tokens = string:tokens(Host, "."),
    LB_shaping_domain = string:join(lists:nthtail(max(0, length(Host_tokens) - 2), Host_tokens), "."),
    ibrowse:send_req(Url, Headers, Method, Payload,
                     [{use_subdomain_lb_config, {LB_shaping_domain, Port}} | Options]).
I suppose the above is more bearable than having to maintain your own pooling mechanism?