phoenix_client icon indicating copy to clipboard operation
phoenix_client copied to clipboard

Socket returned by start_link not connected

Open fxn opened this issue 4 years ago • 10 comments

The documentation in the README suggests that PhoenixClient.Socket.start_link/1 returns a connected socket that you can use right away to join a channel.

That does not seem to be the case, and I am using this quick and dirty helper which is enough for the development script I am writing:

defp wait_for_socket_available(socket) do
  if !PhoenixClient.Socket.connected?(socket) do
    Process.sleep(100)
    wait_for_socket_available(socket)
  end
end

Would it be possible to specify that you want to block until the socket is connected?

fxn avatar Apr 10 '20 16:04 fxn

I was having similar issue and found an example repo to see how it is being handled. In case you are interested check here: https://github.com/mobileoverlord/nerves_test_client/blob/master/lib/nerves_test_client/runner.ex#L41

README update would be helpful I guess.

rkenzhebekov avatar Apr 10 '20 16:04 rkenzhebekov

Does this API actually make sense? Is the main use case for connecting to a Phoenix WebSocket to get a socket back in an uncertain state? Or is the main use case to obtain a working socket to be used in the next line of code?

fxn avatar Apr 10 '20 18:04 fxn

If the current start_link/1 is useful for some use cases, you could have a convenience wrapper, for example

{:ok, connected_socket} = PhoenixClient.Socket.connect(..., timeout: ...)

and you get {:error, reason} if the connection is not possible.

fxn avatar Apr 10 '20 19:04 fxn

To throw in my 2¢, I would say the API makes sense as this is designed with OTP behaviors in mind and convenience working with phoenix socket semantics. Specifically, this is meant to help manage sockets and channels within supervision. Waiting for a socket state on start_link is a blocking call and if you're supplying a large timeout, that will block all the rest of your application from getting things going.

That said, your last idea of a connect() function would prob be a good use-case. I also had a call with Justin today about this lib and some design choices. Some of the ideas discussed were supplying a callback when starting the socket that can be notified on connected and disconnected which would help satisfy this condition as well. Prob needs a good run through the documentation also...

jjcarstens avatar Apr 19 '20 02:04 jjcarstens

Just got bitten by this as well. A note in the README that start_link doesn't guarantee a connected socket would be useful and save others time.

I'm starting the Socket under a Supervisor, then trying to use it in a GenServer before its connected and this leads to some (at least in my case) quite opaque and hard-to-debug errors.

th0mas avatar Jul 08 '20 13:07 th0mas

How did you guys get around this? Putting the phoenix.socket in my supervisor consistently gives me socket not connected errors?

Is there a better way?

raksonibs avatar Jul 16 '20 23:07 raksonibs

While I don't maintain this library, I figured that I'd respond since no one else has and these are good questions.

@th0mas I'm guessing that a simple one or two line PR to this project would be appreciated.

Network connections can't be guaranteed. There's too much outside of this library's control - Internet could be down, server could be down, etc. Or the network connection could be up and then go down sometime later.

What do you do about your GenServers that use this library then? The answer is that they they have to deal with the server being unreachable and implement a retry mechanism. Supervisors don't work for network connectivity issues. For one, they retry immediately without exponential backoff or cooldown, and they'll propagate failures upward in your supervision tree causing other pointless restarts. This is a lot of work for something that might resolve itself in minutes or hours.

Fred Hebert wrote a little about this at https://erlang-in-anger.com/. He talks about establishing connections to databases. See chapter 2.2 on supervision and guarantees.

I personally view this area as challenging since it would be nice to get more convenient guarantees out of Erlang and these libraries. People have strong opinions on this and it's application dependent, so I think that think that it would be easier to add in another less general-purpose library.

fhunleth avatar Jul 18 '20 17:07 fhunleth

@fhunleth all that makes sense, but the problem is that the most elemental API

{:ok, socket} = PhoenixClient.Socket.start_link(socket_opts)

does not give you a connected socket. Not because of network guarantees, but because it has not even attempted to connect.

So, we are talking about API design here, really. When I want to connect to MySQL, Kafka, Redis, etc., and I get a socket back, that is a connected socket mod networking is not reliable. If I open a file and get a file descriptor, perhaps a different process deletes the file in parallel. Yes, you do not control the environment in almost anything, but APIs should have normal ergonomics for the golden path.

To me, the fact that any user has to write code like this for the most basic thing you do with this library:

defp connect(driver_id) do
  socket_opts = [url: @socket_url, params: %{"jwt" => JWT.new(driver_id)}]
  {:ok, socket} = PhoenixClient.Socket.start_link(socket_opts)
  wait_until_connected(socket)
  socket
end

defp wait_until_connected(socket) do
  if !PhoenixClient.Socket.connected?(socket) do
    Process.sleep(100)
    wait_until_connected(socket)
  end
end

suggests there's a missing convenience layer, here. I should be able to say: Give me a connected socket, blocking, or err.

fxn avatar Jul 18 '20 17:07 fxn

I think we're talking about different use case. I haven't seen code like your example in any of the uses of this library. The use cases that I'm aware of are long lived connections. Since the connection can go up and down all the time, the GenServer that uses the library has to deal with it. Blocking until there's a network connection doesn't fix that. Also, starting disconnected is the same as starting connected. Arguably it's better to start disconnected so that that condition is handled.

Your use case looks synchronous. I agree that would be very convenient to have a synchronous API for short interactions. Honestly, there's so little code here, how about publishing a synchronous Phoenix channel library? You could delete quite a bit of code too.

fhunleth avatar Jul 18 '20 19:07 fhunleth

I haven't seen code like your example in any of the uses of this library.

My use case is a script that simulates users of my application. I create hundreds of sockets, and they interact with the system.

If you read the description of the issue, I am not asking to convert this into a blocking call. I am asking to have a way to connect using a blocking call. The function returns when it has been able to establish the connection. No networking library in any programming language can guarantee to you that the connection will be open 5 seconds later, or even between the connection inside the function, and the caller getting the return value for that matter. That is part of the rules of the game.

However, to reconnect, you need to have a way to verify the connection, don't you. Reconnection normally means I try, and I have a way to verify it succeeded before giving up! A promise, like what you get today, does not help there. You need to block yourself.

Look at redix. That is the usage I expect. You get a connection back that you can use to perform calls right away. And reconnects as needed, as Phoenix clients are expected to do.

fxn avatar Jul 18 '20 19:07 fxn