Scaling Docs
I already found a lot of hints about how to scale a Feathers application (e.g. in #157) but haven't had any luck yet getting it working with Socket.io (plain REST works smoothly). Am I right that using feathers-sync does not remove the need for sticky sessions (on the load balancer and on the node processes themselves) to make Socket.io's handshake work? My current use case is simply launching the application in cluster mode with pm2, e.g. pm2 start src/index.js -i 4 -f.
What additional steps are required to get Feathers with Socket.io working "seamlessly" across several processes or physical machines? Is there a reliable solution to make session handling more stateless by using a Redis adapter for both Express' and Socket.io's sessions?
feathers-sync does not require sticky sessions. It does assume, however, that once a client has established a websocket connection it will stay connected to the same server for the entire connection. I am not aware of a load balancer that behaves otherwise, and it has been working quite well with hosting providers like Heroku and Modulus. Did you run into any problems setting feathers-sync up as documented and starting the cluster?
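For reference, the documented setup boils down to something like this (a minimal sketch assuming a recent feathers-sync release with a Redis broker; older versions used slightly different options, and the URI is a placeholder):

const feathers = require('@feathersjs/feathers');
const sync = require('feathers-sync');

const app = feathers();

// Every instance configured against the same broker republishes its
// service events to all other instances.
app.configure(sync({
  uri: 'redis://localhost:6379'
}));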
AFAICT sticky sessions (at the application level) are required to support long-polling connections and the general Socket.io handshake when running the application with Node.js' cluster features. If I got this right, feathers-sync lets you emit and receive events across several application instances for clients that are already connected. The Socket.io handshake itself, however, does not appear to be stateless: the handshake tokens/cookies are stored in the local memory of a single process.
Thus running the application process-clustered (e.g. via pm2) without a load balancer with sticky sessions leads to failed handshakes and a lack of long-polling support.
That can be solved with what is described in how to use Socket.io in a cluster though, right? Feathers doesn't do anything that should interfere with that module.
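For completeness, wiring that adapter into a Feathers app looks roughly like this (a sketch assuming the classic socket.io-redis adapter and a local Redis instance; host and port are placeholders):

const feathers = require('@feathersjs/feathers');
const socketio = require('@feathersjs/socketio');
const redisAdapter = require('socket.io-redis');

const app = feathers();

// The configuration callback receives the Socket.io server instance,
// so the Redis adapter can be attached just like in the Socket.io
// cluster documentation.
app.configure(socketio(io => {
  io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));
}));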
Yes, @daffl. However, using sticky-session is not recommended as per https://github.com/primus/primus#can-i-use-cluster
I've ended up setting up a simple haproxy with sticky-session support in front of my processes, and it is now working fine. Here is my simple config:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

    # http://blog.silverbucket.net/post/31927044856/3-ways-to-configure-haproxy-for-websockets
    option redispatch

    # very important if you have this server behind cloudflare like me! (check above link)
    option http-server-close

    # https://support.cloudflare.com/hc/en-us/articles/212794707-General-Best-Practices-for-Load-Balancing-with-CloudFlare
    option http-keep-alive

    # https://support.cloudflare.com/hc/en-us/articles/212794707-General-Best-Practices-for-Load-Balancing-with-CloudFlare
    timeout http-keep-alive 300000

    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend private_api
    bind *:3333
    default_backend api

backend api
    timeout server 600s
    balance roundrobin

    # http://blog.haproxy.com/2012/03/29/load-balancing-affinity-persistence-sticky-sessions-what-you-need-to-know/
    # this is socket.io's specific cookie name
    # double check it if you're using any other transport/framework (primus, uws, ws, etc...)
    cookie io prefix nocache

    server server-0 127.0.0.1:3344 check cookie server-0
    server server-1 127.0.0.1:3355 check cookie server-1
I've just tweaked the sample haproxy.cfg file to include what I needed. Mind the commented lines.
Also, if you have been using pm2 to manage your app in cluster mode (to make the most of your machine's CPU cores), make sure you're now launching it as N different processes (N = number of CPU cores), each with its own port, and all in 'fork' mode instead of 'cluster'. Cluster mode is what got me into problems in the first place. For the time being, pm2 does not support sticky sessions at the application level: https://github.com/Unitech/pm2/issues/389
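For illustration, a minimal ecosystem file for that fork-mode setup could look like the following (a sketch; the app names are arbitrary, the ports match the haproxy backends above, and it assumes the app reads its port from process.env.PORT):

// ecosystem.config.js -- one fork-mode process per core, each on its own port
// so haproxy can pin sticky sessions to a specific backend
module.exports = {
  apps: [
    { name: 'api-0', script: 'src/index.js', exec_mode: 'fork', env: { PORT: 3344 } },
    { name: 'api-1', script: 'src/index.js', exec_mode: 'fork', env: { PORT: 3355 } }
  ]
};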
Hoping someone finds this helpful...
Using fork mode is not always desirable. If used properly, cluster mode will create the same number of processes as CPU cores, thus maximizing resource utilization. This does not happen in fork mode!
So cluster mode is the option to go for in terms of performance and the recommended configuration for production.
Please have a look at: https://github.com/Unitech/pm2/issues/1510
Some people mention changing the port of each process created by pm2 cluster mode, which is wrong; we should not change the ports. pm2 is able to share the same port across multiple processes (thanks to Node.js' cluster module).
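For comparison, cluster mode starts every worker from the same entry point and shares a single port between them (a sketch; the process name is arbitrary):

# one worker per CPU core, all accepting connections on the same port
pm2 start src/index.js -i max --name api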
I'm still having issues with pm2 cluster mode and feathers-sync.
@averri I have solved the PM2 + Feathers combination issue by disabling long-polling as a fallback when using socket.io.
Documented here: https://docs.feathersjs.com/cookbook/general/scaling.html#cluster-configuration
The issue is that even if you can somehow configure IP-based sticky sessions in the pm2 cluster, it would only work if you are directly receiving HTTP requests on your server.
As soon as you put your server behind a load balancer (e.g. in an autoscaling setup), every request reaching your servers appears to come from the load balancer's IP.
So the only solution for now is to disable long-polling as the feathers docs suggest.
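For reference, the relevant part of that cookbook page is restricting Socket.io to the websocket transport (a sketch; the rest of the app setup is omitted):

const feathers = require('@feathersjs/feathers');
const socketio = require('@feathersjs/socketio');

const app = feathers();

// No HTTP long-polling fallback, so no handshake has to hit the same
// process twice and sticky sessions are not needed.
app.configure(socketio({
  transports: ['websocket']
}));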
Does a sticky session solution (say, running in pm2 cluster mode) account for a single user being authenticated from multiple workstations? For instance, a user logs in and is added to a channel users/{userId}, but is there any load balancing solution that could make sure all connections for that user id are assigned to the same node? (My assumption is that there is not.) Assuming not, your events would only work reliably on the workstation you initially logged in with, and may or may not work on other workstations depending on whether they happen to be connected to the same node.
Assuming the above is all working as I presume, does the Redis implementation of feathers-sync successfully address this concern?
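For reference, the per-user channel I'm describing is essentially the standard Feathers channels setup (an illustrative sketch assuming the authenticated user id is available as connection.user._id); my understanding is that feathers-sync then republishes every service event to all instances, so each instance can publish to its own connections in that channel:

// channels.js -- join every authenticated connection to a per-user channel
module.exports = function (app) {
  app.on('login', (authResult, { connection }) => {
    if (connection) {
      app.channel(`users/${connection.user._id}`).join(connection);
    }
  });
};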
TL;DR we run a very reactive app where there is an expectation that all changes are persisted to (n) environments in near realtime. We also run a high number of concurrent connections so the ability to scale the feathers server is highly desirable. Looking to test redis implementation soon.
UPDATE: Just implemented feathers-sync on a single dual-core Debian instance. My pm2 cluster runs 2 instances (1 per core) and I've verified that multiple tabs of my site connect to each of the 2 running instances. Events all work as I would expect from a single-instance setup of Feathers/Socket.io. So far no gotchas, and it was very easy to set up with a local Redis instance.