Can "worker" be scaled up and down in real time?

Open ryankwondev opened this issue 3 years ago • 1 comments

Hi,

Thanks for developing and maintaining this awesome project.

I'm currently trying to make the judge handle a massive number of submissions, so I'm trying multiple ways to scale it up.

I've checked #116, #118, #226 and #211, as well as the IEEE paper "Robust and Scalable Online Code Execution System".

This may already have been answered. I'm sorry to ask again, but I can't tell for sure whether what I'm trying to do is possible.

  • I understand that the count variable inside judge0.conf cannot be changed dynamically without a restart. Is that right?
  • Without using docker-compose, I'm trying to run the redis, (API) server, and workers containers on different cloud instances, and to scale the instances running the workers container according to request load. Given that #221 adjusts the number of "workers" containers with the --scale option of docker-compose, it seems possible to increase the number of instances running the "workers" container without interruption. But when scaling down, can the server node (container) detect it and allocate jobs properly? To investigate, I used RedisInsight to watch in real time which items are stored in redis as containers connect and disconnect. It looks like information about each worker node is stored in redis and does not disappear even after the node is turned off, which is why I'm asking.

Sorry for bothering you.

Have a good day!

ryankwondev avatar Nov 24 '22 12:11 ryankwondev

As the author has not responded yet, can you please do a small test? Run 2 worker instances and one API instance, then kill one worker machine while bombarding the API with submissions, using an appropriate timeout. That way we can conclude whether there is requeue logic inside the API and redis images, or whether it has to be added as another layer that requeues a submission when it has not received a result within x seconds (let's say 10 seconds). I also plan to try this, but my plate is full of tasks right now. Do let us know if you are able to test this out.
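If the test shows that lost submissions are not picked up again, the extra layer could look roughly like the sketch below. It is only an illustration of the idea, not Judge0's own behavior: `RequeueWatchdog`, `track`, `mark_done`, `sweep`, and the `requeue` callback are all hypothetical names, and how a submission is actually resubmitted to the API is left to the caller. The 10-second timeout is the value suggested above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

REQUEUE_AFTER_SECONDS = 10.0  # the "x seconds" suggested above


@dataclass
class Pending:
    token: str          # submission identifier (hypothetical field name)
    enqueued_at: float  # clock value when it was last (re)queued
    attempts: int = 1   # how many times it has been queued so far


class RequeueWatchdog:
    """Hypothetical external layer: requeue submissions with no result in time."""

    def __init__(
        self,
        requeue: Callable[[str], None],  # caller-supplied; resubmits one token
        timeout: float = REQUEUE_AFTER_SECONDS,
        max_attempts: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._requeue = requeue
        self._timeout = timeout
        self._max_attempts = max_attempts
        self._clock = clock
        self._pending: Dict[str, Pending] = {}
        self.failed: List[str] = []  # tokens given up on after max_attempts

    def track(self, token: str) -> None:
        """Start watching a submission that was just queued."""
        self._pending[token] = Pending(token, self._clock())

    def mark_done(self, token: str) -> None:
        """Stop watching a submission once its result has arrived."""
        self._pending.pop(token, None)

    def sweep(self) -> List[str]:
        """Requeue every timed-out submission; return the requeued tokens."""
        now = self._clock()
        requeued: List[str] = []
        for item in list(self._pending.values()):
            if now - item.enqueued_at < self._timeout:
                continue  # still within its time budget
            if item.attempts >= self._max_attempts:
                # Give up so a permanently failing job cannot loop forever.
                del self._pending[item.token]
                self.failed.append(item.token)
                continue
            self._requeue(item.token)
            item.attempts += 1
            item.enqueued_at = now
            requeued.append(item.token)
        return requeued
```

A caller would `track` each token right after submitting it, call `mark_done` when a result comes back, and run `sweep` periodically. The attempt cap is a design choice of this sketch so that a submission which crashes every worker is eventually reported as failed instead of being requeued indefinitely.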

Kick933 avatar Feb 26 '24 06:02 Kick933