There is already good documentation from Docker on swarm – https://docs.docker.com/engine/swarm/
In simple terms, Docker swarm is an orchestration tool to manage Docker containers running on nodes (physical or virtual servers). Nodes come in two types: manager nodes and worker nodes. Manager nodes handle cluster management and hand tasks over to worker nodes. Worker nodes run the tasks and report back to the manager. By default the manager node also runs services like a worker node, but this is configurable if you don’t want it to.
Let’s put the theory into practice now. I have three AWS EC2 instances:
172-31-35-29 - manager node
172-31-4-29 - worker node 1
172-31-17-127 - worker node 2
Docker (v1.12 or above) should be installed and running on all nodes for the commands/cluster to work, so install Docker and start the service on each node. Then run “docker swarm init” on the node that should become the manager. The command also prints the exact command to run on worker nodes to join the cluster.
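Initializing the swarm might look like this; the IP matches the manager above, and the join token shown in the output is a placeholder, not a real token:

```shell
# On the manager node: initialize the swarm, advertising the manager's IP
docker swarm init --advertise-addr 172.31.35.29

# Typical output (node ID and token are placeholders):
# Swarm initialized: current node (abc123...) is now a manager.
# To add a worker to this swarm, run the following command:
#     docker swarm join --token SWMTKN-1-<worker-token> 172.31.35.29:2377
```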
Before we run “docker swarm join” on the worker nodes, they should be able to reach the manager on port 2377. So if there are any firewall restrictions, open port 2377 on the manager node and make sure it is reachable from the worker nodes. The ports we need to open for Docker swarm to function: TCP 2377 for cluster management, TCP and UDP 7946 for communication among nodes, and UDP 4789 for overlay network traffic.
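If you manage the firewall on the hosts themselves (on AWS the same rules go in the security group), opening the swarm ports with firewalld might look like this sketch:

```shell
# Cluster management traffic (needed on the manager node)
firewall-cmd --permanent --add-port=2377/tcp
# Node-to-node communication (all nodes)
firewall-cmd --permanent --add-port=7946/tcp
firewall-cmd --permanent --add-port=7946/udp
# Overlay network (VXLAN) traffic (all nodes)
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --reload
```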
Run the “docker swarm join” command on the worker nodes.
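The join command is the one printed by “docker swarm init”; the token here is a placeholder:

```shell
# On each worker node: join the swarm (token is a placeholder)
docker swarm join --token SWMTKN-1-<worker-token> 172.31.35.29:2377
# On success Docker reports:
# This node joined a swarm as a worker.
```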
Check the status of the cluster from the manager node with “docker node ls”.
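On a three-node cluster like the one above, the output looks roughly like this (the node IDs are made up; the `*` marks the node you ran the command on):

```shell
docker node ls
# ID            HOSTNAME           STATUS   AVAILABILITY   MANAGER STATUS
# abc123... *   ip-172-31-35-29    Ready    Active         Leader
# def456...     ip-172-31-4-29     Ready    Active
# ghi789...     ip-172-31-17-127   Ready    Active
```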
Now, on the manager node, let’s start a service called web using “docker service create”. There are two modes in which we can create a service: “global” mode and “replicated” mode.
If you want to run one container on each node in the cluster, use “global” mode. If the image is not present on a node, it is pulled there and the container is started.
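Creating the service in global mode might look like this; the post doesn’t name the image, so nginx here is just an assumption for illustration:

```shell
# One task per node; the image is pulled on any node that doesn't have it
docker service create --name web --mode global nginx
```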
Check the service with “docker service ps web” – we can see a container running on all three nodes.
Once you start a service in “global” mode you cannot scale it up or down. Starting a service in “replicated” mode allows scaling, and it is the default mode.
But before that, let’s remove the running service – “docker service rm web” will do.
Now let’s start the web service again, but in “replicated” mode with the number of replicas set to 1.
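A sketch of the replicated-mode create, again assuming an nginx image:

```shell
# "replicated" is the default mode, so --mode replicated could be omitted
docker service create --name web --mode replicated --replicas 1 nginx
```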
As we can see with “docker service ps web”, the container started on worker node 1.
On worker node 1 – “docker ps -a”.
Now let’s scale it to 3 replicas. As we have already seen, the service is already running on worker node 1. Now it also started running on the manager node itself.
It also started on worker node 2.
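Scaling up is a single command:

```shell
docker service scale web=3
# Equivalent alternative:
# docker service update --replicas 3 web
```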
We can also scale down using the same command – “docker service scale web=2” – which deleted the container from worker node 1.
As mentioned above, by default the manager node also takes on tasks like a worker node. Running the command “docker node update --availability drain <managernode>” immediately stopped the container on the manager node.
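Draining the manager might look like this; the hostname is an assumption based on the EC2 default naming for the manager IP above:

```shell
# Drain the manager so it no longer accepts tasks; its running tasks
# are stopped and rescheduled onto other available nodes
docker node update --availability drain ip-172-31-35-29
# To let it accept tasks again later:
# docker node update --availability active ip-172-31-35-29
```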
But my service is running in “replicated” mode with 3 replicas, so when the instance exited on the manager node, Docker started a replacement on worker node 2.
Remove the service completely with “docker service rm web”, which deletes the containers from all nodes.