Docker Swarm – Manager Nodes

Reference: https://docs.docker.com/engine/swarm/swarm-tutorial/

On the node that is to become the manager, initialize the Docker swarm cluster with “docker swarm init”. It also prints the command to run on the worker nodes to join the cluster. If we lose that command, run “docker swarm join-token worker” on the manager node to get it again, then run it on the other nodes to join the cluster as worker nodes.

Docker_Swarm_Init2.JPG
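
For reference, a minimal sketch of the commands involved (the IP address and token below are placeholders, not the actual values from this setup):

# On the manager-to-be: initialize the swarm
$ docker swarm init

# If the worker join command is lost, print it again on the manager
$ docker swarm join-token worker

# On each worker node: join using the printed token and the manager address
$ docker swarm join --token <WORKER-TOKEN> <MANAGER-IP>:2377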

Run “docker swarm join-token manager” on the manager node to get the command, then run it on the nodes that should join the cluster as manager nodes.

Docker_Swarm_Manager.JPG

Now let’s run it on the nodes to join the cluster as manager nodes..

Docker_Swarm_Manager2

Docker_Swarm_Manager3.JPG
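
A sketch of the equivalent commands (the token and address are placeholders):

# On an existing manager: print the manager join command
$ docker swarm join-token manager

# On the node that should become a manager
$ docker swarm join --token <MANAGER-TOKEN> <MANAGER-IP>:2377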

By default, manager nodes also run tasks like worker nodes, but we can disable this by “draining” the node.

Docker_Swarm_Drain3.JPG
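
A minimal sketch of draining a manager so it stops receiving tasks (the hostname is a placeholder):

# Stop scheduling new tasks on the node and move its existing tasks elsewhere
$ docker node update --availability drain <NODE-HOSTNAME>

# To let it accept tasks again later
$ docker node update --availability active <NODE-HOSTNAME>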

I’ve added one more node as a manager node, as recommended by Docker: https://docs.docker.com/engine/swarm/admin_guide/#add-manager-nodes-for-fault-tolerance

Docker_Swarm_Manager4.JPG

As we can see, ip-172-31-41-147 is a manager node and also the “Leader”, which distributes tasks to the worker nodes.
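
The leader can be checked from any manager with “docker node ls”; the MANAGER STATUS column is empty for workers and shows “Leader”, “Reachable” or “Unreachable” for managers:

# List all nodes; the MANAGER STATUS column identifies the current leader
$ docker node ls

# Only the manager nodes
$ docker node ls --filter role=manager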

Scenario 1: Stopping “Leader” nodes one by one.

Stopping the instance ip-172-31-41-147.. ip-172-31-41-147 is marked as “Unreachable” and ip-172-31-35-29 is marked as the new “Leader”.

Docker_Swarm_Manager5.JPG
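
A sketch of checking a specific manager’s state with “docker node inspect” (the hostname is a placeholder):

# Reachability of the manager (reachable / unreachable)
$ docker node inspect <NODE-HOSTNAME> --format '{{ .ManagerStatus.Reachability }}'

# Whether this manager is currently the leader (true / false)
$ docker node inspect <NODE-HOSTNAME> --format '{{ .ManagerStatus.Leader }}'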

Bringing the instance back online won’t change the “Leader”, but we can still manage the cluster using ip-172-31-35-29.

Now also stopping ip-172-31-35-29.. we are no longer able to run cluster commands from ip-172-31-18-175 because the quorum of managers is lost.

Docker_Swarm_Manager6.JPG

Recover from losing the quorum: https://docs.docker.com/engine/swarm/admin_guide/#recover-from-losing-the-quorum

Docker_Swarm_Manager8.JPG
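
As that guide describes, recovery is done on a surviving manager (ip-172-31-18-175 here) by re-initializing the swarm from its local state; a minimal sketch:

# Run on the remaining manager node
$ docker swarm init --force-new-cluster

# The node keeps the existing swarm data and services and becomes the manager of a new single-manager cluster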

After forcing the new cluster, below is the status of the nodes..

Docker_Swarm_Manager9.JPG

We can manage the existing services..

Docker_Swarm_Manager10.JPG
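
For example, the service-level commands work again from the recovered manager (“web” is the service used in this setup):

# List services and check where their tasks are running
$ docker service ls
$ docker service ps web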

Now I’ve started ip-172-31-35-29 again and, as expected, it is not able to run any cluster commands. So I made the node leave the old cluster and join back as a manager node.

Docker_Swarm_Manager11.JPG
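
A sketch of those steps on ip-172-31-35-29 (the token and leader address are placeholders):

# Force the node out of the old, dead cluster
$ docker swarm leave --force

# Get a fresh manager token on the current leader, then join with it
$ docker swarm join --token <MANAGER-TOKEN> <LEADER-IP>:2377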

But now there are two IDs for the same node ip-172-31-35-29 (the old one and the new one), so remove the old one.

Docker_Swarm_Manager12.JPG
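
The stale entry can be removed by ID from a manager; a sketch (the ID is a placeholder):

# Identify the duplicate entry (typically marked “Down”), then remove it
$ docker node ls
$ docker node rm <OLD-NODE-ID>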

Scenario 2: Stopping “Non-Leader” nodes.. we can still manage the cluster with a single manager node.

Docker_Swarm_Manager7.JPG

As mentioned in the Docker documentation: “For instance, whether you have 3 or 4 managers, you can still only lose 1 manager and maintain the quorum. If you have 5 or 6 managers, you can still only lose two.”
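
In other words, the quorum is a strict majority of the managers, so with N managers the cluster tolerates the loss of floor((N-1)/2) of them. A quick reference:

Managers   Quorum   Managers we can lose
1          1        0
3          2        1
4          3        1
5          3        2
6          4        2
7          4        3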

Scenario 3: Losing all manager nodes.. to restore, we need to have a backup of /var/lib/docker/swarm from a previous manager node. Reference: https://docs.docker.com/engine/swarm/admin_guide/#recover-from-disaster

I’ve provisioned a new node, ip-172-31-12-118, and got Docker running on it.

Docker_Swarm_Manager13

Copied the backup and restored it on the new node.

Docker_Swarm_Manager14.JPG

Start the docker swarm with “docker swarm init --force-new-cluster”..

Docker_Swarm_Manager15.JPG
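
A sketch of the backup/restore sequence, assuming a systemd-based host (the archive name is a placeholder, and the backup is taken with Docker stopped, as the admin guide recommends):

# Earlier, on the old manager: back up the swarm state
$ sudo systemctl stop docker
$ sudo tar czf swarm-backup.tar.gz -C /var/lib/docker swarm
$ sudo systemctl start docker

# On the new node (ip-172-31-12-118): restore the state and re-initialize
$ sudo systemctl stop docker
$ sudo tar xzf swarm-backup.tar.gz -C /var/lib/docker
$ sudo systemctl start docker
$ docker swarm init --force-new-cluster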

Now the cluster can recognize the nodes and the existing services, but we still cannot manage them using the new manager node.

Docker_Swarm_Manager16.JPG

On one of the worker nodes, two instances of the “web” service are running.

Docker_Swarm_Worker.JPG
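
This can be confirmed on the worker itself with plain “docker ps”, since a worker only sees its own containers:

# On the worker node: list the running task containers of the “web” service
$ docker ps --filter "name=web"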

We cannot join the worker node back to the new cluster, as it is still part of the old swarm cluster. Leaving the swarm also deleted the “web” service instances from the worker node.

Docker_Swarm_Worker2.JPG

As soon as I joined the node to the new cluster as a worker node, the “web” service started running on it again.

Docker_Swarm_Worker3.JPG
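
A sketch of the worker-side commands (the token and address are placeholders for the new cluster’s values):

# Leave the old swarm; the “web” task containers on this node are removed at this point
$ docker swarm leave

# Join the new cluster as a worker; the scheduler re-creates the “web” tasks on it
$ docker swarm join --token <WORKER-TOKEN> <NEW-MANAGER-IP>:2377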

 
