One Big App (High-Availability) #

You have an application (which is composed of multiple Docker services) and want to increase its availability by distributing it across multiple nodes so that one node can go down and your application stays online.

Setup #

Let’s say we have three nodes:

  • node001
  • node002
  • node003

Our app consists of two stateless services (http and api) and one database (etcd in this case, a distributed and fault-tolerant key-value store).

General Considerations #

The Docker Swarm setup will look very similar to the one tuned for high performance, with two important differences:

Traffic Ingress #

The ingress/load balancer needs to tolerate a node going down, which means it can't run on just a single node either.

If you order your cluster with our managed Traefik load balancer, this part is taken care of for you. For custom setups, you'll have to take care of it yourself.
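
If you build your own ingress, one common pattern is to deploy the proxy as a global service, so every node runs a copy and any surviving node can accept traffic. You'd still need something in front of the cluster, such as DNS round-robin or a floating IP, to point clients at a healthy node. A minimal sketch, using a placeholder image name, that would slot into a stack like the one below:

services:
  proxy:
    image: registry.example.com/my/proxy   # placeholder - use your reverse proxy of choice
    ports:
      - "80:80"            # published on every node via the Swarm routing mesh
      - "443:443"
    networks:
      - api_net
    deploy:
      mode: global         # one instance per node instead of a fixed replica count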

Stateful Services #

The same goes for databases, storage engines, and similar stateful services. You'll probably want to configure some sort of multi-master system.

Details differ widely, depending on what database you use. We’ll be using etcd here as an example.

Docker volumes are always local to a single node. Any service using volumes must therefore be pinned to a specific node, so it always stays where its data is.

The basic idea is to start databases as multiple services, one for each node, and set them up as a cluster.

Stack.yml #

Putting it all together:

version: "3.8"

x-etcd-cluster-config: &etcd-cluster-config
  ALLOW_NONE_AUTHENTICATION: "yes"
  ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
  ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379
  ETCD_INITIAL_CLUSTER_TOKEN: etcd-cluster
  ETCD_INITIAL_CLUSTER_STATE: new
  ETCD_INITIAL_CLUSTER: etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380

services:

  http:
    image: registry.example.com/my/http
    networks:
      - api_net
    deploy:
      replicas: 3
      placement:
        max_replicas_per_node: 1

  api:
    image: registry.example.com/my/api
    networks:
      - api_net
      - db
    deploy:
      replicas: 3
      placement:
        max_replicas_per_node: 1

  etcd1:
    image: docker.io/bitnami/etcd:3
    environment:
      <<: *etcd-cluster-config
      ETCD_NAME: etcd1
      ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd1:2380
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd1:2379
    networks:
      - db
    volumes:
      - etcd1_data:/bitnami/etcd/data
    deploy:
      placement:
        constraints:
          - node.hostname == node001
  etcd2:
    image: docker.io/bitnami/etcd:3
    environment:
      <<: *etcd-cluster-config
      ETCD_NAME: etcd2
      ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd2:2380
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd2:2379
    networks:
      - db
    volumes:
      - etcd2_data:/bitnami/etcd/data
    deploy:
      placement:
        constraints:
          - node.hostname == node002
  etcd3:
    image: docker.io/bitnami/etcd:3
    environment:
      <<: *etcd-cluster-config
      ETCD_NAME: etcd3
      ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd3:2380
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd3:2379
    networks:
      - db
    volumes:
      - etcd3_data:/bitnami/etcd/data
    deploy:
      placement:
        constraints:
          - node.hostname == node003

volumes:
  etcd1_data:
  etcd2_data:
  etcd3_data:
networks:
  api_net:
  db:

In detail:

  • x-etcd-cluster-config: ... - top-level x-* keys are ignored by Docker Swarm. This sets up a YAML anchor so we can avoid duplicating all these values for each etcd service.

  • http and api

    • replicas: 3 - we start three containers in total…
    • max_replicas_per_node: 1 - …one on each node
  • etcd1, etcd2, etcd3

    • one etcd service per node, set up to form a cluster
    • placement constraint node.hostname == node00X: each one of them is pinned to one of the nodes
    • volumes: etcd1_data, etcd2_data, etcd3_data: we’ve given each etcd its own volume name - that makes it clear that these are actually not the same volumes.

Even if we named all of those etcd_data, they would still be three different volumes:

  • etcd_data on node001
  • etcd_data on node002
  • etcd_data on node003

The different names we used just make that explicit.
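
For illustration, a hypothetical variant of the stack above that reuses a single volume name behaves exactly the same way; each pinned service just gets the volume created locally on its own node:

# Hypothetical variant: one shared volume name, still three node-local volumes.
services:
  etcd1:
    volumes:
      - etcd_data:/bitnami/etcd/data   # lives on node001
  etcd2:
    volumes:
      - etcd_data:/bitnami/etcd/data   # lives on node002
  etcd3:
    volumes:
      - etcd_data:/bitnami/etcd/data   # lives on node003
volumes:
  etcd_data: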

Behavior #

Now, if one node goes down, the application will continue to work:

  • we still have two http containers and two api containers on the remaining nodes
  • the etcd cluster still has two of its three members, which is enough for quorum, so etcd keeps functioning

In practice, we recommend actually testing the failure scenarios. Complex systems like clustered databases sometimes behave in unexpected ways.

Some things to look out for:

Some databases, even in a clustered configuration, enter a “degraded” state if a node fails. In some cases that means the database becomes read-only, so you might need to write your app such that it can deal with a read-only DB.

Other databases don’t recover automatically. That is, the failed node will come back up, but it won’t rejoin the database cluster on its own. That may not lead to immediate failures - but if another node goes offline later, the remaining two-node cluster probably can’t handle that second failure gracefully.
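
One way to at least make such problems visible is to give each etcd service a healthcheck, so Swarm marks and replaces containers whose local etcd endpoint stops responding. A minimal sketch, assuming the Bitnami image's bundled etcdctl and the default client port; note that a restart alone doesn't guarantee the member rejoins the cluster:

services:
  etcd1:
    # image, environment, volumes and placement as in the stack above
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]   # checks the local member only
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s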