SmoothLinux Blog

Docker Swarm and Elasticsearch Clustering

28 August, 2016 | docker

My company is looking at using Docker to lighten our footprint in AWS because we are massively over-deployed. I'm hoping Docker is the answer for this.

So I installed Docker Engine 1.12 and started with the Elasticsearch 2.3.5 Docker image. It worked great until I wanted to cluster Elasticsearch containers. The containers would not join the cluster due to swarm networking and Elasticsearch's best-effort interface detection.

Elasticsearch picks up the wrong swarm IP address as its publish address, which means it will never be able to communicate with the rest of the Elasticsearch cluster.

This example is for the default ingress network that is created when you init swarm in Docker 1.12. If you put your Elasticsearch cluster on another overlay network, you will see different interfaces than the ones I list in my examples. However, the workaround will still be the same, with some interface adjustments.

Default ingress swarm network looks as follows:

Command Used: docker network ls | grep ingress

Output:

NETWORK ID   NAME    DRIVER  SCOPE
eyiowqxfzxss ingress overlay swarm

The subnet that the swarm uses to communicate with the Elasticsearch nodes is 10.255.0.0/16. Please note: if you do not publish a host port when running docker service create, you will not see this network. So make sure you use -p 9200:9200 -p 9300:9300 when you start your swarm service.
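As a rough illustration of the note above, here is a minimal sketch of a service create command that publishes both ports. The service name, replica count, and image tag are assumptions for the example and are not taken from my setup. The function only prints the command, so nothing is started by accident.

```shell
#!/bin/sh
# Sketch only: these three values are illustrative assumptions.
SERVICE_NAME="elasticsearch"     # hypothetical service name
REPLICAS=3                       # hypothetical replica count
IMAGE="elasticsearch:2.3.5"      # assumed image tag, swap in your own image

# Print the docker service create command with both Elasticsearch ports
# published, which is what attaches the containers to the ingress network.
build_service_cmd() {
  echo docker service create \
    --name "$SERVICE_NAME" \
    --replicas "$REPLICAS" \
    -p 9200:9200 -p 9300:9300 \
    "$IMAGE"
}

build_service_cmd
```

To actually create the service, drop the echo and run it on a swarm manager node.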

To display the subnet of the default ingress network:

Command Used: docker network inspect ingress | grep -A2 Subnet

Output:

"Subnet": "10.255.0.0/16",
"Gateway": "10.255.0.1"

Whenever you spin up a container on the swarm network, it's issued a 10.255.0.2/32 address, and Elasticsearch on startup set this as its publish address without fail in my testing. This stopped the Elasticsearch server from joining the cluster, as it would try to use 10.255.0.2 for cluster communication.

Here is the ip add output from a container spun up on Docker swarm's ingress network. Notice the 10.255.0.2/32 virtual interface.

Command Used: ip add

Stripped Down Output:

313: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether 02:42:0a:ff:00:06 brd ff:ff:ff:ff:ff:ff
inet 10.255.0.6/16 scope global eth0
valid_lft forever preferred_lft forever
inet 10.255.0.2/32 scope global eth0

By default, Elasticsearch sets its publish address to 10.255.0.2 instead of 10.255.0.6, which is the address swarm is using to communicate with the container.
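To make the difference between the two addresses easier to spot, here is a minimal sketch that filters one-line `ip -o -4 addr show` style output and prints only the addresses on an interface whose prefix is not /32. The helper name `non_host_addrs` is hypothetical, and the sample input is a trimmed version of the output above rather than a real capture.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` style lines on stdin and
# print every IPv4 address on the given interface that is not a /32.
non_host_addrs() {
  awk -v iface="$1" '
    $2 == iface && $3 == "inet" {
      split($4, parts, "/")          # parts[1] = address, parts[2] = prefix
      if (parts[2] != "32") print parts[1]
    }'
}

# Sample input (trimmed, illustrative) using the addresses from this post.
non_host_addrs eth0 <<'EOF'
313: eth0    inet 10.255.0.6/16 scope global eth0
313: eth0    inet 10.255.0.2/32 scope global eth0
EOF
```

Inside a container this would be used as `ip -o -4 addr show dev eth0 | non_host_addrs eth0`, assuming the image ships the ip tool.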

Example log output from an elasticsearch container at start up:

[INFO ][transport ] [Lizard] publish_address {10.255.0.2:9300}

The workaround I've found is to add the lines below to the docker-entrypoint.sh script. This script lives at the root of the file system on the Elasticsearch Docker images (/docker-entrypoint.sh) and is called before Elasticsearch is started. The lines below write the correct publish IP address into the elasticsearch.yml config before Elasticsearch starts at boot.

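# Take the source address of the eth0 route (9th field of the matching ip route line)
NETWORK_PUBLISH_HOST=$(/sbin/ip route | awk '/eth0/ { print $9 }')
# Append it to the Elasticsearch config as the publish address
echo "network.publish_host: $NETWORK_PUBLISH_HOST" >> /usr/share/elasticsearch/config/elasticsearch.yml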

This appends a new line to elasticsearch.yml with the correct publish address:
network.publish_host: 10.255.0.6
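The workaround depends on the address being the 9th field of the route line and appends a new line on every container start. A slightly more defensive variant might look like the sketch below. This is not what I ran; the helper names `route_src_for` and `set_publish_host` are hypothetical, and the eth1 route in the sample data is made up for the demo.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the Elasticsearch image.

# Read `ip route` output on stdin and print the "src" address of the first
# route that uses the given interface.
route_src_for() {
  awk -v iface="$1" '
    {
      dev = ""; src = ""
      for (i = 1; i < NF; i++) {
        if ($i == "dev") dev = $(i + 1)
        if ($i == "src") src = $(i + 1)
      }
      if (dev == iface && src != "") { print src; exit }
    }'
}

# Append network.publish_host to a config file, unless the address is empty
# or the setting is already present in the file.
set_publish_host() {
  addr="$1"
  config="$2"
  [ -n "$addr" ] || return 1
  if grep -q '^network\.publish_host:' "$config" 2>/dev/null; then
    return 0
  fi
  echo "network.publish_host: $addr" >> "$config"
}

# Demo against sample routes and a temporary file instead of the real config.
sample_routes='default via 172.18.0.1 dev eth1
10.255.0.0/16 dev eth0 proto kernel scope link src 10.255.0.6'
demo_config=$(mktemp)
host=$(printf '%s\n' "$sample_routes" | route_src_for eth0)
set_publish_host "$host" "$demo_config"
set_publish_host "$host" "$demo_config"   # second call leaves the file unchanged
cat "$demo_config"
rm -f "$demo_config"
```

In the entrypoint script the same helpers would be fed from `/sbin/ip route` and pointed at /usr/share/elasticsearch/config/elasticsearch.yml.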

The next thing I had to deal with was getting unicast discovery working with the Elasticsearch containers. By trial and error, I figured out that the ingress network started assigning container IP addresses at 10.255.0.6. So I set the first three IP addresses as the Elasticsearch master nodes in the elasticsearch.yml config.

Full output of my elasticsearch.yml:

network.host: _eth0:ipv4_ 
cluster.name: elasticsearchswarmtest01
discovery.zen.minimum_master_nodes: 3 
discovery.zen.ping.unicast.hosts: ["10.255.0.6", "10.255.0.7", "10.255.0.8"]
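Since the host list is just consecutive addresses starting at the first one the ingress network handed out, it can be generated rather than typed. The following is a minimal sketch under that assumption; the helper name `unicast_hosts_line` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a discovery.zen.ping.unicast.hosts line for
# COUNT consecutive addresses starting at PREFIX.START.
unicast_hosts_line() {
  prefix="$1"
  start="$2"
  count="$3"
  list=""
  i=0
  while [ "$i" -lt "$count" ]; do
    # Add a ", " separator before every entry except the first.
    list="${list}${list:+, }\"${prefix}.$((start + i))\""
    i=$((i + 1))
  done
  echo "discovery.zen.ping.unicast.hosts: [${list}]"
}

# First three addresses on the default ingress network, as in the config above.
unicast_hosts_line 10.255.0 6 3
```

If your overlay network uses a different subnet or starting address, only the arguments change.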

I'm sure others have resolved this problem, but I did not see any solutions when searching around. Let me know how you have resolved this in your environment. I'm looking for a better solution where I do not have to do a docker build or edit docker-entrypoint.sh, which I feel isn't a great solution since the Docker image will change over time.
