Introduction
GlusterFS is a replicated file system that forms one component of the backbone that makes your cluster work. It isn’t necessary to understand how GlusterFS operates in detail, but there are some useful things to know as you set it up and operate it:
- GlusterFS creates a replicated volume that is mounted on all cluster nodes. Any change written on one node is [almost instantly] replicated and available on the other nodes. This includes all changes, writes and deletes alike!
- GlusterFS operates best with an odd number of cluster members, and while three is the bare minimum, five is a better number. Why? Consider a three-node cluster. A majority of the cluster (in this case, two nodes) must be available for read and write transactions to be replicated. If one node goes down, the remaining two nodes should continue to allow the replicated volume to function… unless they disagree. This situation is called a ‘split-brain’: one node may declare a file has content X while the other node declares that file has content Y. Who wins? No one wins, because there is no third node to cast the deciding vote. This means that any time a node is taken down in a three-node cluster, there is a risk of split-brain. This is bad. In a five-node cluster, you can take one or even two nodes down (maintenance, patching, updating) and the cluster will continue to replicate volume changes, because the remaining three nodes still form a majority and keep quorum. Five is better than three. (See the quorum check sketched just after this list.)
- GlusterFS is fast, but it does not handle many, many transactional changes to the same files in rapid succession. This means you should not use GlusterFS to host a database backend (MongoDB, MariaDB). Ask me how I know this. Just don’t do it. Instead, set up a MongoDB or MariaDB cluster that stores data on the local disk and manages its own replication.
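Once the volume exists (it is created later in this guide), you can inspect its quorum-related settings. This is a minimal sketch, assuming the volume name ‘vagabond’ used below; the exact option names may differ between GlusterFS versions:
gluster volume get vagabond cluster.quorum-type
gluster volume get vagabond cluster.server-quorum-type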
Environment
For the sake of this guide, assume there are three cluster nodes with addressing such as:
node1 - 10.0.3.101
node2 - 10.0.3.102
node3 - 10.0.3.103
Performing actions as root
The ideal security model dictates that you should not operate interactively as root; operations should run as a regular user, using ‘sudo’ to elevate permissions where necessary. Unfortunately, almost everything that needs to be done here requires ‘sudo’, so it will be faster to just become root and run everything as root:
sudo su -
Perform the following steps on each node until instructed otherwise
Install GlusterFS
apt install glusterfs-server -y
systemctl enable glusterd
systemctl start glusterd
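The volume create command later in this guide references a brick path of /mnt/bricks/vagabond on each node. Depending on your disk layout, that directory may not exist yet; if it does not, create it on every node before creating the volume (this sketch assumes the bricks live on a filesystem already mounted under /mnt/bricks; if the brick path ends up on the root filesystem, gluster will warn and require appending ‘force’ to the create command):
mkdir -p /mnt/bricks/vagabond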
Perform the following steps on node1 only until instructed otherwise
Set up a gluster volume called ‘vagabond’ (or any other name) by running the following:
gluster peer probe 10.0.3.102
gluster peer probe 10.0.3.103
gluster peer status
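Before creating the volume, the peer status output should show the other two nodes as connected. For a more compact view that also includes the local node, the pool list sub-command (assuming your GlusterFS version provides it) can be used:
gluster pool list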
gluster volume create vagabond replica 3 \
  10.0.3.101:/mnt/bricks/vagabond \
  10.0.3.102:/mnt/bricks/vagabond \
  10.0.3.103:/mnt/bricks/vagabond
gluster volume set vagabond features.trash on
gluster volume start vagabond
gluster volume bitrot vagabond enable
gluster volume status
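As a sanity check, you can confirm the options you just enabled. This is a sketch only: with the trash feature on, deleted files should be kept under a ‘.trashcan’ directory at the root of the volume, and the bitrot daemon’s scrub state can be queried directly:
gluster volume get vagabond features.trash
gluster volume bitrot vagabond scrub status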
Perform the following steps on each node until instructed otherwise
mkdir /mnt/vagabond
bash -c "echo 'localhost:/vagabond /mnt/vagabond glusterfs \ defaults,_netdev,noauto,x-systemd.automount 0 0' >> /etc/fstab"
mount /mnt/vagabond
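You can confirm the volume actually mounted (and that you are not just writing to the local directory) with something like:
df -h /mnt/vagabond
mount | grep vagabond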
And that’s it, you now have a gluster volume called ‘vagabond’. You can verify it is working as follows:
Perform the following steps on any one node until instructed otherwise
touch /mnt/vagabond/test.txt
Perform the following steps on all other nodes until instructed otherwise
ls -la /mnt/vagabond
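If you ever suspect replication problems (for example after a node outage), the heal sub-command reports files pending heal or stuck in split-brain. A minimal sketch, run from any node:
gluster volume heal vagabond info
gluster volume heal vagabond info split-brain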
What if I want to build more nodes right away?
If you have all five nodes ready to go, simply modify your gluster volume create command to include all five nodes. Easy enough.
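For example, a sketch of the probe and create steps with five nodes. It assumes addresses of 10.0.3.104 and 10.0.3.105 for the fourth and fifth nodes (only 10.0.3.104 appears later in this guide), each with the same brick path, and that the two extra peers are probed in addition to the probes shown earlier:
gluster peer probe 10.0.3.104
gluster peer probe 10.0.3.105
gluster volume create vagabond replica 5 \
  10.0.3.101:/mnt/bricks/vagabond \
  10.0.3.102:/mnt/bricks/vagabond \
  10.0.3.103:/mnt/bricks/vagabond \
  10.0.3.104:/mnt/bricks/vagabond \
  10.0.3.105:/mnt/bricks/vagabond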
What if I add more nodes later?
In this example, you may want to add another node later:
node4 - 10.0.3.104
You can add a node to an existing volume by running the following commands from an existing cluster node (not the new one). The new node must already have glusterfs-server installed and running, as in the install step above, with its brick directory in place:
sudo gluster peer probe 10.0.3.104
sudo gluster volume add-brick vagabond replica 4 \
  10.0.3.104:/mnt/bricks/vagabond
Note that you use add-brick instead of create, and that you update the replica count to the new total number of nodes. Now you can run the following on the new node to set up the mount:
mkdir /mnt/vagabond
bash -c "echo 'localhost:/vagabond /mnt/vagabond glusterfs \ defaults,_netdev,noauto,x-systemd.automount 0 0' >> /etc/fstab"
mount /mnt/vagabond
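Depending on how much data already lives on the volume, the new brick may take a while to receive copies of existing files. You can kick this off and watch its progress from any existing node with the heal sub-command; a minimal sketch:
gluster volume heal vagabond full
gluster volume heal vagabond info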
Conclusion
That’s all! Move on to the next part.