Prepare Yourself – Dann's Digital Laboratory

A home lab can be a fun playground to explore different software packages, connect a few smart lights and generally tinker around. But once your ‘lab’ grade Home Assistant platform becomes something the house must have to function, when Node-RED automation is depended on for scenes to occur, when Zigbee2MQTT is a necessity to integrate your smart locks… your home ‘lab’ just became a home ‘production grade’ system.

The new names of the game are stability, resiliency, availability. You start asking questions. So you installed some hypervisor and have multiple VMs, but you built everything on a single server. What if you have a hardware failure? Sure you have backups, but the time required to restore service means your smart switches don’t work, your lights are stuck on and your bathroom vent won’t turn on because the Bluetooth humidity sensors aren’t triggering automations.

This series describes one way to solve this problem. Is it the only way? Hell no. Is it the best way? Probably not. Is it a better way than a single point of hardware failure? Definitely. You don’t think so? Read the series. Still don’t agree? Agree to disagree. Want to argue about it? Talk to someone else who cares.

If you’ve read this far, then you’re interested in knowing more. Let’s lay out what you’ll be dealing with and start describing suggested hardware components to make it work.

Networking:

This is all primarily IPv4 based. You are going to need a static IP for each node, five in total. You should (obviously) make sure that they are not within your DHCP pool range. Also, you will want about a dozen or so static IP addresses reserved for virtual IPs (VIPs). You’ll learn more about that later. Just to be on the safe side, round up to 20 addresses.
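If you want to sanity check your address plan before touching the router, here is a minimal sketch using Python’s standard ipaddress module. The subnet, the DHCP pool boundaries and the starting address are all made-up example values (assumptions, not part of this series), so swap in your own.

```python
import ipaddress

# Hypothetical example values; replace with your own network details.
LAN = ipaddress.ip_network("192.168.1.0/24")
DHCP_START = ipaddress.ip_address("192.168.1.100")
DHCP_END = ipaddress.ip_address("192.168.1.199")


def plan(first_static, node_count=5, vip_count=20):
    """Reserve consecutive addresses: node IPs first, then VIPs.

    Raises ValueError if any address falls outside the LAN, is the
    network/broadcast address, or lands inside the DHCP pool.
    """
    start = ipaddress.ip_address(first_static)
    addrs = [start + i for i in range(node_count + vip_count)]
    for addr in addrs:
        if addr not in LAN:
            raise ValueError(f"{addr} is outside {LAN}")
        if addr in (LAN.network_address, LAN.broadcast_address):
            raise ValueError(f"{addr} is not a usable host address")
        if DHCP_START <= addr <= DHCP_END:
            raise ValueError(f"{addr} collides with the DHCP pool")
    return addrs[:node_count], addrs[node_count:]


nodes, vips = plan("192.168.1.10")
print("nodes:", ", ".join(str(a) for a in nodes))
print("VIPs: ", f"{vips[0]} to {vips[-1]}")
```

With these example values the five nodes get .10 through .14 and the VIPs get .15 through .34, all comfortably below the DHCP pool. Start the plan at .90 instead and it will complain, because the VIP range would run into the pool.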

DNS domain:

You will want a DNS domain to use at home. If you don’t want to host any services that will be available externally, then you can just make one up such as ‘dann.local’. If you intend to host services accessible from outside your network (through firewall ports), then you will want a real DNS domain ready for that. I’ll be using ‘digitaldann.net’ as an example.

Software:

Ubuntu 22.04 LTS – base system (as of the last update)
HashiCorp Nomad – The container scheduler. It’s what manages the containers, decides which cluster nodes run which containers, and ensures that they keep running.
HashiCorp Consul – A service discovery directory. It integrates natively with Nomad (big surprise).
Docker – The actual containerization platform.
GlusterFS – A distributed filesystem that can span multiple cluster nodes. Fast and resilient.
Keepalived – A load balancing daemon that assigns Virtual IP (VIP) addresses based on scripted health checks.
Iptables – Manages rules for IP packet filtering and forwarding.
Ansible – A configuration management tool for deploying software and configuring systems.

These software components will all tie together to create the production grade cluster for your home. They’re also all free.

Hardware

HP EliteDesk 800 G4 Desktop Mini (five of them, yes 5x)
- i5-8500T processor
- 16GB memory
- 256GB NVMe for base system (500GB+ is my preference)
- 500GB NVMe for GlusterFS (1TB is my preference)
- A dummy DisplayPort plug to simulate a monitor connection. Required for vPro functions.

You might look at this hardware requirement and raise an eyebrow. Let’s boil it down:

The G4 (fourth generation) has several advantages over the G1/G2/G3 models. It’s the first generation that has two (2x) NVMe slots. Also, the 8th generation i5 is the first with six (6x) cores instead of four (4x).
There isn’t a significant enough performance gain by jumping to the G5/G6/G7 models or higher to warrant the cost.
You can get used G4 models from eBay with the i5-8500T processor, 16GB of memory (a must) and a 256GB NVMe for $150 USD or less. You don’t care if it has Windows preinstalled or a wireless card. You should make sure it has a power supply.
Make sure you get the i5-8500T processor, not the i5-8500 (non-T). The T series processor has a lower max TDP. You don’t need to build a powerhouse, you’ll do just fine with a cluster of less capable systems that distribute the load.
The 256GB (or 500GB) NVMe is for the base system (operating system). The 500GB NVMe is for the data disk (GlusterFS). You will probably need to buy the 500GB NVMe for each system. It’s a best practice to separate your system disk from your data disk. Last I checked, 500GB NVMe disks are about $50 USD.
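To put a rough number on the shopping list, here is a back-of-the-envelope sketch using only the prices quoted above. It assumes you pay the full $150 for each mini and add one $50 data disk per node; dummy DisplayPort plugs, shipping and taxes are not included.

```python
# Prices taken from the text above; everything else is left out on purpose.
NODES = 5
MINI_PC_USD = 150    # used G4 mini with i5-8500T, 16GB, 256GB NVMe ("or less")
DATA_NVME_USD = 50   # extra 500GB NVMe for the GlusterFS data disk

per_node = MINI_PC_USD + DATA_NVME_USD
total = NODES * per_node

print(f"per node: ${per_node}")        # per node: $200
print(f"five node cluster: ${total}")  # five node cluster: $1000
```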

This is the recommendation. Here are some frequently asked questions about it:

Why use HP minis? I prefer Lenovo ThinkCentre Tinys.
Knock yourself out. I don’t really care what you use, this is just what I can recommend from experience.
What about HP ProDesk 600 minis?
The EliteDesk 800’s come with vPro, which provides (among other things) remote management and virtual KVM control of the system over TCP. That’s built into the BIOS, so you can use it to manage the system even if the installed base system fails. The ProDesk 600’s lack this. They’re not worth it.
Five of them!? Isn’t three good enough?
No, it’s not. If you’re really serious about having a cluster that will run all your software and be able to handle critical hardware failure, you need five (5x) nodes. Quorum is a majority of the nodes: N/2 rounded down, plus 1. With five nodes that’s three, so you can drop two (2x) nodes and still maintain quorum. You need to maintain quorum for GlusterFS/Nomad/Consul. If you have only three nodes, you can lose one… but the two that are left have no margin. One more failure, or the two losing sight of each other, and you’re looking at lost quorum or a split brain cluster. That’s bad. Very bad. Go with five nodes now, you won’t regret it later.
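The quorum math in that last answer is easy to check for yourself. This is just a sketch of the majority rule (N/2 rounded down, plus 1); the function names are mine and are not part of Nomad, Consul or GlusterFS.

```python
def quorum(nodes):
    """Smallest majority of a cluster: floor(N/2) + 1."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can fail while a majority is still standing."""
    return nodes - quorum(nodes)


for n in (3, 4, 5):
    print(f"{n} nodes: quorum {quorum(n)}, can lose {tolerated_failures(n)}")
# 3 nodes: quorum 2, can lose 1
# 4 nodes: quorum 3, can lose 1
# 5 nodes: quorum 3, can lose 2
```

Notice that a fourth node buys you nothing over three. You only gain another tolerated failure when you get to five.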

Summary

So yeah, it feels like a significant financial outlay. But if you do the math, you’re coming in at probably just about or under $1000. How much is your up-time worth?

Trust me, it’s worth it.