Finally I have had enough time to upgrade my Kubernetes cluster. As I wrote previously, my Kubernetes cluster consisted of two Rpi4s with 8GB RAM and one Rpi2 with 1GB RAM. The Rpi2, which acted as the master node, was the bottleneck slowing the cluster down. It worked for something like a week or two, until it suddenly became busy and the load went over 10. When that happened it was still possible to log in via SSH and reboot the node, but it took time.
Making the cluster faster is a good thing, but there was also another benefit. K3s has recently added support for high availability mode with an embedded etcd database. It requires at least K3s v1.19.1, and my original cluster was running v1.17.5. A little detective work revealed that there is an HA branch in k3s-ansible, which I used for my original cluster installation. The branch is not yet merged to master, but it seems stable enough for me, and there is an issue which gives hope for the future.
Replacing the Rpi2 meant that I had to invest in new hardware. Besides the obvious Rpi4, I bought a few new components:
- Kingston 240GB A400 SATA3 SSD
- StarTech USB 3.1 Gen 2 SATA 2,5" adapter cable with UASP support
- New layers to my cluster case
The basic installation went similarly to the previous one. When the RPis were online, I needed to make a few changes. First, I had to create a separate partition for the persistent data. I also needed to get the code from the k3s-ansible repository’s HA branch and proceed according to the instructions inside the branch.
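The partitioning step can be sketched like this. Note that the device name `/dev/sda`, the label, and the partition boundaries are assumptions for illustration; check your own disk layout with `lsblk` first.

```shell
# Identify the SSD first; /dev/sda is an assumption
lsblk

# Carve out a second partition for persistent data (sizes are examples)
sudo parted /dev/sda --script mkpart primary ext4 32GB 100%

# Format the new partition and give it a label for easier mounting later
sudo mkfs.ext4 -L k8s-data /dev/sda2
```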
Once k3s was installed with three masters and no workers, the next thing we need to get containers talking to the outside of the cluster is a load balancer. In the cloud these come automatically from the cloud service, but when self-hosting, we need to provide them ourselves. MetalLB is a very good option in that space and the installation is very straightforward.
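After installing MetalLB it still needs a pool of addresses to hand out. For the MetalLB versions of this era, a minimal layer 2 setup is a ConfigMap like the one below; the address range is an assumption, so pick free addresses from your own LAN.

```shell
# Layer 2 address pool for MetalLB; the 192.168.1.240-250 range is an
# assumption -- use unallocated addresses from your own network
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250
EOF
```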
The next building block is ingress-nginx, which works as a reverse proxy and allows us to publish applications from a single IP address with nginx.
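One way to install ingress-nginx is with helm from the upstream chart repository (a sketch; with MetalLB in place, the controller's LoadBalancer Service gets an external IP from the pool automatically):

```shell
# Add the upstream ingress-nginx chart repository and install the controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace
```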
To ease the burden of maintaining certificates, we will deploy cert-manager.
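cert-manager can likewise be installed with helm from the Jetstack chart repository (a sketch; `installCRDs=true` makes the chart install the custom resource definitions as well):

```shell
# Install cert-manager; installCRDs=true pulls in the CRDs it needs
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true
```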
It was time to put the partitions for persistent data to good use. In the first version of the cluster I tried to use GlusterFS, but I couldn’t get it working. My next choice was Longhorn, and I was very pleased with how it performed replicating the data between the two worker nodes. Therefore it was my choice for this cluster version, too. This time it was replicating with three USB disks instead of two.
The first step with Longhorn is to install the required package to every node:
sudo apt install open-iscsi
Second, you need to install helm on the computer you want to use to manage the k8s cluster.
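One common way to do that is the installer script from the helm project itself:

```shell
# Fetch and run the official helm 3 installer script
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
```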
When the installation is done, you need to add the Longhorn repository to helm and update helm repositories:
helm repo add longhorn https://charts.longhorn.io
helm repo update
The more verbose instructions with better explanations are available in the Longhorn documentation.
Before we continue to installing Longhorn, we need to prepare the storage for it. In my case, I use the path /data on every node and let Longhorn replicate its contents to the other nodes. Therefore, I mount my separate persistent data partition to /data in /etc/fstab.
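The mount can be made persistent like this (a sketch; the `k8s-data` label is an assumption from the partitioning step, so substitute your own device or label):

```shell
# Create the mount point, add an fstab entry, and mount the data partition
sudo mkdir -p /data
echo 'LABEL=k8s-data  /data  ext4  defaults,noatime  0  2' | sudo tee -a /etc/fstab
sudo mount /data
```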
Fetch the value file from Github and edit it according to your needs.
curl -Lo values.yaml https://raw.githubusercontent.com/longhorn/charts/master/charts/longhorn/values.yaml
Create a namespace for Longhorn, install it with helm and monitor the installation.
kubectl create namespace longhorn-system
helm install longhorn longhorn/longhorn --namespace longhorn-system --values values.yaml
kubectl -n longhorn-system get pod
When Longhorn is installed, you can access the UI without any authentication. If you want some security, you can follow the aforementioned Longhorn docs to require authentication.
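Without an ingress in front of it, one quick way to reach the UI is a port-forward to the frontend service the chart creates:

```shell
# Forward the Longhorn frontend service to localhost:8080,
# then open http://localhost:8080 in a browser
kubectl -n longhorn-system port-forward service/longhorn-frontend 8080:80
```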
The last thing I did for Longhorn before considering it production-ready was configuring backups. I already had an NFS share that I could use as a target. The biggest obstacle was finding the right format for the setting. After a bit of testing, I found it to be:
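The author's exact value is not shown above; as a hedged illustration, Longhorn's NFS backup target setting follows this general shape, with the server name and export path below being placeholders:

```
# Illustration only -- substitute your own NFS server and export path
nfs://nfs-server.example.lan:/export/longhorn-backup
```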
Now that the cluster itself is up and running, it is time to start using it for something (non-)productive.