Cluster filesystem build

This is a further follow-up post to the introductory piece on Corosync and a natural next step after installing our own probe.

We will build our own pmxcfs 1 from the original sources and deploy it on our probe, where it can make use of all the Corosync messaging from the other nodes and thus expose the cluster-wide shared /etc/pve on the probe as well.

The staging

We will perform the below actions on our probe host, but you are welcome to follow along on any machine. The resulting build will give you a working instance of pmxcfs; however, without the Corosync setup it will behave like an uninitialised single-node instance instead.

First, let’s gather the tools and libraries that pmxcfs requires:

apt install -y git make gcc check libglib2.0-dev libfuse-dev libsqlite3-dev librrd-dev libcpg-dev libcmap-dev libquorum-dev libqb-dev

Most notably, these include the Git 2 version control system with which the Proxmox sources can be fetched, the Make 3 build tool and the GNU compiler 4; the remaining packages are the development headers for the libraries pmxcfs links against.

We can now explore the Proxmox Git repository 5, or even simpler, consult one of the real cluster nodes (installed v8.3) - the package containing pmxcfs is pve-cluster:

cat /usr/share/doc/pve-cluster/SOURCE 
git clone git://git.proxmox.com/git/pve-cluster.git
git checkout 3749d370ac2e1e73d2558f8dbe5d7f001651157c

This lets us fetch exactly the same source version as we have on the cluster nodes. Do note the version of pve-cluster as well:

pveversion -v | grep pve-cluster
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
pve-cluster: 8.0.10
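If you ever want to script this pin, the hash can be pulled out of the SOURCE file programmatically. A minimal sketch, recreating the file contents shown above into a local file (on a real node the path is /usr/share/doc/pve-cluster/SOURCE):

```shell
# Recreate the SOURCE contents shown above into a local file
cat > SOURCE <<'EOF'
git clone git://git.proxmox.com/git/pve-cluster.git
git checkout 3749d370ac2e1e73d2558f8dbe5d7f001651157c
EOF

# The pinned hash is the third field of the 'git checkout' line
awk '/^git checkout/ { print $3 }' SOURCE
# prints: 3749d370ac2e1e73d2558f8dbe5d7f001651157c
```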

Back to the build environment - on our probe host - we will create a staging directory, clone the repository and enter it:

mkdir ~/stage
cd ~/stage
git clone git://git.proxmox.com/git/pve-cluster.git
cd pve-cluster/
Cloning into 'pve-cluster'...
remote: Enumerating objects: 4915, done.
remote: Total 4915 (delta 0), reused 0 (delta 0), pack-reused 4915
Receiving objects: 100% (4915/4915), 1.02 MiB | 10.50 MiB/s, done.
Resolving deltas: 100% (3663/3663), done.

It is interesting at this point to check the log:

git log
commit 3749d370ac2e1e73d2558f8dbe5d7f001651157c (HEAD, origin/master, origin/HEAD, master)
Author: Thomas L
Date:   Mon Nov 18 22:20:01 2024 +0100

    bump version to 8.0.10
    
    Signed-off-by: Thomas L

commit 6a1706e5051ae2ab141f6cb00339df07b5441ebc
Author: Stoiko I
Date:   Mon Nov 18 21:55:36 2024 +0100

    cfs: add 'sdn/mac-cache.json' to observed files
    
    follows commit:
    d8ef05c (cfs: add 'sdn/pve-ipam-state.json' to observed files)
    with the same motivation - the data in the macs.db file is a cache, to
    prevent unnecessary lookups to external IPAM modules - is not private
    in the sense of secrets for external resources.
    
    Signed-off-by: Stoiko I

---8<---

Do note that the top commit is exactly the one the real node told us to build from (currently the most recent one), but if you follow this in the future and there are newer commits than the one last built into the repository package, you should switch to the pinned commit now:

git checkout 3749d370ac2e1e73d2558f8dbe5d7f001651157c
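If the pattern is unfamiliar: checking out a bare hash puts the repository into a detached-HEAD state, pinned to exactly that commit. A small self-contained demonstration on a throwaway repository (all names here are illustrative):

```shell
# Build a throwaway repo with two commits
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com && git config user.name demo
echo one > f && git add f && git commit -qm 'first'
hash=$(git rev-parse HEAD)              # remember the first commit's hash
echo two > f && git commit -qam 'second'

# Checking out the bare hash detaches HEAD at exactly that commit
git checkout -q "$hash"
cat f                                   # prints: one
```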

The build

We will build just the pmxcfs sources:

cd src/pmxcfs/
make

This will generate all the necessary objects:

ls
cfs-ipc-ops.h	   cfs-plug-link.o     cfs-plug.o.d   check_memdb.o	create_pmxcfs_db.c    dcdb.h	libpmxcfs.a  logtest.c	  Makefile   pmxcfs.o	 server.h
cfs-plug.c	   cfs-plug-link.o.d   cfs-utils.c    check_memdb.o.d	create_pmxcfs_db.o    dcdb.o	logger.c     logtest.o	  memdb.c    pmxcfs.o.d  server.o
cfs-plug-func.c    cfs-plug-memdb.c    cfs-utils.h    confdb.c		create_pmxcfs_db.o.d  dcdb.o.d	logger.h     logtest.o.d  memdb.h    quorum.c	 server.o.d
cfs-plug-func.o    cfs-plug-memdb.h    cfs-utils.o    confdb.h		database.c	      dfsm.c	logger.o     loop.c	  memdb.o    quorum.h	 status.c
cfs-plug-func.o.d  cfs-plug-memdb.o    cfs-utils.o.d  confdb.o		database.o	      dfsm.h	logger.o.d   loop.h	  memdb.o.d  quorum.o	 status.h
cfs-plug.h	   cfs-plug-memdb.o.d  check_memdb    confdb.o.d	database.o.d	      dfsm.o	logtest      loop.o	  pmxcfs     quorum.o.d  status.o
cfs-plug-link.c    cfs-plug.o	       check_memdb.c  create_pmxcfs_db	dcdb.c		      dfsm.o.d	logtest2.c   loop.o.d	  pmxcfs.c   server.c	 status.o.d

We do not really care for anything except the final pmxcfs binary, which we move out to the staging directory before cleaning up the rest:

mv pmxcfs ~/stage/
make clean

On closer look, our binary is rather big compared to the stock one.

The one we built:

cd ~/stage
ls -la pmxcfs
-rwxr-xr-x 1 root root 694192 Nov 30 14:29 pmxcfs

Whereas on a node, the shipped one:

ls -l /usr/bin/pmxcfs
-rwxr-xr-x 1 root root 195392 Nov 18 21:19 /usr/bin/pmxcfs

Back on the build host, we will strip the debugging symbols off, but keep them in a separate file in case we need them later. For that, we need another tool:

apt install -y elfutils 
eu-strip pmxcfs -f pmxcfs.dbg

Now that’s better:

ls -l pmxcfs*
-rwxr-xr-x 1 root root 195304 Nov 30 14:37 pmxcfs
-rwxr-xr-x 1 root root 502080 Nov 30 14:37 pmxcfs.dbg
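Roughly how much did stripping save? A quick back-of-the-envelope calculation with the two sizes from the listing above:

```shell
# Sizes taken from the ls output above
orig=694192
stripped=195304
echo "saved $(( (orig - stripped) * 100 / orig ))% of the binary size"
# prints: saved 71% of the binary size
```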

The run

Well, let’s run this:

./pmxcfs

Check it is indeed running:

ps -u -p $(pidof pmxcfs)
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         810  0.0  0.4 320404  9372 ?        Ssl  14:38   0:00 ./pmxcfs

It created its mount of /etc/pve:

ls -l /etc/pve/nodes
total 0
drwxr-xr-x 2 root www-data 0 Nov 29 11:10 probe
drwxr-xr-x 2 root www-data 0 Nov 16 01:15 pve1
drwxr-xr-x 2 root www-data 0 Nov 16 01:38 pve2
drwxr-xr-x 2 root www-data 0 Nov 16 01:39 pve3

And there you have it: the cluster-wide configurations, available on your probe host.

Important

This assumes your corosync service is running and set up correctly, as it was at the end of the previous post on the probe install.

What we can do with this

We will use it for further testing, debugging, benchmarking and possible modifications - after all, rebuilding is a matter of running a single make. Do note that we will be doing all this only on our probe host, not on the rest of the cluster nodes.

Beyond these monitoring activities, there are quite a few other things you can consider doing on such a probe node, such as backing up the cluster-wide configuration of all the nodes once in a while.

And also anything that you would NOT want happening on an actual node with running guests, really.
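The backup idea mentioned above could look something like the following. This is only a sketch against a mock directory so it can run anywhere; on the probe you would point SRC at /etc/pve instead (the paths and file names are illustrative):

```shell
# Mock stand-in for /etc/pve - on the probe, use SRC=/etc/pve instead
SRC=$(mktemp -d)
mkdir -p "$SRC/nodes/pve1"
echo "demo" > "$SRC/nodes/pve1/config"

# Snapshot the whole tree into a dated archive
out="pve-config-$(date +%F).tar.gz"
tar -czf "$out" -C "$SRC" .

# Verify the archive holds the node configuration
tar -tzf "$out" | grep -q 'nodes/pve1/config' \
    && echo "backup contains node config"
```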