Rescue or back up an entire host
We will take a look at multiple unfortunate scenarios - all in one - none of which appear to be well documented, let alone intuitive when it comes to either:
- troubleshooting a Proxmox VE host that completely fails to boot; or
- the need to create a full host backup - one that is safe, space-efficient and agnostic of the re-deployment target.
An entire PVE host install (without guests) typically consumes less than 2G of space, so it makes little sense to e.g. go about cloning entire disks (or partitions), which a target system might not even be able to fit, let alone boot from.
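If you are curious what your own host actually occupies before deciding on an approach, a rough check on the (still booting) host - staying on the root filesystem only - could be as simple as:
du -xsh /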
Rescue not to the rescue
The natural first step when attempting to rescue a system would be to reach for the bespoke PVE ISO installer 1 and follow exactly the menu path:
- Advanced Options > Rescue Boot
This may indeed end up booting a partially crippled system, but it is completely futile in many scenarios, e.g. on an otherwise healthy ZFS install it can simply result in an instant error:
error: no such device: rpool
ERROR: unable to find boot disk automatically
Besides that, we do NOT want to boot the actual (potentially broken) PVE host; we want to examine it from a separate system that has all the tooling, make the necessary changes and reboot back into it instead. Similarly, if we are trying to make a solid backup, we do NOT want to perform it on a running system - it is always safer for the system being backed up to NOT be in use, safer even than backing up a snapshot of it would be.
ZFS on root
We will pick the “worst case” scenario of having a ZFS install. This is because standard Debian does NOT support it out of the box, and while it would be appealing to simply boot the corresponding Live System 2 (e.g. Bookworm in the case of PVE v8), this won’t be of much help with ZFS as provided by Proxmox.
Note
That said, for any install other than ZFS, you may successfully go for the Live Debian - after all, you will have a full system at hand to work with, without limitations, and you can always install a Proxmox package if need be.
Caution
If you got the idea of pressing on with Debian anyhow and taking advantage of its own ZFS support via the contrib repository, do NOT do that. You would be using a completely different kernel with a completely incompatible ZFS module, one that will NOT help you import your ZFS pool at all. This is because Proxmox use what are essentially Ubuntu kernels, 3 with their own patches, at times reverse patches, and a ZFS version well ahead of Debian’s - potentially with cherry-picked patches specific to that one particular PVE version.
Such an attempt would likely end in an error similar to the one below:
status: The pool uses the following feature(s) not supported on this system:
com.klarasystems:vdev_zaps_v2
action: The pool cannot be imported. Access the pool on a system that supports
the required feature(s), or recreate the pool from backup.
We will therefore make use of the ISO installer, but go for the not-so-intuitive choice:
- Advanced Options > Install Proxmox VE (Terminal UI, Debug Mode)
This will throw us into a terminal which appears stuck, but is in fact already waiting for input:
Debugging mode (type 'exit' or press CTRL-D to continue startup)
Which is exactly what we will do at this point - press Ctrl+D to get ourselves a root shell:
root@proxmox:/# _
This is how we get a (limited) running system that is not the PVE install we are (potentially) troubleshooting.
Note
We will, however, NOT proceed any further with the actual “Install” for which this option was originally intended.
Get network and SSH access
This step is NOT strictly necessary, but we will opt for it here as it gives us more flexibility in what we can do, how we can do it (e.g. copy & paste commands or even entire scripts) and where we can send our backup (other than to a local disk).
Assuming the network provides DHCP, we will simply get an IP address with dhclient: 4
dhclient -v
The output will show us the actual IP assigned, but we can also check with hostname -I, 5 which will give us exactly the one we need without looking at all the interfaces.
Tip
Alternatively, you can inspect them all with ip -c a. 6
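If the network does not provide DHCP at all, the same ip tool can be used to assign an address statically - a rough sketch, with the interface name and addresses being placeholders you would adapt to your own network:
ip addr add 10.10.10.101/24 dev eth0    # placeholder interface and address
ip route add default via 10.10.10.1     # placeholder gateway
echo "nameserver 1.1.1.1" > /etc/resolv.conf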
We will now install the SSH server: 7
apt update
apt install -y openssh-server
Note
You can safely ignore error messages about unavailable enterprise repositories.
Further, we need to allow root to actually connect over SSH, which - by default - would only be possible with a key. We can either manually edit the configuration file, looking for the PermitRootLogin 8 line to uncomment and adjust accordingly, or simply append the needed line with:
cat >> /etc/ssh/sshd_config <<< "PermitRootLogin yes"
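If you would rather adjust the existing (by default commented-out) PermitRootLogin entry in place instead of appending a new line, a sed one-liner along these lines should achieve the same - a sketch that assumes the stock Debian sshd_config layout:
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config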
Time to start the SSH server:
mkdir /run/sshd
/sbin/sshd
Tip
You can check whether it is running with ps -C sshd -f. 9
One last thing - let’s set ourselves a password for the root user:
passwd
And now we can connect remotely from another machine - and use it to make everything further down easier on us:
ssh root@10.10.10.101
Import the pool
We will proceed with the ZFS-on-root scenario, as it is the trickiest. If you have any other setup, e.g. LVM or BTRFS, it is much easier to just follow readily available generic advice on mounting those filesystems.
All we are after is getting access to what would ordinarily reside under the root (/) path, mounting it under a working directory such as /mnt. This is something that a regular mount 10 command will NOT help us with in a ZFS scenario.
If we just run the obligatory zpool import 11 now, we would be greeted with:
pool: rpool
id: 14129157511218846793
state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:
rpool UNAVAIL unsupported feature(s)
sda3 ONLINE
And that is correct. But a pool that has not been exported does not signify anything special beyond the fact that it has been marked by another “system” and is therefore presumed unsafe for manipulation by others. It is a mechanism to prevent the same pool from being inadvertently accessed by multiple hosts at the same time - something we do not need to worry about here.
We could use the (in)famous -f option - this would even be suggested to us if we were more explicit about the pool at hand:
zpool import -R /mnt rpool
Warning
Note that we are using the -R switch to mount our pool under the /mnt path; if we were not, we would mount it over the actual root filesystem of the current (rescue) boot. The mountpoints are inferred purely from information held by the ZFS pool itself, which we do NOT want to manipulate.
cannot import 'rpool': pool was previously in use from another system.
Last accessed by (none) (hostid=9a658c87) at Mon Jan 6 16:39:41 2025
The pool can be imported, use 'zpool import -f' to import the pool.
But we do NOT want this pool to then appear as foreign elsewhere. Instead, we want the current system to think it is the same one that originally accessed the pool. Take a look at the hostid 12 that is expected: 9a658c87 - we just need to write it into the binary /etc/hostid file, and there’s a tool for that: 13
zgenhostid -f 9a658c87
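If you want to double-check the value took effect, the hostid utility (part of coreutils, so available here) should now print back that very identifier:
hostid    # expect: 9a658c87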
Now importing the pool will go without a hitch… Well, unless it has been corrupted, but that would be for another guide.
zpool import -R /mnt rpool
There will NOT be any output upon success of the above, but you can confirm all is well with: 14
zpool status
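If you also want to see that the datasets got mounted where we expect them - their mountpoints will show up prefixed with /mnt thanks to the -R switch, and the dataset names will differ between installs - a quick listing helps:
zfs list -o name,mountpoint,mounted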
Chroot and fixing
What we have now is the PVE host’s original filesystem mounted under /mnt/ with full access to it. We can perform any fixes, but some tooling (e.g. for fixing a bootloader - something out of scope here) might require paths to appear real from the viewpoint of the system being fixed, i.e. such a tool could be looking for configuration files in /etc/ and we do not want to worry about explicitly pointing it at /mnt/etc while preserving the imaginary root under /mnt - in such cases, we simply want to manipulate the “cold” system as if it were the currently booted one. That’s where chroot 15 has us covered:
chroot /mnt
And until we then finalise it with exit, our environment does not know about anything above /mnt and, most importantly, it considers /mnt to be the actual root (/), as would have been the case on the running system.
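A side note for anyone whose repair work does involve a bootloader or similar tooling: such tools often expect the kernel’s virtual filesystems to be available inside the chroot. A common approach (not needed for the backup below) is to bind-mount them from the rescue shell, i.e. before the chroot /mnt step above, roughly like so:
mount --rbind /dev /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys /mnt/sys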
Now we can do whatever we came here for, but in our current case, we will just back everything up, at least as far as the host is concerned.
Full host backup
The simplest backup of any Linux host is simply a full copy of the contents of its root (/) filesystem. That really is the only thing one needs a copy of. And that’s what we will do here with tar: 16
tar -cvpzf /backup.tar.gz --exclude=/backup.tar.gz --one-file-system /
This will back up everything from the (host’s) root (/ - remember we are chroot’ed), preserving permissions, and put it into the file backup.tar.gz on that very (imaginary) root, without eating its own tail, i.e. ignoring the very file we are creating here. It will also not cross into other mounted filesystems, but we do not have any in this case.
Note
Of course, you could mount a different disk to hold the target archive, but we will go with this rudimentary approach here. After all, a gzip’ed freshly installed system comes in at under 1G - something that should easily fit on any root filesystem.
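Before exiting the chroot, it does not hurt to make sure the archive is actually readable, e.g. by listing a few of its entries:
tar -tzf /backup.tar.gz | head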
Once done, we exit the chroot, literally:
exit
What you do with this archive - now residing in /mnt/backup.tar.gz - is completely up to you; the simplest option would be to e.g. securely copy it out over SSH, even if only to a fellow PVE host: 17
scp /mnt/backup.tar.gz root@10.10.10.11:~/
The above would place it into the root user’s home directory on the remote system (/root there).
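Given this archive might be the only copy of the host you have, verifying the transfer is cheap insurance - comparing checksums on both ends (with the example paths used above) would look like:
sha256sum /mnt/backup.tar.gz
ssh root@10.10.10.11 sha256sum /root/backup.tar.gz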
Tip
If you want to be less blind but still rely on just SSH, consider making use of SSHFS. 18 You would then “mount” such a remote directory, like so:
apt install -y sshfs
mkdir /backup
sshfs root@10.10.10.11:/root /backup
And simply treat it like a local directory - copy what you need, as you need, then unmount.
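In concrete terms - sticking to the example paths above - that could be as little as:
cp /mnt/backup.tar.gz /backup/
umount /backup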
That’s it
Once done, time for a quick exit: 19 20
zfs unmount rpool
reboot -f
Tip
If you are looking to power the system off instead, then poweroff -f will do.
And there you have it: a safe way to boot into an otherwise hard-to-troubleshoot setup, with a bespoke Proxmox kernel guaranteed to support the ZFS pool at hand, and a complete backup of the entire host system.
If you wonder how this can be sufficient, how to make use of such a “full” backup (of less than 1G), and ponder the benefit of block-cloning entire disks with de-duplication (or the lack thereof on encrypted volumes) - only to later find out that the target system needs differently sized partitions on different-capacity disks, or even different filesystems, and boots in a different way - there is none, and we will demonstrate as much in a follow-up guide.
https://www.proxmox.com/en/downloads/proxmox-virtual-environment/iso ↩︎
https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/ ↩︎
https://manpages.debian.org/bookworm/isc-dhcp-client/dhclient.8.en.html ↩︎
https://manpages.debian.org/bookworm/hostname/hostname.1.en.html ↩︎
https://manpages.debian.org/bookworm/iproute2/ip-address.8.en.html ↩︎
https://manpages.debian.org/bookworm/openssh-server/sshd_config.5.en.html#PermitRootLogin ↩︎
https://manpages.debian.org/bookworm/mount/mount.8.en.html ↩︎
https://manpages.debian.org/bookworm/zfsutils-linux/zpool-import.8.en.html ↩︎
https://manpages.debian.org/bookworm/manpages-dev/gethostid.3.en.html ↩︎
https://manpages.debian.org/bookworm/zfsutils-linux/zgenhostid.8.en.html ↩︎
https://manpages.debian.org/bookworm/zfsutils-linux/zpool-status.8.en.html ↩︎
https://manpages.debian.org/bookworm/coreutils/chroot.8.en.html ↩︎
https://manpages.debian.org/bookworm/openssh-client/scp.1.en.html ↩︎
https://manpages.debian.org/bookworm/sshfs/sshfs.1.en.html ↩︎
https://manpages.debian.org/bookworm/zfsutils-linux/zfs-unmount.8.en.html ↩︎
https://manpages.debian.org/bookworm/runit-init/reboot.8.en.html ↩︎