Study Notes: TPM, vTPM, and the Boot Process

Introduction

When you run Kubernetes clusters on Azure, you eventually have to care about hardware security. Not because it is trendy, but because your worker nodes need proof they haven’t been tampered with at the boot level.

As a Kubernetes engineer, I end up touching infrastructure components like virtual machines and disk encryption whether I want to or not. And whenever I look into hardware security, I always learn something new about how cloud instances work.

This article is my attempt to consolidate notes on Trusted Platform Modules (TPM) and virtual TPMs (vTPM), specifically how they secure Kubernetes worker nodes.

The Boot Process

To make sense of hardware security, we need to start with how a server boots.

Firmware Stage (BIOS vs UEFI)

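When a server powers on, the firmware runs first. Legacy BIOS reads the MBR (Master Boot Record), the first 512-byte sector of the disk. The MBR holds only 446 bytes of boot code plus the partition table, so that first-stage loader can do little more than load a larger second-stage bootloader, which then loads the rest.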
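To make the 512-byte limit concrete, here is a minimal Python sketch that decodes an MBR sector using the classic layout: 446 bytes of boot code, four 16-byte partition entries, and the 0x55AA signature. The function name `parse_mbr` is hypothetical, and the sketch ignores the legacy CHS addressing fields.

```python
import struct

SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # bytes 510-511 of every valid MBR


def parse_mbr(sector: bytes) -> list[dict]:
    """Decode the partition table of a 512-byte Master Boot Record."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    # Bytes 0-445 are the first-stage boot code: too small for a full bootloader.
    partitions = []
    for i in range(4):  # bytes 446-509: four 16-byte partition entries
        entry = sector[446 + 16 * i : 446 + 16 * (i + 1)]
        status, part_type = entry[0], entry[4]
        if part_type == 0x00:  # type 0 marks an unused slot
            continue
        # Starting LBA and sector count are little-endian 32-bit integers.
        lba_start, num_sectors = struct.unpack_from("<II", entry, 8)
        partitions.append({
            "index": i,
            "bootable": status == 0x80,
            "type": part_type,
            "lba_start": lba_start,
            "sectors": num_sectors,
        })
    return partitions


# Build a synthetic MBR with one bootable Linux (type 0x83) partition.
demo = bytearray(SECTOR_SIZE)
demo[446] = 0x80
demo[446 + 4] = 0x83
struct.pack_into("<II", demo, 446 + 8, 2048, 1_000_000)
demo[510:512] = BOOT_SIGNATURE
print(parse_mbr(bytes(demo)))
```

To try it on a real disk you would read the first sector as root, for example `open("/dev/sda", "rb").read(512)`, where `/dev/sda` is an assumed device name. On a GPT disk used with UEFI, expect a single "protective" entry of type 0xEE instead of real partitions.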

UEFI is the modern standard. It removes that size limit: the firmware can read a dedicated FAT-formatted partition, the EFI System Partition (ESP), and run the bootloader's .efi executable directly from it. So it loads the full bootloader without the multi-stage dance.
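A quick way to see this on a running Linux VM is to list the .efi files on the mounted ESP. This sketch assumes the ESP is mounted at /boot/efi (the usual Ubuntu location; other distributions may use /efi or /boot), and `find_efi_binaries` is a hypothetical helper, not a standard tool.

```python
from pathlib import Path


def find_efi_binaries(esp_root: str = "/boot/efi") -> list[str]:
    """List UEFI executables (.efi files) below the ESP mount point.

    The default mount point is an assumption; pass another path if needed.
    """
    root = Path(esp_root)
    # FAT is case-insensitive, so match both .efi and .EFI.
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".efi"
    )


if Path("/boot/efi").is_dir():
    try:
        print(find_efi_binaries())
    except PermissionError:
        print("ESP not readable: run as root")
```

Run it as root, since Ubuntu typically mounts the ESP with restrictive permissions. On Ubuntu you would typically see entries such as `EFI/ubuntu/shimx64.efi` and `EFI/ubuntu/grubx64.efi`, plus the fallback loader `EFI/BOOT/BOOTX64.EFI`.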

Bootloader and initramfs

Once the firmware hands off to the bootloader (such as GRUB), the bootloader loads the OS kernel into RAM. Before the “real OS” starts, it also loads initramfs into memory.

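initramfs is a small, temporary root filesystem maintained by OS providers (though you can customize it if you build custom images). It contains kernel modules (such as storage and filesystem drivers), cryptographic tools, and the /init program, which acts as a mini init system (on dracut-based distributions it is systemd itself). Its job is to find the real root filesystem, unlock it if it is encrypted, mount it, and hand control over to the real OS.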

What is a TPM?

A Trusted Platform Module (TPM) is a dedicated microchip on the motherboard. Its job is to perform cryptographic functions in hardware, such as generating and protecting keys and recording measurements of the boot process.

A TPM is not an antivirus. It doesn’t scan for malicious software. It records cryptographic hashes of the code that runs during boot, so any change to the boot process produces different values and can be detected.

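It keeps these hashes in Platform Configuration Registers (PCRs): volatile registers inside the TPM that are reset when the machine restarts. Software can’t write a PCR directly; it can only extend it, which makes the TPM replace the current value with hash(current value || new measurement). As a result, a PCR’s final value depends on every measurement and on their order, and software can’t roll it back to an earlier, known-good value.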

Measured Boot

Recording these hashes during startup is called Measured Boot: each stage measures (hashes) the next stage before handing control to it.

The flow is:

  • UEFI Firmware calculates the hash of the bootloader.
  • Bootloader calculates the hash of the OS kernel.
  • These hashes are sent to the TPM.

The TPM chains these hashes together, in order, and stores the final result in the PCRs. This final hash represents the exact state of the boot process.
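Chaining hashes like this is called extending a PCR: new value = SHA-256(old value || measurement). The sketch below models it with Python's hashlib. It is simplified: real firmware spreads measurements across several PCRs (boot loader code, for example, goes to PCR 4) and keeps an event log, while here the whole chain lands in a single SHA-256 PCR. The byte strings are placeholders for real binaries.

```python
import hashlib


def measure(component: bytes) -> bytes:
    """A measurement is the SHA-256 digest of the code about to run."""
    return hashlib.sha256(component).digest()


def extend(pcr: bytes, digest: bytes) -> bytes:
    """PCR extend: new value = SHA-256(old value || measurement)."""
    return hashlib.sha256(pcr + digest).digest()


def measured_boot(chain: list[bytes]) -> bytes:
    pcr = bytes(32)  # a SHA-256 PCR starts as 32 zero bytes after reset
    for component in chain:
        # Each stage hashes the next one before handing over control.
        pcr = extend(pcr, measure(component))
    return pcr


good = measured_boot([b"<bootloader image>", b"<kernel image>"])
tampered = measured_boot([b"<modified bootloader image>", b"<kernel image>"])
print(good.hex())
print(good == tampered)  # False: any change alters the final value
```

Changing either component, or even swapping their order, yields a completely different final value, which is exactly the property sealing relies on.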

Sealing and Unsealing

When you enable full disk encryption on Linux (for example with LUKS) and bind it to the TPM, the OS generates a key that unlocks the disk.

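Then it sends this key to the TPM with a command: “Encrypt this key, and only decrypt it if the current PCR values match the baseline values.” The baseline is the set of PCR values from a known-good boot. This is Sealing. The TPM returns an encrypted blob that only that TPM can decrypt, and the OS stores it on disk (for example, in the LUKS header).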

During the next reboot, the system performs Unsealing:

  1. Firmware and bootloader measure the boot chain again; the PCRs start from scratch on every boot.

  2. If neither the bootloader nor the kernel has changed, the new PCR values match the sealed baseline. Because initramfs contains the crypto tools, it can ask the TPM to unseal the key. The TPM checks the current PCR values against the sealed policy and releases the key, and initramfs uses it to unlock the encrypted disk. The OS boots.

  3. If an attacker modifies the bootloader, its measurement produces a different hash. The TPM sees the mismatch and refuses to release the key. The disk stays locked.
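The seal/unseal decision can be sketched with a toy TPM object. This is a model, not real cryptography: `ToyTPM`, `boot`, and the HMAC-derived XOR keystream are hypothetical stand-ins for the TPM's internal encryption and policy engine, chosen so the example needs only the Python standard library.

```python
import hashlib
import hmac
import secrets


class ToyTPM:
    """Toy model of TPM sealing. NOT real cryptography: the HMAC-derived XOR
    keystream only stands in for the TPM's internal encryption."""

    def __init__(self) -> None:
        self._storage_secret = secrets.token_bytes(32)  # never leaves the "chip"
        self.pcr = bytes(32)

    def reset(self) -> None:
        self.pcr = bytes(32)  # PCRs are volatile: every reboot starts from zero

    def extend(self, digest: bytes) -> None:
        self.pcr = hashlib.sha256(self.pcr + digest).digest()

    def _keystream(self, policy: bytes, length: int) -> bytes:
        # Tie the keystream to both the chip's secret and the sealed PCR policy.
        return hmac.new(self._storage_secret, policy, hashlib.sha256).digest()[:length]

    def seal(self, secret: bytes, expected_pcr: bytes) -> tuple[bytes, bytes]:
        if len(secret) > 32:
            raise ValueError("this toy seals at most 32 bytes")
        ks = self._keystream(expected_pcr, len(secret))
        return expected_pcr, bytes(a ^ b for a, b in zip(secret, ks))

    def unseal(self, blob: tuple[bytes, bytes]) -> bytes:
        expected_pcr, ciphertext = blob
        if not hmac.compare_digest(self.pcr, expected_pcr):
            raise PermissionError("PCR mismatch: refusing to release the key")
        ks = self._keystream(expected_pcr, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, ks))


def boot(tpm: ToyTPM, bootloader: bytes, kernel: bytes) -> None:
    """Simulate a reboot followed by Measured Boot of a two-stage chain."""
    tpm.reset()
    for component in (bootloader, kernel):
        tpm.extend(hashlib.sha256(component).digest())


tpm = ToyTPM()
boot(tpm, b"<trusted bootloader>", b"<kernel>")
disk_key = secrets.token_bytes(32)
blob = tpm.seal(disk_key, tpm.pcr)  # baseline = PCR value of this known-good boot

boot(tpm, b"<trusted bootloader>", b"<kernel>")  # ordinary reboot
assert tpm.unseal(blob) == disk_key

boot(tpm, b"<tampered bootloader>", b"<kernel>")  # attacker swapped the bootloader
try:
    tpm.unseal(blob)
except PermissionError as err:
    print(err)  # the disk stays locked
```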

The SRE Reality: Upgrades

This strict checking creates an operational headache.

When you upgrade the Linux kernel to pick up security patches, the kernel binary changes, so its hash, and therefore the final PCR value, changes too. If you reboot after a kernel upgrade, the new value won’t match the sealed baseline, and the TPM locks you out.

To prevent this, the OS upgrade process must Reseal. While the old kernel is still running (so the PCRs still match), it unseals the key, predicts the PCR values the new kernel will produce, and seals the key again against those predicted values, all before rebooting.
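The reseal step can be sketched the same way. Here a sealed blob is modeled as a plain dictionary holding the key and the PCR value it is bound to, with no encryption at all, so only the policy check is shown. `boot_pcr` and `reseal_for_upgrade` are hypothetical helpers; in practice the prediction has to account for every measured file that changes, not just the kernel.

```python
import hashlib
import hmac


def boot_pcr(bootloader: bytes, kernel: bytes) -> bytes:
    """Predict the PCR value a boot chain will produce (extend from zeros)."""
    pcr = bytes(32)
    for component in (bootloader, kernel):
        pcr = hashlib.sha256(pcr + hashlib.sha256(component).digest()).digest()
    return pcr


# Toy sealed blob: key plus the PCR value it is bound to. No encryption here;
# this only models the policy check the TPM would enforce.
def seal(key: bytes, expected_pcr: bytes) -> dict:
    return {"key": key, "policy": expected_pcr}


def unseal(blob: dict, current_pcr: bytes) -> bytes:
    if not hmac.compare_digest(blob["policy"], current_pcr):
        raise PermissionError("PCR mismatch")
    return blob["key"]


def reseal_for_upgrade(blob: dict, current_pcr: bytes,
                       bootloader: bytes, new_kernel: bytes) -> dict:
    key = unseal(blob, current_pcr)               # works: old kernel still running
    predicted = boot_pcr(bootloader, new_kernel)  # PCR value the next boot should produce
    return seal(key, predicted)


old_pcr = boot_pcr(b"<bootloader>", b"<kernel, old version>")
blob = seal(b"<disk key>", old_pcr)

blob = reseal_for_upgrade(blob, old_pcr, b"<bootloader>", b"<kernel, new version>")
new_pcr = boot_pcr(b"<bootloader>", b"<kernel, new version>")
print(unseal(blob, new_pcr))  # the first boot of the new kernel can unlock the disk
```

Without the reseal call, `unseal(blob, new_pcr)` would raise `PermissionError`, which is the lockout described above.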

vTPM on Azure

In a cloud environment like Azure, your Kubernetes worker nodes are VMs running on shared physical hosts. Multiple VMs can’t securely share a single physical TPM chip: its PCRs record the host’s own boot chain, not each VM’s.

Azure gives each VM its own virtual TPM (vTPM), a software implementation of a TPM 2.0 chip provided by the Azure host. Each VM gets its own isolated PCRs and cryptographic keys.

In Azure, vTPM is part of the Trusted Launch security profile, alongside Secure Boot. It provides the same Measured Boot and Sealing functions as a physical TPM.
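To check whether a VM actually has vTPM enabled, you can inspect its JSON description from the Azure CLI (`az vm show --resource-group <rg> --name <vm>`). The sketch below assumes the `securityProfile.uefiSettings` fields (`securityType`, `vTpmEnabled`, `secureBootEnabled`) that Azure uses for Trusted Launch VMs, and, as a further assumption, also looks under `virtualMachineProfile`, where a scale set (the way AKS runs node pools) would keep the same block. `vtpm_status` is a hypothetical helper and the sample is hand-written, not real CLI output.

```python
import json


def vtpm_status(vm: dict) -> dict:
    """Summarize Trusted Launch settings from `az vm show`-style JSON.

    Assumes the securityProfile.uefiSettings layout; missing keys are
    reported as "not enabled".
    """
    profile = vm.get("securityProfile")
    if profile is None:  # scale-set models nest it one level deeper (assumption)
        profile = (vm.get("virtualMachineProfile") or {}).get("securityProfile")
    profile = profile or {}
    uefi = profile.get("uefiSettings") or {}
    return {
        "securityType": profile.get("securityType"),
        "vTpmEnabled": bool(uefi.get("vTpmEnabled")),
        "secureBootEnabled": bool(uefi.get("secureBootEnabled")),
    }


# Hand-written sample shaped like the relevant part of `az vm show` output.
sample = json.loads("""
{
  "name": "worker-0",
  "securityProfile": {
    "securityType": "TrustedLaunch",
    "uefiSettings": {"secureBootEnabled": true, "vTpmEnabled": true}
  }
}
""")
print(vtpm_status(sample))
```

To use it against a real VM, save the CLI output to a file (for example `vm.json`) and pass `json.load(open("vm.json"))` instead of the sample.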

Cloud Provisioning: cloud-init vs Ignition

Now that we have the boot sequence and vTPM covered, how do we actually provision these worker nodes? Ubuntu uses cloud-init; Flatcar Container Linux uses Ignition.

  • cloud-init runs late in the boot process, from the real root filesystem (/), as a set of systemd services. Most of its work happens after networking is up.

  • Ignition runs early, in the initramfs, before the system switches to the real root filesystem. It runs only on a machine’s first boot.

This early execution suits immutable OSes like Flatcar. Flatcar keeps /usr on a read-only partition, so you can’t update or create binaries there. Rather than modifying a running system, Ignition provisions the machine once, before any regular services start: it can partition disks, create filesystems, write files, and set up users and systemd units. /usr stays read-only throughout, while the rest of the system, including /etc (configs) and /var (logs/state), remains writable.
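For a feel of what Ignition consumes, here is a Python sketch that emits a minimal Ignition config: it sets the hostname by writing /etc/hostname and adds an SSH key for Flatcar's default `core` user. It assumes your Flatcar release accepts config spec version 3.3.0; `ignition_config` is a hypothetical helper, and the hostname and key are placeholders.

```python
import json
from urllib.parse import quote


def ignition_config(hostname: str, ssh_key: str) -> str:
    """Render a minimal Ignition config for a Flatcar worker (spec 3.3.0 assumed)."""
    config = {
        "ignition": {"version": "3.3.0"},
        "passwd": {
            # "core" is Flatcar's default login user.
            "users": [{"name": "core", "sshAuthorizedKeys": [ssh_key]}],
        },
        "storage": {
            "files": [{
                "path": "/etc/hostname",  # /etc stays writable on Flatcar
                "mode": 0o644,  # JSON has no octal literals: this is written as 420
                "contents": {"source": "data:," + quote(hostname)},
            }],
        },
    }
    return json.dumps(config, indent=2)


# Placeholder values, not a real host or key.
print(ignition_config("worker-1", "ssh-ed25519 AAAA-placeholder user@example"))
```

In practice Flatcar's documentation recommends writing configs in Butane YAML and converting them to Ignition JSON with the `butane` tool; generating the JSON directly is just the most transparent way to show its structure.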

Verifying Setup

Once the worker node boots and joins the cluster, you can verify the vTPM is running. Connect to the node and check the device files:

ls -l /dev/tpm*

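If the vTPM is enabled by the Azure hypervisor, you’ll see /dev/tpm0 and /dev/tpmrm0. /dev/tpm0 is the raw device, which only one process can open at a time; /dev/tpmrm0 goes through the kernel’s TPM resource manager, which lets multiple processes share the TPM. Either way, the OS can talk to the virtual cryptoprocessor to seal disk encryption keys.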
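The same check can be scripted, for example in a node health probe. This Python sketch looks for both device nodes and, on kernels that expose it, reads the TPM major version from sysfs (on a Trusted Launch VM you would expect 2). The `/sys/class/tpm/tpm0/tpm_version_major` path depends on the kernel version, and `tpm_report` is a hypothetical helper.

```python
from pathlib import Path


def tpm_report(dev: str = "/dev", sysfs: str = "/sys/class/tpm") -> dict:
    """Check the TPM device nodes and, if exposed, the TPM major version."""
    report = {
        "tpm0": Path(dev, "tpm0").exists(),      # raw device: one opener at a time
        "tpmrm0": Path(dev, "tpmrm0").exists(),  # kernel resource manager device
        "version_major": None,
    }
    version_file = Path(sysfs, "tpm0", "tpm_version_major")
    if version_file.is_file():  # only present on newer kernels
        report["version_major"] = int(version_file.read_text().strip())
    return report


print(tpm_report())
```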

Feel free to suggest improvements on GitHub or through my Twitter.