TL;DR

As a small start-up time optimization, you can pick the best-suited compression algorithm for the initial ramdisk.

The Initial Ramdisk

When a Linux system boots, it needs to mount the root filesystem /. This may be relatively complicated, as it may be on a software RAID, on LVM, encrypted… To keep things manageable, an initial ramdisk can be used to get a small environment that has all the required modules and configuration to load the root filesystem. On Arch Linux, this initial ramdisk is generated using mkinitcpio. It takes multiple parameters to tune various aspects of the system and of the generated ramdisk.

Compression

One such parameter is COMPRESSION. It compresses the ramdisk to make the resulting image smaller. The manpage reads:

COMPRESSION

Defines a program to filter the generated image through. The kernel understands the compression formats yielded by the zstd, gzip, bzip2, lz4, lzop, lzma, and xz compressors. If unspecified, this setting defaults to zstd compression. In order to create an uncompressed image, define this variable as cat.

Another reason to compress the image is that it may reduce the start-up time. To understand why, imagine that the image is 100 MiB in size and only 20 MiB after compression. Let’s say that the disk reads 10 MiB per second and that the CPU can decompress the full image in 1 second. If we keep the image uncompressed, the disk needs 10 seconds to read it, while it needs only 2 seconds to read the compressed image. Adding the decompression time, the compressed version requires only 3 seconds.
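The arithmetic of this example can be written down as a tiny shell helper. This is only a sketch of the simplified model used here (the image is read in full, then decompressed, with no overlap between the two); the name total_time is made up for illustration.

```shell
# total_time SIZE_MIB DISK_MIB_PER_S DECOMPRESS_S
# Hypothetical helper: read time (size / disk rate) plus decompression time.
total_time() {
    LC_ALL=C awk -v size="$1" -v rate="$2" -v cpu="$3" \
        'BEGIN { printf "%.1f\n", size / rate + cpu }'
}

total_time 100 10 0   # uncompressed image: prints 10.0
total_time 20 10 1    # compressed image:   prints 3.0
```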

Trade-offs

The above example is quite simple, but it illustrates the trade-off between a bigger image that the disk will take longer to read and a smaller image that may take longer to decompress. It is thus more of a spectrum, where more CPU-intensive compression (and decompression) methods could result in a smaller image and less read from the disk but more CPU time:

more read,                      less read,
less CPU                          more CPU
 ◄────────────────────────────────────►
   uncompressed        lz4         zstd

Then, the question is: is it worth compressing an image more (or at all), to get a faster start-up time?
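One way to frame the question, still under the simplified model of the earlier example (read first, then decompress, no overlap): compression wins as long as compressed_size / rate + decompress_time < uncompressed_size / rate, that is, as long as the disk reads slower than (uncompressed_size - compressed_size) / decompress_time. The helper below is a hypothetical sketch of that break-even point, not a measurement.

```shell
# break_even_rate UNCOMPRESSED_MIB COMPRESSED_MIB DECOMPRESS_S
# Hypothetical helper: disk read rate (MiB/s) below which the compressed
# image is faster overall in the simple "read, then decompress" model.
break_even_rate() {
    LC_ALL=C awk -v u="$1" -v c="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", (u - c) / t }'
}

break_even_rate 100 20 1   # prints 80.0
```

With the numbers from the earlier example, compression pays off on any disk slower than 80 MiB/s; on a faster disk, that particular decompressor would be the bottleneck.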

Protocol

To answer this question on a particular machine [1], let’s compare the time required to read and decompress various initial ramdisks.

I’m using the linux package in version 5.18.15-arch1-2 from the Arch Linux repository. Then, I generate (sudo mkinitcpio -p linux) various images with the following parameters in /etc/mkinitcpio.conf:

  • COMPRESSION="cat"
  • COMPRESSION="lz4"
  • COMPRESSION="zstd"

Each image is copied into a directory and renamed according to the compression used, e.g. cp /boot/initramfs-linux.img initramfs-linux.img.zstd. The result is as follows:

$ file *.img*
initramfs-linux.img:      ASCII cpio archive (SVR4 with no CRC)
initramfs-linux.img.lz4:  LZ4 compressed data (v0.1-v0.9)
initramfs-linux.img.zstd: Zstandard compressed data (v0.8+), Dictionary ID: None

The images also compress quite well:

File                       Size (MiB)
initramfs-linux.img        61
initramfs-linux.img.lz4    32
initramfs-linux.img.zstd   22

We could compare more algorithms and compression levels, but compression levels would need to be passed through COMPRESSION_OPTIONS, which the manpage discourages, as it can result in an unbootable image.
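The manual steps above could be scripted roughly as follows. This is a sketch, not the exact procedure used for this post: set_compression and build_images are hypothetical helper names, the script assumes no other file sets COMPRESSION, and build_images has to run as root because it rewrites /etc/mkinitcpio.conf and regenerates the image in /boot.

```shell
# set_compression CONF ALGO
# Hypothetical helper: make ALGO the only active COMPRESSION= setting in CONF.
set_compression() {
    local conf=$1 algo=$2 tmp
    tmp=$(mktemp)
    # Drop active COMPRESSION= lines; commented-out lines and
    # COMPRESSION_OPTIONS= are left untouched.
    sed '/^COMPRESSION=/d' "$conf" > "$tmp"
    printf 'COMPRESSION="%s"\n' "$algo" >> "$tmp"
    cat "$tmp" > "$conf"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}

# build_images [CONF] [OUTDIR]
# Hypothetical helper: regenerate the image once per algorithm and keep a copy.
build_images() {
    local conf=${1:-/etc/mkinitcpio.conf} out=${2:-.} algo
    for algo in cat lz4 zstd; do
        set_compression "$conf" "$algo"
        mkinitcpio -p linux || return 1
        if [ "$algo" = cat ]; then
            cp /boot/initramfs-linux.img "$out/initramfs-linux.img"
        else
            cp /boot/initramfs-linux.img "$out/initramfs-linux.img.$algo"
        fi
    done
}

# Run as root, e.g.:
# build_images /etc/mkinitcpio.conf .
```

Because zstd comes last in the loop, the configuration and the image left in /boot end up on zstd, which is also the default.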

Results

Let’s run some decompression commands and compare their run-time with hyperfine. On a quiet computer:

$ hyperfine \
    --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' \
    'lz4 -d <./initramfs-linux.img.lz4' \
    'zstd -d <initramfs-linux.img.zstd' \
    'cat <initramfs-linux.img'

Note that the command has a --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' argument. This empties the OS file system caches to be closer to start-up conditions: when the computer starts, everything has to be read from the disk as the RAM is basically empty. Without this --prepare argument, we get much shorter times, e.g. 45 ms for lz4.

Here are the results:

Command                             Mean [ms]      Min [ms]   Max [ms]   Relative
lz4 -d <./initramfs-linux.img.lz4   137.9 ± 13.5   122.8      157.4      1.00
zstd -d <initramfs-linux.img.zstd   164.9 ± 13.4   153.6      187.9      1.20 ± 0.15
cat <initramfs-linux.img            175.9 ± 19.0   157.9      218.8      1.28 ± 0.19

lz4 is the fastest, followed by zstd, with no compression at all (cat) coming last. Going back to the sizes table, the trade-off between a smaller image and a slower decompression is clear: despite a ~30% smaller file, zstd is still a bit slower to read and decompress than lz4, while no compression at all is even worse.
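The size ratios behind that comparison can be recomputed from the sizes table. The helper name pct_smaller is made up for illustration; the inputs are the rounded MiB values from the table, so the percentages are approximate.

```shell
# pct_smaller BIG SMALL
# Hypothetical helper: how much smaller SMALL is than BIG, in percent.
pct_smaller() {
    LC_ALL=C awk -v big="$1" -v small="$2" \
        'BEGIN { printf "%.0f\n", (big - small) / big * 100 }'
}

pct_smaller 61 32   # lz4 vs uncompressed:  prints 48
pct_smaller 61 22   # zstd vs uncompressed: prints 64
pct_smaller 32 22   # zstd vs lz4:          prints 31
```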

Conclusion

The above results are based on runs on a particular machine. As mentioned, different machines will yield different results, depending on the relative performance of the disk and the CPU. It’s also a pretty small improvement in the grand scheme of things: only a few tens of milliseconds on a process that takes a couple of seconds. But I found it to be a nice example of how compression can make things faster, compared to no compression at all, because CPUs nowadays are so fast.


Appendix: Recording of the hyperfine Run

(This was done on a different run from the table above, as running the benchmark through Asciinema is sometimes a bit less stable.)




  1. The conclusions will in all likelihood change depending on the machine, namely the relative performance of the CPU and the disk.