Proxmox VE: ZFS primarycache=all or metadata

A while back, we benchmarked our Proxmox infrastructure in various ways, mainly to experiment and to get the most out of the hardware we had. We wrote two articles on the subject to share our results and conclusions, but we did not take the time to share all of our results. To continue the series, this article shares our findings regarding a ZFS tuning parameter that has a noticeable impact on your Proxmox infrastructure: the primarycache option. It is not available in the Proxmox GUI; you must use the CLI to change its value, and you may configure it per ZFS volume.

Here is what the ZFS manual has to say about this option:

primarycache=all | none | metadata

Controls what is cached in the primary cache (ARC). If this property is set
to all, then both user data and metadata is cached. If this property is set
to none, then neither user data nor  metadata is cached. If this property is
set to metadata, then only metadata is cached. The default value is all.

From this description, one would think caching is better and we should enable it. Wrong. In a virtual machine, if you give it enough memory, the guest OS is already caching file system data. The guest OS can also make better decisions about what needs to be cached, since it is closer to the application. Effectively, enabling the ZFS primarycache for virtual machines is not useful because it creates two caches: one in the guest OS and another in the host OS. With this setup, it is quite possible to have the same data stored twice in memory. People may argue that the ARC (adaptive replacement cache) has better caching algorithms, but it is still a waste, because the guest OS has no direct access to the ARC.

As for LXC, it’s a bit different: LXC containers do have direct access to the ARC. The performance boost provided by the primarycache highly depends on your workload. One would think primarycache=all should be beneficial for LXC, but our benchmarks show otherwise. To check whether primarycache=all provides a benefit for your workload, the best approach is to test it and use the various ARC statistics to verify whether the ARC is actually being used. Have a look at /usr/sbin/arcstat.py and /usr/sbin/arc_summary.py.
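When eyeballing those statistics, the most telling number is usually the ARC hit ratio. A minimal sketch of the calculation, using made-up counter values in place of a real /proc/spl/kstat/zfs/arcstats file:

```shell
# Compute the ARC hit ratio from the arcstats counters.
# On a live Proxmox host, feed in the real file instead:
#   awk '...' /proc/spl/kstat/zfs/arcstats
# The counter values below are illustrative samples, not real measurements.
awk '/^hits /   {hits = $3}
     /^misses / {miss = $3}
     END {printf "ARC hit ratio: %.1f%%\n", 100 * hits / (hits + miss)}' <<'EOF'
hits                            4    900000
misses                          4    100000
EOF
```

If the hit ratio stays low while your workload runs, the ARC is not helping and primarycache=metadata is the safer choice.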

To change this option, you must first identify the right dataset or zvol to update.

$ sudo zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
rpool                          111G   369G   140K  /rpool
rpool/ROOT                    30.9G   369G   140K  /rpool/ROOT
rpool/ROOT/pve-1              30.9G   369G  30.9G  /
rpool/data                    70.4G   369G  6.98G  /rpool/data
rpool/data/subvol-116-disk-1  2.14G  5.86G  2.14G  /rpool/data/subvol-116-disk-1
rpool/data/subvol-117-disk-1  2.15G  5.85G  2.15G  /rpool/data/subvol-117-disk-1
rpool/data/subvol-120-disk-1  2.14G  5.86G  2.14G  /rpool/data/subvol-120-disk-1
rpool/data/subvol-125-disk-1  3.37G  28.6G  3.37G  /rpool/data/subvol-125-disk-1
rpool/data/vm-112-disk-1      21.5G   369G  21.0G  -
rpool/data/vm-114-disk-1      8.02G   369G  8.02G  -
rpool/data/vm-119-disk-1      16.9G   369G  16.4G  -
rpool/data/vm-121-disk-1      1.96G   369G  1.96G  -
rpool/data/vm-121-disk-2      5.57M   369G  5.57M  -
rpool/data/vm-122-disk-1      1.45G   369G  1.45G  -
rpool/data/vm-123-disk-1      1.46G   369G  1.46G  -
rpool/data/vm-124-disk-1      1.46G   369G  1.46G  -
rpool/subvol-108-disk-1       1.03G  7.13G   893M  /rpool/subvol-108-disk-1
rpool/swap                    8.50G   375G  2.77G  -

In our environment, rpool/data is our storage for Proxmox virtual machines and LXC containers. If you want to change this option for your whole environment, you may set it on rpool/data and let the child datasets inherit it. Otherwise, you may change it for a single VM by setting the value on its specific zvol.
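If you want to script the change across several guests, a dry-run loop that only prints the commands is a safe first step. A sketch, using dataset names copied from the listing above; on a live host you would pipe in real names with `sudo zfs list -H -o name -r rpool/data`:

```shell
# Dry run: print, without executing, a zfs set command for each disk.
# Replace the printf with:  sudo zfs list -H -o name -r rpool/data
printf '%s\n' \
  rpool/data/vm-112-disk-1 \
  rpool/data/vm-114-disk-1 \
  rpool/data/vm-119-disk-1 |
while read -r ds; do
  echo "zfs set primarycache=metadata $ds"
done
```

Once the printed commands look right, drop the echo (and add sudo) to actually apply them.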

$ sudo zfs get primarycache 
NAME                            PROPERTY      VALUE         SOURCE
rpool                           primarycache  all           default
rpool/ROOT                      primarycache  all           default
rpool/ROOT/pve-1                primarycache  all           default
rpool/data                      primarycache  metadata      local
rpool/data/subvol-116-disk-1    primarycache  metadata      inherited from rpool/data
rpool/data/subvol-117-disk-1    primarycache  metadata      inherited from rpool/data
rpool/data/subvol-120-disk-1    primarycache  metadata      inherited from rpool/data
rpool/data/subvol-125-disk-1    primarycache  all           local
rpool/data/vm-112-disk-1        primarycache  metadata      inherited from rpool/data
rpool/data/vm-114-disk-1        primarycache  metadata      inherited from rpool/data
rpool/data/vm-119-disk-1        primarycache  metadata      inherited from rpool/data
rpool/data/vm-121-disk-1        primarycache  all           local
rpool/data/vm-121-disk-2        primarycache  metadata      inherited from rpool/data
rpool/data/vm-122-disk-1        primarycache  metadata      inherited from rpool/data
rpool/data/vm-123-disk-1        primarycache  metadata      inherited from rpool/data
rpool/data/vm-124-disk-1        primarycache  metadata      inherited from rpool/data
rpool/subvol-108-disk-1         primarycache  all           default
rpool/swap                      primarycache  metadata      local

$ sudo zfs set primarycache=metadata rpool/data/vm-112-disk-1
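After applying changes, it can be handy to list any datasets still caching user data. A sketch that filters `zfs get` output like the listing above (here fed from a small inline sample; on a live host use `sudo zfs get -H -o name,value primarycache -r rpool` instead):

```shell
# Print datasets whose primarycache is still "all".
# The sample lines mimic the tab-separated output of:
#   sudo zfs get -H -o name,value primarycache -r rpool
awk '$2 == "all" {print $1}' <<'EOF'
rpool	all
rpool/data	metadata
rpool/data/vm-121-disk-1	all
rpool/data/vm-112-disk-1	metadata
EOF
```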

Results

**FS-Mark - 1000 Files, 1MB Size** (more is better)

* LXC primarycache=all: 414.77 Files/s
* LXC primarycache=metadata: 424.07 Files/s
* KVM primarycache=all: 216.53 Files/s
* KVM primarycache=metadata: 215.8 Files/s
{: .barchart}

This test doesn't show a big difference between the two options, but it is enough to hint at the benefit of primarycache=metadata for LXC.

**Threaded I/O Tester - 64MB Random Read - 32 Threads** (more is better)

* LXC primarycache=all: 3843.15 MB/s
* LXC primarycache=metadata: 8021.03 MB/s
* KVM primarycache=all: 3759.52 MB/s
* KVM primarycache=metadata: 3854.75 MB/s
{: .barchart}

Here we clearly see how LXC can benefit from setting `primarycache` to `metadata`. KVM, on the other hand, sees little to no benefit.

**Threaded I/O Tester - 64MB Random Write - 32 Threads** (more is better)

* LXC primarycache=all: 2433.1 MB/s
* LXC primarycache=metadata: 4493.64 MB/s
* KVM primarycache=all: 1465.09 MB/s
* KVM primarycache=metadata: 911.77 MB/s
{: .barchart}

Setting `primarycache` to `metadata` for LXC provides better throughput because the OS doesn't waste time storing data in a cache it doesn't need. The KVM result, on the other hand, is puzzling: it performs better with `primarycache=all`.

**Gzip Compression - 2GB File Compression** (less is better)

* LXC primarycache=all: 22.32 sec
* LXC primarycache=metadata: 22.04 sec
* KVM primarycache=all: 22.61 sec
* KVM primarycache=metadata: 22.16 sec
{: .barchart}

This test is not conclusive; the difference in results is not significant.

**Apache Benchmark - Static Web Page Serving** (more is better)

* LXC primarycache=all: 4393.59 request/sec
* LXC primarycache=metadata: 4356.66 request/sec
* KVM primarycache=all: 4835.33 request/sec
* KVM primarycache=metadata: 4935.08 request/sec
{: .barchart}

This test is not conclusive; the difference in results is not significant.

Conclusion

As you can see in the results, the primarycache option does have an impact on performance, but not for every workload. In some tests we don't see any difference, while in others the right setting more than doubles the throughput.

With all this information, you might be unsure whether it's good to enable the primarycache and which option is better for you. Here is our rule of thumb: set all VMs and LXC containers to primarycache=metadata, and only for very specific workloads set it to primarycache=all.

With this setting, your system doesn't waste any memory on a redundant ARC, and that memory can be used for something else, such as more memory for your VMs.