Another script that provides a feature missing in Oracle Solaris 10/11: ZFS statistics per zpool and per zone.
ZFS in Oracle Solaris 10 and 11 is still not completely ready for a single-pool setup: for some applications it makes sense to use multiple pools. In general, if one workload stresses the single zpool too much, the performance of the other workloads can degrade. It is therefore still best practice to use multiple pools; the “Configuring ZFS for an Oracle Database” white paper, for example, suggests using at least a dedicated pool for the redo logs.
The example setup
Following the white paper, we use separate pools for the different workloads: one pool for the Oracle data files and another for the redo logs. As we want to run more than one database on a server, the overhead would be significant if we created a separate zpool for every database. Therefore we simply put every database into its own Solaris Zone (aka Container) with dedicated ZFS filesystems.
# zfs list -o name,mountpoint
NAME               MOUNTPOINT
rpool/zones/zone1  /zones/zone1
rpool/zones/zone2  /zones/zone2
rpool/zones/zone3  /zones/zone3
data_pool/zone1    /zones/zone1/root/oradata
data_pool/zone2    /zones/zone2/root/oradata
data_pool/zone3    /zones/zone3/root/oradata
redo_pool/zone1    /zones/zone1/root/oralogs
redo_pool/zone2    /zones/zone2/root/oralogs
redo_pool/zone3    /zones/zone3/root/oralogs
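For reference, a layout like this could be created roughly as follows; the disk names (c0t2d0 etc.) are placeholders and the zones are assumed to exist already:

# zpool create data_pool mirror c0t2d0 c0t3d0
# zpool create redo_pool mirror c0t4d0 c0t5d0
# zfs create -o mountpoint=/zones/zone1/root/oradata data_pool/zone1
# zfs create -o mountpoint=/zones/zone1/root/oralogs redo_pool/zone1
(and the corresponding zfs create commands for zone2 and zone3)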
The issue:
Imagine the I/O performance of the database in zone2 is not as expected. You check with “iostat” and “zpool iostat” and identify a very high load on the “data_pool” zpool. There are many different options to solve this issue, but to fully understand the problem, you very likely want to know which database is causing the high I/O. But how do you identify the evil database?
The solution:
If you use a smart filesystem layout (e.g. a dedicated filesystem per zone, per database and per zpool), you can use the “fsstat” utility to get an idea of how much I/O is handled on which filesystem.
# fsstat -i `mount -p | grep zfs | cut -d' ' -f3 | tr '\n' ' '` 10
 read   read  write  write  rddir  rddir  rwlock  rwulock
  ops  bytes    ops  bytes    ops  bytes     ops      ops
 344K   415M  12.7K  41.9M  9.22K  2.40M    366K     366K  /
3.18K  28.4M    850   495K  2.90K  4.65M   6.91K    6.91K  /zones/zone1/root/oradata
...
In Solaris 11 the first awareness of zones was added: with “fsstat -i -Z zfs” the statistics are additionally broken down per zone.
If fsstat can’t help you, things are more complicated than they should be. Therefore I wrote the following DTrace script:
# ./zpool-zones-vio.d 10s
zpool      zonename  reads  reads(KB)  writes  writes(KB)
-----      --------  -----  ---------  ------  ----------
data_pool  zone3         0          0       6          51
redo_pool  zone3         0          0       6          51
redo_pool  zone2         0          0      14         128
redo_pool  zone1         0          0      26         297
rpool      zone2         1        142       0           0
rpool      zone3         2        174       0           0
rpool      zone1         3        316       2           4
data_pool  zone2         5        112     120         640
data_pool  zone1      2601    1468256     500        4980
The script uses the syscall provider, therefore the absolute values are comparable to those of the fsstat utility, but not to “iostat”: for example, I/O that is served only from the ZFS ARC cache is counted as well.
Script: zpool-zones-vio.d
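The linked script is the one that produced the output above. To give a rough idea of the approach, here is a much simplified sketch (not the original zpool-zones-vio.d): it aggregates read/write syscalls on ZFS per zone and per mount point; mapping the mount points back to pools is then a lookup like the zfs list output above.

#!/usr/sbin/dtrace -s
/* Simplified illustration, not the original zpool-zones-vio.d:
 * counts read/write syscalls and bytes per zone and per ZFS mount point. */
#pragma D option quiet

syscall::read:entry, syscall::write:entry
/fds[arg0].fi_fs == "zfs"/
{
        self->mnt = fds[arg0].fi_mount;   /* remember the mount point of this fd */
        self->trace = 1;
}

syscall::read:return, syscall::write:return
/self->trace && arg0 > 0/
{
        @ops[zonename, self->mnt, probefunc]   = count();
        @bytes[zonename, self->mnt, probefunc] = sum(arg0);
}

syscall::read:return, syscall::write:return
/self->trace/
{
        self->trace = 0;
}

tick-10s
{
        printf("%-10s %-30s %-6s %10s %14s\n",
            "ZONE", "MOUNTPOINT", "OP", "OPS", "BYTES");
        printa("%-10s %-30s %-6s %@10d %@14d\n", @ops, @bytes);
        trunc(@ops);
        trunc(@bytes);
}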
Non-Oracle ZFS distributions:
Illumos-based systems like SmartOS already offer a feature called “I/O Throttle” to limit the I/O caused by a zone, which tries to solve the noisy-neighbor problem. A similar feature is not yet available for Oracle Solaris.
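On SmartOS the throttle works via a per-zone I/O priority; as an illustration only (the UUID below is a placeholder), the relative I/O share of a zone is typically adjusted with vmadm:

# vmadm update 01234567-89ab-cdef-0123-456789abcdef zfs_io_priority=50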