Facter is Puppet’s component for discovering information about the hardware and the operating system where the Puppet agents runs. The many auto-discovered variables (“facts”) are convenient, but facter is also a potential source for slow puppet agent executions. This post shows some tricks for troubleshooting the performance of facter.
With the simple unix utility time
, you can measure the full execution time of facter
. For example to know if an agent runs on a virtual hardware, the is_virtual
“fact” is populated:
# time facter is_virtual false real 0m4.008s user 0m0.853s sys 0m2.594s
In this example we see facter needs four seconds to discover is_virtual=false
. But what happens in this four long seconds? The option --timing
shows how long each fact needs to get discovered. It also shows that facter uses several other facts to resolve is_virtual
, because depending on these facts other CLI commands are used for the discovery.
# facter --timing is_virtual kernel: 7.74ms operatingsystem: 18.32ms osfamily: 20.16ms macaddress: 77.64ms hardwareisa: 19.04ms hardwaremodel: 29.89ms architecture: 30.20ms virtual: 0.69ms prtdiag: 1062.60ms virtual: 1094.83ms virtual: 4.07ms is_virtual: 1130.92ms false
From this output we see that more than ten facts are resolved but only the prtdiag
and virtual
fact need longer than one second for discovery. This information is already helpful for a Puppet facter developer. But for a regular user, who only likes to understand why facter runs slower on some hardware types, it is likely not simple enough.
Facter basically just parses the output of CLI commands, so if such a command runs slow, whole facter runs slow. The performance of the ruby code which calls these commands, usually does not have a big impact on the overall facter performance.
How to measure the execution time of the CLI commands which are called by facter? One option would be to improve the --timing
feature. But as I am on a platform which ships with Dtrace (Solaris), it is easier to write a simple Dtrace script. The following script measures the execution time of all binaries which are called by the facter ruby process:
1 | #!/usr/sbin/dtrace -s |
You need to call the Dtrace script as followed:
# dtrace -s facter-timing.d -c '/usr/local/bin/facter --timing is_virtual' ... = Dtrace statistics = Ruby facter PID: 9195 PID runtime (ms) command --- ------------ ------- 9200 405 netstat -p -f inet 9233 58 /sbin/ifconfig -a 9196 1 /usr/bin/uname -s 9234 128 /usr/perl5/bin/perl /usr/bin/kstat cpu_info 9198 57 /sbin/ifconfig -a 9219 1 /usr/bin/uname -p 9238 47 /usr/sbin/zoneadm list -cp 9243 1 /usr/bin/uname -m 9247 12 /sbin/zonename 9248 1059 /usr/platform/SUNW,SPARC-Enterprise/sbin/prtdiag 9197 1 /usr/bin/uname -v 9231 16 sh -c /usr/sbin/prtconf 2>/dev/null 9221 1088 sh -c /usr/sbin/prtdiag 2>/dev/null 9201 58 /sbin/ifconfig -a 9207 2 /usr/sbin/virtinfo -ap
In this output a sysadmin can easily see which commands are slow to run. For example prtdiag
needs more than one second and also netstat -p -f inet
needs almost 0.5 seconds. Actually I wrote the script when we faced the bug that virtinfo -ap
needed more than twenty seconds for a simple execution. This script identified the issue quickly.