April 4, 2014

Building Puppet from source is easy


As mentioned in the last post, software dependencies can be a blocker for the adoption of configuration management frameworks. In complex or legacy environments with several different and antiquated operating system versions, even installing these frameworks can be painful. Easily too painful for a tool that is supposed to save time.

About a year ago, I was trying to install the Puppet agent on Solaris 10, Solaris 11 and RHEL 5.x. And it took longer than 5 minutes ...
Most installation tutorials basically require a current operating system and/or an Internet connection from your target systems. These requirements were just not realistic for the target servers at my workplace.

We just didn't want to upgrade the whole data center before we could install a small piece of Ruby software which might help us in the future. We would rather spend the time writing Puppet manifests to automate our tasks and re-invest the saved effort in upgrading the old operating systems.

After some struggling I tried to compile everything from source, and I was surprised how easy it was, even on Solaris. The following build guide works at least for Solaris 10, Solaris 11, RHEL 5/6 and CentOS 5/6.

The idea is to build the Puppet stack and all potentially problematic libraries into their own directory, in this example /opt/mypuppet. That way the Ruby installation of the Puppet agent does not interfere with your system's Ruby.
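The resulting tree could look roughly like this (the layout is only illustrative; the exact content depends on the components and versions you build):

/opt/mypuppet/bin/ruby      # self-contained Ruby
/opt/mypuppet/bin/gem
/opt/mypuppet/bin/facter
/opt/mypuppet/bin/hiera
/opt/mypuppet/bin/puppet
/opt/mypuppet/lib/...       # Ruby libraries, libyaml, ...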

If you want to "uninstall" the Puppet agent, you can do it easily with:

gist:b1-l1
An upgrade is just an uninstall and re-install.
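For example, a removal could look roughly like this, assuming everything was kept strictly below /opt/mypuppet:

# the whole stack lives below /opt/mypuppet, so removing that directory is enough
rm -rf /opt/mypuppet
# plus any configuration or var directories you placed outside of this tree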

Prepare build environment


You will need systems with developer tools like compilers installed; it makes sense to use dedicated build systems for that.

Solaris 10

If you use Solaris 10, you have very likely customized the distribution heavily. Make sure that at least the following packages are installed on the build system:

gist:b1-l2
Additionally, some build scripts are only tested with the GNU tools. You can avoid some trouble if you make "grep" and "sed" resolve to the GNU versions, e.g. by overriding the path to grep and sed with symlinks to ggrep and gsed, as sketched below.
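One variant of that workaround which avoids touching the system binaries; the locations of ggrep and gsed depend on how your build system is set up (e.g. /usr/sfw/bin or /opt/csw/bin), so adjust the paths:

# put the GNU versions first in the PATH of the build user
mkdir -p /var/tmp/gnubin
ln -s /usr/sfw/bin/ggrep /var/tmp/gnubin/grep
ln -s /opt/csw/bin/gsed  /var/tmp/gnubin/sed
export PATH=/var/tmp/gnubin:$PATH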

Solaris 11

With Solaris 11 it's easier:

gist:b1-l2

CentOS 6.x

gist:b1-l2
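The gist above lists the exact packages; as a rough sketch, the usual CentOS 6 commands to get a compiler and the headers Ruby needs look like this (the package selection is an assumption, check the gist for the exact list):

yum groupinstall -y "Development Tools"
yum install -y zlib-devel openssl-devel readline-devel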

Ruby + libraries

libyaml

gist:b1-l3
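The gist contains the exact commands; in essence it is a plain autotools build with the alternate prefix (the libyaml version below is just an example):

tar xzf yaml-0.1.5.tar.gz
cd yaml-0.1.5
./configure --prefix=/opt/mypuppet
make
make install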

Ruby

Always check the supported Ruby versions in the documentation. Ruby 2.0 should also be supported since Puppet 3.2.

gist:b1-l3
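Again, the gist has the exact steps; in essence it is a prefixed Ruby build that also picks up the libyaml from /opt/mypuppet (the Ruby version below is just an example, check the supported versions first):

tar xzf ruby-1.9.3-p545.tar.gz
cd ruby-1.9.3-p545
./configure --prefix=/opt/mypuppet --with-opt-dir=/opt/mypuppet --disable-install-doc
make
make install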

Puppet, Facter and Hiera

Facter

gist:b1-l4
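Facter, Hiera and Puppet are installed in essentially the same way (the gists contain the exact steps): unpack the release tarball and run its install.rb with the freshly built Ruby, so everything lands under /opt/mypuppet. A rough sketch with an example version:

tar xzf facter-1.7.5.tar.gz
cd facter-1.7.5
/opt/mypuppet/bin/ruby install.rb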

Hiera

gist:b1-l4

Puppet

gist:b1-l4

As long as bug PUP-1567 is not fixed, you need to apply the following patch to make "puppet apply" work with the alternate installation directory.

gist:puppet-masterless.patch

Packaging


To ship the built agent to the target systems, pack the files e.g. into an RPM, a Solaris package or just a tar archive:

gist:b1-l5
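For the tar variant, something along these lines should work (the archive name is just an example; use gtar on Solaris 10, as the native tar does not understand the -z option):

cd /opt
gtar czf /var/tmp/mypuppet-agent.tar.gz mypuppet
# on the target system:
cd /opt
gtar xzf /var/tmp/mypuppet-agent.tar.gz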

Outlook


Building from source definitely has some disadvantages, e.g. the build effort. If the pre-built packages from your OS vendor or Puppet Labs work for you, USE them.

Although we have been building the agent from source for almost a year now, we always had the plan to migrate to pre-built packages, especially after the Solaris 11.2 release, which will ship with full Puppet integration.

The excellent Pro Puppet book also covers the many ways to install Puppet very well; reading it can save you a lot of time.

March 23, 2014

Configuration Management Frameworks - Adoption Blockers

Frameworks like Puppet, Chef, CFEngine or Salt have now been available for some years. Their communities are huge, and "automation" seems to be the answer to almost everything in IT.

Many say: if you do system administration for more than one server and you don't use a configuration management framework, you are doing it wrong!
I fully agree with this statement. But in my observation, everybody seems to be interested in using such frameworks, yet in reality the adoption rate is quite low.
Why is that?

I think there are many possible blockers, but in my opinion it is especially the introduction phase of these frameworks that is just too hard for some complex environments and companies.

Nowadays I mostly use Puppet, but I also started quite late, mainly for silly reasons, for example the fear of additional software dependencies and the fact that the popular frameworks did not use my favorite scripting language, Python. Newer Python-based frameworks like Salt need Python 2.6, which was not available on all my target servers.

Anyway, it turned out that all the blockers for the introduction of these frameworks can be removed, one by one. In the upcoming posts I will try to address some of the biggest blockers for introducing the configuration management framework Puppet.

October 14, 2013

zpool-zones-vio.d - ZFS statistics per zpool and per zone.

Another script that adds a missing feature to Oracle Solaris 10/11: ZFS statistics per zpool and per zone.

ZFS in Oracle Solaris 10 and 11 is still not completely ready for a single-pool setup; for some applications it makes sense to use multiple pools. In general, if one workload stresses the single zpool too much, the performance of the other workloads can degrade. Therefore it is still best practice to use multiple pools; for example, the "Configuring ZFS for an Oracle Database" white paper suggests using at least a separate pool for the redo logs.

The example setup:

Following the white paper, we use separate pools for the different workloads: one pool for the Oracle data files and another one for the redo logs. As we like to run more than one database per server, the overhead would be significant if we created a separate zpool for every database. Therefore we simply put every database into its own Solaris Zone (aka Container) with dedicated ZFS filesystems.
# zfs list -o name,mountpoint
NAME                 MOUNTPOINT
rpool/zones/zone1    /zones/zone1
rpool/zones/zone2    /zones/zone2
rpool/zones/zone3    /zones/zone3
data_pool/zone1      /zones/zone1/root/oradata
data_pool/zone2      /zones/zone2/root/oradata
data_pool/zone3      /zones/zone3/root/oradata
redo_pool/zone1      /zones/zone1/root/oralogs
redo_pool/zone2      /zones/zone2/root/oralogs
redo_pool/zone3      /zones/zone3/root/oralogs

The issue:

Imagine the I/O performance of the database in zone2 is not as expected. You check with "iostat" and "zpool iostat" and identify a very high load on the "data_pool" zpool. There are many different options to solve this issue, but to fully understand the problem you very likely want to know which database is causing the high I/O. But how do you identify the evil database?

The solution:

If you use a smart filesystem layout (e.g. a separate filesystem per zone, per database and per zpool), you can use the "fsstat" utility to get an idea of how much I/O is handled on which filesystem.
# fsstat -i `mount -p | grep zfs | cut -d' ' -f3 | tr '\n' ' '` 10
 read read  write write rddir rddir rwlock rwulock
  ops bytes   ops bytes   ops bytes    ops     ops
 344K  415M 12.7K 41.9M 9.22K 2.40M   366K    366K /
3.18K 28.4M   850  495K 2.90K 4.65M  6.91K   6.91K /zones/zone1/root/oradata
...
In Solaris 11 a first level of zone awareness was added with "fsstat -i -Z zfs <interval>", but it only provides aggregated metrics for all ZFS filesystems of a zone, without information about the involved zpools.

If fsstat can't help you, things are more complicated than they should be. Therefore I wrote the following DTrace script:

# ./zpool-zones-vio.d 10s

zpool          zonename reads    reads(KB)      writes  writes(KB)
-----          -------- -----    ---------      ------  ----------
data_pool      zone3        0            0          6           51
redo_pool      zone3        0            0          6           51
redo_pool      zone2        0            0         14          128
redo_pool      zone1        0            0         26          297
rpool          zone2        1          142          0            0
rpool          zone3        2          174          0            0
rpool          zone1        3          316          2            4
data_pool      zone2        5          112        120          640
data_pool      zone1     2601      1468256        500         4980

The script uses the syscall provider, so the absolute values are comparable to the fsstat utility, but not to "iostat": e.g. I/O that is served only from the ZFS ARC cache is also counted.

Script: zpool-zones-vio.d

Non-Oracle ZFS distributions:

Illumos-based systems like SmartOS already offer a feature called "I/O Throttle" to limit the I/O caused by a zone, which tries to solve the noisy-neighbor problem. A similar feature is not yet available in Oracle Solaris.