If you monitor your systems, you probably rely heavily on SNMP: out of the box it gives you a lot of ways to get important performance and status information.

The topic of security, however, is often not considered. SNMP versions 1 and 2c transmit everything in plain text over the wire. There is also no user/password authentication method, just a shared community string which gives access to the information. SNMP v3 was introduced to address these problems.

The Linux Net-SNMP agent supports SNMP v3 and OpenNMS does as well, so nothing prevents us from using encryption and user authentication.

WARNING: I assume Net-SNMP uses SHA-1, which is not considered secure anymore. As far as I know today, there is no Net-SNMP implementation available which supports SHA-2 with a 256-bit hash.

Nevertheless, here is the way to configure SNMP v3. It is still better than sending everything over the wire in plain text. In critical environments, I would definitely consider adding mechanisms on the network layer to isolate and protect the management network from the rest of the world and reduce the attack surface.

Make your Net-SNMP configuration modular

Today, people run configuration management tools to roll out configurations to a lot of systems. Net-SNMP gives you the possibility to use an include drop-in folder to extend the default configuration, which is very handy for device-dependent configuration snippets.

All you have to do is add the following line to your snmpd.conf:

includeDir /etc/snmp/conf.d

All files ending in .conf will now be added to your Net-SNMP configuration. This makes it easy for configuration management tools to add device-dependent disk, process or log monitoring directives without mangling one large snmpd.conf with variables.
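For example, a drop-in snippet for a web server host could look like this (the file name and the monitored file system and process are assumptions for illustration):

# /etc/snmp/conf.d/webserver.conf
# alarm when the root file system has less than 10% free space
disk / 10%
# make sure the nginx process is running
proc nginx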

How to configure Net-SNMP with SNMP v3

The first step is to create a user with a password and tell the agent which methods should be used for encryption and authentication:

createUser monitor SHA 0p3nnm5423 AES opennmsopennms
rouser monitor priv .1.3.6.1.2.1

The first directive creates a user named monitor and uses SHA for message authentication; the second grants this user read-only access to the subtree .1.3.6.1.2.1 and requires authentication and encryption (priv). For encryption you have the choice between DES and AES; I would recommend the newer AES. I can also recommend using something like apg to create better passwords.
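A quick sketch of generating candidates with apg (assuming the apg package is installed):

apg -m 16 -n 4

This prints four generated passwords with a minimum length of 16 characters. Keep in mind that SNMP v3 pass phrases must be at least eight characters long.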

Once you have added the configuration, you have to restart the Net-SNMP daemon. Then you can test it with the following command:

snmpget -v 3 -u monitor -l authPriv -a SHA -A 0p3nnm5423 -x AES -X opennmsopennms localhost .1.3.6.1.2.1.1.6.0
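If authentication and encryption are set up correctly, the reply looks something like this (the actual value is whatever sysLocation is set to on the host):

SNMPv2-MIB::sysLocation.0 = STRING: Server Room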

Next, you can configure OpenNMS to use SNMP v3 for a single IP address or a whole range in the Web UI by going to "Admin -> Configure SNMP Community by IP".

That's it – happy monitoring.

Centralizing logs is important as soon as you have more than two servers. In my environment the bare metal is monitored with Net-SNMP and my services are deployed as Docker containers. All system logs are sent to a Graylog2 instance, and I quickly noticed a few ugly entries caused by snmpd:

Cannot statfs /run/docker/netns/...: Permission denied

You will notice quite a few of them. My first approach was to raise the log level of the SNMP daemon in /etc/default/snmpd:

SNMPDOPTS='-LS3d -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'

The Net-SNMP man page describes the logging options, and with -LS3d I restricted logging to priority "Error" instead of "Warning", but it didn't help. I researched on the web and found this topic in Red Hat's Bugzilla.

It turns out snmpd reads /proc/mounts, runs statfs on each entry, and logs an error when that fails. One of the commenters found a solution: use rsyslog to filter this type of message with:

if $programname == 'snmpd' and $msg contains 'statfs' then {
    stop
}
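On a systemd-based system you can put this filter in its own drop-in file, e.g. /etc/rsyslog.d/10-snmpd-statfs.conf (the file name is my choice), and restart rsyslog afterwards:

sudo systemctl restart rsyslog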

The result is now a much cleaner log with less garbage.

Happy Logging

As most of us have noticed, a few companies have changed our perspective on how to develop software and deploy it as a service. There are quite a few differences between selling a box with 10 CDs every year and developing and delivering your software as a service. This article is a collection of thoughts and ideas I had and wanted to write down.

Who cares about a version number?

Users don't give a shit about version numbers anymore; all that matters is the user. A great user experience, functionality and a good "Effort-to-Outcome" ratio in solving their problems will make your software successful.

Usability improvements, features and fixes are delivered immediately, and this is where all the fuss about continuous delivery and the DevOps culture kicks in.

Pets and Cattle

Virtualisation technology forced hardware manufacturers to change their mindset and make their boxes behave like good cattle instead of pets. The same will happen to Linux distributions and configuration management tools with all the fuss about containers; I believe they haven't really noticed yet. You want hardware as a commodity, you need the Linux kernel as a commodity, and the diversity and history of Linux distributions are in your way.

All the ugly ifdefs in configuration management tools just make your service run on specific distributions using yum, apt or apk, and the nasty glue you have to write to configure your service to run in a container is still painful. Applications often aren't built to run in such environments and you have to hack a lot of stuff - but this is a whole different story.

Monitoring is important

Everybody tells you monitoring is an important thing. You can only improve what you measure, and you need to know where things go downhill.

Current monitoring tools have fallen behind today's needs. Most of them only let you think in terms of bare metal boxes like hosts and IP addresses. Some tools do only one part - performance management XOR fault management - and you need both. In case you have two, you have to maintain and glue two tools together, and you want alerting - but you don't want to maintain it twice.

Monitoring needs to change

Monitoring tools need to change to become part of the software development and service deployment process.

Most of them are built by people with an operations background and not a software development background. The operations background is often quite good, but we need more software development expertise in monitoring tools.

We should make monitoring a part of our test suites. Why not define an "Operation Test" after an "Integration Test" and let it run in your monitoring tool? Additionally, monitoring people should make clear to the user what the difference is between "Performance Management" and "Application Profiling".

Monitoring people should adopt terms like white-box and black-box testing for operational services. For example, when you test the status code of a web application's landing page, it is a black-box test. When you measure internal, application-specific entities with JMX, you have a white-box test.

Fight against Alarm Fatigue

Monitoring applications tend to overmonitor your environment by default. You measure a lot and it tells you a lot, but you overlook the important things in all the noise. The signal-to-noise ratio is too low and people develop alarm fatigue. Rule #1 in alerting: "Only notify someone when human interaction is really necessary."

Applications and Monitoring

With services deployed in containers, the whole idea of provisioning needs to change. Monitoring tools should allow you to model an "Application Service" with associated performance and availability metrics. Alerting should also be possible on those "Application Services". Performance metrics and operational tests are driven by high-level services, mostly through REST APIs. Containers will come and go, providing resources to this "Application Service"; they can no longer be treated as long-living hosts with a statically assigned IP address.

Monitoring needs to be more intelligent

When talking about intelligence, everybody thinks about Artificial Intelligence. In monitoring it is much simpler, because the tools are ridiculously stupid at the moment and you don't have to throw AI at the problem. Diagnose from the bottom up, from low complexity to higher complexity, which also means from cheap to expensive in terms of required hardware and network resources. We want to monitor high-level services, and a monitoring tool can help diagnose a problem by itself and provide a lot of useful information. For example:

We test https://mycloudy.webapp.acme.com for a 200 OK response with a timeout of 2 seconds.

Instead of just reporting "Service Down", the monitoring tool can diagnose the problem itself with a few cheap and simple tests - from the perspective of the monitoring system - to give a NOC guy an overview of what went wrong and save him time. As an example for the test above: was the connection refused, or was the HTTP status code just something other than 200 OK?

Connection refused

  1. Can IPv4/IPv6 addresses be looked up for the host name mycloudy.webapp.acme.com?
  2. Can the IPv4/IPv6 addresses be reached over ICMP?
  3. If not, what is the traceroute output for the IPv4/IPv6 addresses?
  4. If possible, give me a link to the logs of the last-hop nodes in the time window +/- the service polling interval.
  5. If they can be reached, is TCP port 443 open?
  6. If not, give me a link to warning+ logs from the web server in the time window +/- the service polling interval.

Not 200 OK

  1. Use the resolved IPv4/IPv6 address of the web server and give me a link to warning+ logs from the web server in the time window +/- the service polling interval.

You don't need a rocket scientist for this.
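As a rough shell sketch of this cascade (standard tools assumed; the host name is taken from the example above):

host mycloudy.webapp.acme.com          # 1. does the name resolve?
ping -c 3 mycloudy.webapp.acme.com     # 2. reachable over ICMP?
traceroute mycloudy.webapp.acme.com    # 3. where does the path break?
nc -zv mycloudy.webapp.acme.com 443    # 5. is TCP port 443 open?
curl -s -o /dev/null -w '%{http_code}\n' --max-time 2 https://mycloudy.webapp.acme.com

A monitoring tool could run exactly this escalation automatically whenever the high-level HTTP test fails, instead of leaving the legwork to the NOC.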

Most of these things, e.g. ICMP or DNS lookups, are configured as permanent monitoring but are mostly only necessary when you need to diagnose a problem. You only care about them when a high-level application service fails. You can do a similar thing with response times. You really care about your application response time over a longer period of time. Only when it goes through the roof do you immediately have to ask: was the network path slow (ICMP)? Was the name lookup slow (DNS)? Was the web server response slow (HTTP)?

While working on Docker executables, I ran into an interesting corner case. Fortunately, the Docker IRC channel helped me investigate - special credits to Ravensoul.

When you build a container as an executable, you can use ENTRYPOINT for the binary to execute and CMD as a default, overridable argument. In most cases CMD is the --help argument, to provide a useful default behavior in case you just run the container without specifying anything.

In my case I had built a Ruby-based executable, and because I needed the environment variables, I used bash -c <command> as the ENTRYPOINT and --help as the default CMD argument, like this:

ENTRYPOINT ["/bin/bash", "-c", "/path/to/myRuby"]

CMD ["--help"]

I noticed the --help argument was not used when I just ran the container. To verify the problem and isolate the environment, I created a small example for investigation:

FROM alpine
# bash is not included in the Alpine base image
RUN apk add --no-cache bash

ENTRYPOINT ["/bin/bash", "-c", "ps"]

CMD ["--help"]

When I ran this container, I noticed the ps command was executed, but without the --help argument. It turned out the problem is the use of /bin/bash -c as ENTRYPOINT: when you execute /bin/bash -c 'echo ${0}' myFirstArgument, you will notice that myFirstArgument becomes ${0}, which normally holds the name of the script itself.
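You can see the behavior directly in any shell:

/bin/bash -c 'echo 0=${0} 1=${1}' firstArg secondArg

This prints 0=firstArg 1=secondArg: the first argument after the command string fills ${0}, and only the following arguments become regular positional parameters.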

From the bash man page:

If there are arguments after the string, they are assigned to the positional parameters, starting with $0

To get around this problem, I wrapped my command in a docker-entrypoint.sh and used ${@} to pass all arguments, which fixed my problem.
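A minimal sketch of such a wrapper (the file name is my choice; the binary path is the one from the example above):

#!/bin/bash
# docker-entrypoint.sh - hand all container arguments over to the binary
exec /path/to/myRuby "${@}"

The Dockerfile then uses the wrapper as ENTRYPOINT:

COPY docker-entrypoint.sh /docker-entrypoint.sh
RUN chmod +x /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["--help"]

Running the container without arguments now executes /path/to/myRuby --help as intended.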

Happy dockering.

I'm using Mac OS X with iTerm2 and oh-my-zsh and spend 75% of my time in those terminals. It is totally annoying when I connect to a DHCP network and it screws up my hostname, especially since I'm used to looking at the prompt to tell me which host I'm connected to.


It is possible to pin your computer name for several purposes using the scutil command, which requires administrative permissions. I found a link to the Mac OS X Server Worksheet which explains a few things in more detail. Here is what I did to prevent my computer from changing its host name.

User-friendly name, shown in the Sharing preference panel:

sudo scutil --set ComputerName blinky

SSH and remote login:

sudo scutil --set HostName blinky

Name for Bonjour, e.g. AirDrop:

sudo scutil --set LocalHostName blinky
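You can verify all three values afterwards with scutil --get:

scutil --get ComputerName
scutil --get HostName
scutil --get LocalHostName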

I hope this helps, and I hope I'll find this page again when I've forgotten how to do it :)

gl & hf