Centralizing logs becomes important as soon as you have more than two servers. In my environment the bare metal is monitored with Net-SNMP and my services are deployed as containers with Docker. All system logs are sent to a Graylog2 instance, and I quickly noticed a few ugly entries caused by snmpd.

Cannot statfs /run/docker/netns/...: Permission denied

You will see quite a few of them. My first approach was to raise the log threshold of the SNMP daemon in /etc/default/snmpd with

SNMPDOPTS='-Ls3d -Lf /dev/null -u snmp -g snmp -I -smux,mteTrigger,mteTriggerConf -p /run/snmpd.pid'

The Net-SNMP man page describes the logging options, and with -Ls3d I raised the level to "Error" instead of "Warning", but it didn't help. I searched the web and found this topic in Red Hat's Bugzilla.

It turns out snmpd reads /proc/mounts, runs statfs on the mount points, and logs an error when that fails. One of the commenters found a solution: use rsyslog to filter this type of message with:

if $programname == 'snmpd' and $msg contains 'statfs' then {
    stop
}

The result is now a much cleaner log with less garbage.

Happy Logging

As most of us have noticed, a few companies changed our perspective on how to develop software and deploy it as a service. There is quite a difference between selling a box with 10 CDs every year and developing and delivering your software as a service. This article is a collection of thoughts and ideas I had and wanted to write down.

Who cares about a version number?

Users don't give a shit about version numbers anymore; everything that matters has to be focused on the user. A great user experience, functionality and a good "Effort-to-Outcome" ratio for solving their problems will make your software successful.

Usability improvements, features and fixes are delivered immediately, and this is where all the fuss about continuous delivery and the DevOps culture kicks in.

Pets and Cattle

Virtualisation technology forced hardware manufacturers to change their mindset and make their boxes behave like good cattle instead of pets. With all the fuss about containers, the same will happen to Linux distributions and configuration management tools; I believe they haven't really noticed yet. You want hardware as a commodity, you need a Linux kernel as a commodity, and the diversity and history of Linux distributions are in your way.

All the ugly ifdefs in configuration management tools just make your service run on a specific distribution using yum, apt, or apk, and the nasty glue you have to write to configure your service to run in a container is still painful. Applications often aren't built to run in such environments and you have to hack a lot of stuff, but that is a whole different story.

Monitoring is important

Everybody tells you monitoring is an important thing. You can only improve what you measure, and you need to know where things are going downhill.

Current monitoring tools have fallen behind today's needs. Most of them only let you think in terms of bare-metal concepts like hosts and IP addresses. Some tools cover only one part, performance management XOR fault management, and you need both. If you run two tools, you have to maintain them and glue them together, and you want alerting without having to maintain it twice.

Monitoring needs to change

Monitoring tools need to change so they become part of the software development and service deployment process.

Most of them are built by people with an operations background and not a software development background. The operations know-how is often quite good, but we need more software development experience in monitoring tools.

We should make monitoring a part of our test suites. Why not define an "Operation Test" after an "Integration Test" and let it run in your monitoring tool? Additionally, monitoring people should make clear to users what the difference is between "Performance Management" and "Application Profiling".

Monitoring people should adopt terms like whitebox and blackbox testing for operational services. For example, when you test the status code of the landing page of a web application, it is a blackbox test. When you measure internal, application-specific entities with JMX, you have a whitebox test.

Fight against Alarm Fatigue

Monitoring applications tend to overmonitor your environment by default. You measure a lot and it tells you a lot, but you overlook the important things in all the noise. The signal-to-noise ratio is too low and people develop alarm fatigue. Rule #1 in alerting: "Only notify someone when human interaction is really necessary."

Applications and Monitoring

With services deployed in containers, the whole idea of provisioning needs to change. Monitoring tools should allow you to model an "Application Service" with associated performance and availability metrics. Alerting should also be possible on those "Application Services". Performance metrics and operational tests are driven by high-level services, mostly through REST APIs. Containers will come and go, providing resources to this "Application Service". They can no longer be treated as long-living hosts with a statically assigned IP address.

Monitoring needs to be more intelligent

When talking about intelligence, everybody thinks of Artificial Intelligence. In monitoring it is much simpler, because the tools are ridiculously stupid at the moment and you don't have to throw AI at the problem. Diagnose from the bottom up, from low complexity to higher complexity, which also means from cheap to expensive in terms of the hardware and network resources needed. We want to monitor high-level services, and a monitoring tool can help diagnose a problem by itself and provide a lot of useful information. For example:

We test https://mycloudy.webapp.acme.com for a 200 OK response with a timeout of 2 seconds.

Instead of just reporting "Service Down", the monitoring tool can run a few cheap and simple tests itself. It diagnoses the problem from the perspective of the monitoring system to give a NOC guy an overview of what went wrong and save him time. For the test above, for example: was the connection refused, or was the HTTP status code just something other than 200 OK?

Connection refused

  1. Can IPv4/IPv6 addresses be looked up for the host name mycloudy.webapp.acme.com?
  2. Can the IPv4/IPv6 addresses be reached over ICMP?
  3. If not, what is the traceroute output for the IPv4/IPv6 addresses?
  4. If possible, give me a link to the logs from the last-hop nodes in the time window +/- the service polling interval.
  5. If they can be reached, is TCP port 443 open?
  6. If not, give me a link to the warning+ logs from the web server in the time window +/- the service polling interval.

Not 200 OK

  1. Use the resolved IPv4/IPv6 address of the web server and give me a link to the warning+ logs from the web server in the time window +/- the service polling interval.

You don't need a rocket scientist for this.
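
The two branches above can be sketched as a small shell script. This is only an illustration under assumptions: classify and diagnose are made-up helper names, the follow-up tests use the Linux versions of getent and ping, and the log links are left as plain text because they depend on your log setup.

```shell
#!/bin/bash
# Sketch of a bottom-up diagnosis for one HTTPS check (Linux tools assumed).
# classify and diagnose are hypothetical helper names, not part of any tool.

# Pick the diagnostic branch from curl's exit code ($1) and HTTP status ($2).
classify() {
    if [ "$1" -ne 0 ]; then
        echo "connection-failed"   # curl could not complete the request
    elif [ "$2" -eq 200 ]; then
        echo "ok"
    else
        echo "not-200"
    fi
}

# Run the check for host $1, then only the cheap follow-up tests it calls for.
diagnose() {
    host=$1
    status=$(curl --silent --output /dev/null --max-time 2 \
                  --write-out '%{http_code}' "https://$host/")
    rc=$?
    case $(classify "$rc" "$status") in
        ok)
            echo "200 OK" ;;
        not-200)
            echo "HTTP $status: link to web server logs goes here" ;;
        connection-failed)
            if ! getent hosts "$host" >/dev/null; then
                echo "DNS lookup failed for $host"
            elif ! ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
                echo "no ICMP reply from $host: attach traceroute output"
            else
                echo "host answers ping, but HTTPS on port 443 failed (curl exit $rc)"
            fi ;;
    esac
}

# Example call (needs network access):
# diagnose mycloudy.webapp.acme.com
```

The point of the design is that DNS and ICMP are only touched after the high-level check has failed, so they cost nothing while the service is healthy.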

Most of these checks, e.g. ICMP or DNS lookups, are configured as permanent monitors, but they are mostly only necessary when you need to diagnose a problem. You only care about them when a high-level application service fails. You can do a similar thing with response times. What you really care about is your application response time over a longer period of time. Only when it goes through the roof do you have to ask immediately: was the network path slow (ICMP)? Was the name lookup slow (DNS)? Was the web server response slow (HTTP)?
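
For response times, a rough breakdown can be had from a single request, since curl reports cumulative timers for the name lookup, the TCP connect and the first byte. A minimal sketch, with phase_durations and timing_breakdown as made-up helper names, and with the TCP connect time used as a stand-in for the network path instead of a real ICMP measurement:

```shell
#!/bin/bash
# Sketch: split one request's response time into phases using curl's timers.
# phase_durations and timing_breakdown are hypothetical helper names.

# curl's timers are cumulative seconds since the start of the request, so each
# phase is the difference between two neighbouring timers.
# $1=time_namelookup $2=time_connect $3=time_starttransfer $4=time_total
# "wait" covers the TLS handshake plus the server's time to the first byte.
phase_durations() {
    awk -v dns="$1" -v con="$2" -v first="$3" -v total="$4" 'BEGIN {
        printf "dns=%.3f connect=%.3f wait=%.3f transfer=%.3f\n",
               dns, con - dns, first - con, total - first
    }'
}

# Fetch URL $1 once and print the per-phase durations.
timing_breakdown() {
    # Word splitting of the unquoted output is intentional: four numbers.
    phase_durations $(curl --silent --output /dev/null --max-time 10 \
        --write-out '%{time_namelookup} %{time_connect} %{time_starttransfer} %{time_total}' \
        "$1")
}

# Example call (needs network access):
# timing_breakdown https://mycloudy.webapp.acme.com/
```

With made-up timer values, `phase_durations 0.010 0.030 0.250 0.300` prints `dns=0.010 connect=0.020 wait=0.220 transfer=0.050`, which points at the server rather than at DNS or the network.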

While building Docker executables, I ran into an interesting corner case. Fortunately the Docker IRC channel helped me investigate, with special credits to Ravensoul.

When you build a container as an executable, you can use ENTRYPOINT for the binary to execute and CMD as a default argument that can be overridden. In most cases the CMD is the --help argument, which provides a useful default behavior in case you just run the container without anything specified.

In my case I built a Ruby-based executable, and because I needed the environment variables, I used bash -c <command> as the ENTRYPOINT and --help as the default CMD argument, like this:

ENTRYPOINT ["/bin/bash", "-c", "/path/to/myRuby"]

CMD ["--help"]

I noticed the --help argument was not used when I just ran the container. To verify the problem and isolate the environment, I created a small example for investigation:

FROM alpine

ENTRYPOINT ["/bin/bash", "-c", "ps"]

CMD ["--help"]

When I ran this container, I noticed the ps command was executed but without the --help argument. It turned out the problem is the use of /bin/bash -c as ENTRYPOINT. When you execute /bin/bash -c 'echo ${0}' myFirstArgument, you will notice that myFirstArgument becomes ${0}, which is normally the name of the script itself. Docker appends the CMD arguments after the ENTRYPOINT, so --help ended up in ${0} and was never passed to ps.

From man bash:

If there are arguments after the string, they are assigned to the positional parameters, starting with $0

To get around this problem, I wrapped my command in a docker-entrypoint.sh script and used ${@} to pass on all arguments, which fixed my problem.
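
The behaviour is easy to reproduce outside Docker. The snippet below is a minimal sketch, assuming bash is on the PATH. The entrypoint function only stands in for the wrapper script so the snippet runs anywhere; the real content of my docker-entrypoint.sh is not shown here, and the exec line in the comment is an assumption about what such a wrapper typically looks like.

```shell
#!/bin/bash
# With -c, the first word after the command string becomes $0, so it never
# shows up in "$@".
bash -c 'echo "0=$0 args=$*"' myFirstArgument
# prints: 0=myFirstArgument args=

# A throwaway word in the $0 slot lets the real argument through.
bash -c 'echo "0=$0 args=$*"' placeholder myFirstArgument
# prints: 0=placeholder args=myFirstArgument

# Stand-in for the body of a hypothetical docker-entrypoint.sh. In the image it
# would be the single line: exec /path/to/myRuby "$@"
entrypoint() {
    echo "would run: /path/to/myRuby $*"
}
entrypoint --help
# prints: would run: /path/to/myRuby --help
```

Quoting it as "$@" in the wrapper keeps arguments that contain spaces intact, and exec lets the Ruby process replace the shell instead of running as its child.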

Happy dockering.

I'm using Mac OS X with iTerm2 and oh-my-zsh, and I spend 75% of my time in those terminals. It totally annoys me when I connect to a DHCP network and it screws up my hostname, especially since I'm used to looking at the prompt to see which host I'm connected to.


You can pin your computer's names with the scutil command, which requires administrator permissions. I found a link to the Mac OS X Server Worksheet which explains a few things in more detail. Here is what I did to prevent my computer from changing its host name.

User-friendly name, shown in the Sharing preference pane

sudo scutil --set ComputerName blinky

SSH and Remote login

sudo scutil --set HostName blinky

Name for Bonjour, e.g. AirDrop

sudo scutil --set LocalHostName blinky
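
To check that the three names stuck, they can be read back with scutil --get, which needs no sudo. A small sketch; show_names is a made-up helper, and its optional argument, which swaps scutil for another command, only exists so the loop can be tried on a machine without scutil.

```shell
#!/bin/sh
# Sketch: print the three names so a change can be confirmed.
# show_names is a hypothetical helper; the optional first argument replaces
# scutil with another command, which is only useful for dry runs.
show_names() {
    tool=${1:-scutil}
    for key in ComputerName HostName LocalHostName; do
        printf '%s: %s\n' "$key" "$("$tool" --get "$key")"
    done
}

# On the Mac itself:
# show_names
```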

I hope this helps, and I hope I'll find this page again when I've forgotten how to do it :)

gl & hf

I ran into some trouble with my Vodafone Easybox 904 xDSL. Even with both 2.4 GHz and 5 GHz WLAN I had regular drops, and I had to turn the WLAN on the device off and on or reboot it to reconnect. On top of that, the VDSL line regularly got disconnected. Having Vodafone replace the Easybox didn't help either, so I bought a Zyxel VMG1312-B30A.

Zyxel with Vodafone VDSL 50

I searched through the interwebs and it took me a while to figure out which settings are required. In case you want to do the same, I want to share my settings to save you some time:

Broadband Settings

  • Type: ADSL/VDSL over PTM
  • Mode: Routing
  • Encapsulation: PPPoE
  • IPv6/IPv4 Mode: IPv4 only (sigh)
  • PPP-User: vodafone-vdsl.komplett/vb<number>
  • PPP-Pass: your-pass-here
  • PPPoE-Service-Name: VODAFONE
  • VLAN: active
  • 802.1p: 1
  • 802.1q: 7 (7 for a Telekom line, 132 for a Vodafone line)
  • MTU: 1492

Extended xDSL Settings

  • ADSL over PTM: deactivated
  • Annex J: activated
  • PhyR US: deactivated
  • PhyR DS: activated

The connection came online and a speed test gives me 50 Mbit/s downstream and 10 Mbit/s upstream. I run a tinc VPN to have a flat management network for all my servers, and it has been running without any trouble so far. Just to mention, I don't use the Vodafone voice functionality.

So far gl & hf