Stepping back from the bleeding edge
As a consultant I often get called in to help organisations improve the performance of their applications. The first requirement is usually to install some monitoring to see what is going on. Which makes me wonder: why isn't monitoring part of the original deployment? Or even the original design?

When we are designing our next great system we want to keep abreast of the latest developments. It is tempting to go for the latest technology, be it virtualisation, containers, microservices, cloud or on-premises or hybrid, serverless or streaming (insert your favourite here). But the tools vendors (and open-source contributors) are playing a continual game of catch-up: they typically wait to see what is going to catch on before committing to support it. It is true that they do a pretty good job, but at the beginning of an application's lifetime, support may not be good.

And what is the lifetime of a typical application? How long does it last before it crumbles under the weight of countless changes, or the market simply moves on? I would reckon about five years, though it might only be two or three. The point is that you will probably want to maintain business as usual in this application for far longer than it took to build it in the first place. That usually requires monitoring, so that you can detect changes in performance between releases or as the result of traffic surges. By the end of the application's lifetime the monitoring will probably be pretty stable (unless you bet on the wrong horse and chose a dead-end technology route). But you will need monitoring from the beginning.

So I would contend that monitoring support should be one of the considerations when designing the system. Just as you add logging, you should be thinking about the telemetry required to properly maintain the application.
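To make that concrete, here is a minimal sketch of what "telemetry alongside logging" can mean in practice: a small timing wrapper placed at the same points in the code where you would naturally log. The class and metric format are hypothetical, not from any particular tool; in a real system the `System.out` line would be a call to your metrics library of choice.

```java
import java.util.function.Supplier;

// Minimal sketch: capture timing telemetry at the same code points you would log.
// The metric output here is a stand-in for a real metrics API.
public class Timed {
    public static <T> T call(String operation, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long micros = (System.nanoTime() - start) / 1_000;
            // Emit a named, aggregatable measurement, not just a free-text log line.
            System.out.println("metric operation=" + operation + " duration_us=" + micros);
        }
    }
}
```

The point of the sketch is the design habit, not the plumbing: every significant operation gets a stable name from day one, so whichever monitoring tool you adopt later has something meaningful to aggregate on.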
But the story doesn't end there. What if you step back from the cutting or bleeding edge and choose a "safe" technology such as Java or .NET? In principle, existing monitoring tools (we are a partner for AppDynamics) provide excellent support for following user journeys through the application (true APM in my book, not just CPU and memory usage). Even here, though, frameworks and libraries continue to evolve, usually becoming ever more asynchronous in nature. Again, the tools vendors usually do a pretty good job of keeping up to date, but it is worth checking before you commit to your design.
So you are using a well-established framework in a well-established language. Problem solved? You can defer a tool decision until after the application is deployed? Not quite. It is perfectly possible to write your application in a way that makes it difficult to track those user journeys. On two recent occasions I have seen incoming URLs, used to identify user journeys, that contain a step number: some randomly generated sequence number that changes on each software release. Deep in the bowels of the application, code interprets this magic step number and knows what to do. But without some helper functions to translate the numbers into something more meaningful, the monitoring tool cannot make use of them. It was the second of these two occasions that prompted me to write this article.
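A helper of the kind I have in mind might look like the following. This is a hypothetical sketch, not the code from either engagement: the class name, the URL shape (`/checkout/step/73124`) and the mapping are all invented for illustration. The essential idea is that the translation table lives alongside the code that generates the step numbers, so it can be regenerated on each release.

```java
import java.util.Map;

// Hypothetical helper that translates release-specific step numbers into
// stable, human-readable names a monitoring tool can aggregate on.
public class StepNameResolver {
    // In a real system this map would be generated at build time from the
    // same source that assigns the magic step numbers.
    private final Map<String, String> stepNames;

    public StepNameResolver(Map<String, String> stepNames) {
        this.stepNames = stepNames;
    }

    // Resolve an incoming URL such as /checkout/step/73124 into a stable
    // business-transaction name, falling back to a visible "unknown" marker.
    public String businessTransactionName(String url) {
        String step = url.substring(url.lastIndexOf('/') + 1);
        return stepNames.getOrDefault(step, "unknown-step-" + step);
    }
}
```

With stable names like `checkout-payment` available, the APM tool can compare the same user journey across releases, even though the raw step number changed in between.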
To coin a phrase, have you designed your system with “monitorability” in mind?