Foreword
Today I would like to discuss, from the well-informed viewpoint of an enterprise architect, a recurring source of pain in the IT world: IT debt (also known as "technical debt" or "technological debt").
It is a widespread problem in the large companies I have worked for as an architect.
There are many definitions, starting with the origin of the metaphor proposed by Ward Cunningham in 1992, and heated debates about what is and is not IT debt.
I'll give you my own definition: IT debt encompasses all deviations from the technological state of the art, whether deliberate or not, which have a long-term negative impact on IT costs, time to market and business continuity.
In the rest of this article, we'll take a closer look at what IT debt actually means, why it's a problem, and finally, to preserve a happy ending, what solutions can be considered to reduce or even eliminate it.
Firstly, what is IT debt?
To go further than the rather simplistic definition given a few lines above, let's make concrete what "deviations from the technological state of the art" can be.
Many people take a hasty shortcut by equating IT debt with obsolescence (of software, hardware, frameworks, etc.). This is not wrong, but it is only the tip of the iceberg, and it is rarely the cause of the deepest troubles.
To be complete, we must include in IT debt all the other sources of "malfunctions", from the most visible to the most insidious:
- code debt, which covers all bad development practices (e.g. a hopelessly empty try/catch, documentation worthy of an unfinished novel, the longest method contest, etc.);
- data debt, which includes all the quality deviations of the data hosted in the repositories and other deposits (inconsistencies, desynchronisations, incompleteness, bad formats, etc.);
- skills debt, leading to a loss of control over the solution, and linked to the change of actors and/or the use of exotic technologies or languages, often amplified by the lack of documentation. We can also integrate the documentation debt into the skills debt;
- technical architecture debt, induced by design errors, with an insolent ignorance of patterns that have been proven to be beneficial (loose coupling vs. strong coupling, modular architecture vs. monolithic, resiliency/redundancy vs. SPOF, scalability vs. static architecture, stateless vs. stateful, etc.);
- enterprise architecture debt resulting from bad choices at enterprise architecture level, or even no choice at all (functional redundancy, data overlap, patchwork of point-to-point flows, etc.).
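To make the first item of this list concrete, here is a minimal Python sketch of the "hopelessly empty try/catch" next to a less indebted alternative. The function names and the "amount" use case are hypothetical and serve only as an illustration.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def parse_amount_indebted(raw):
    """Code debt: the error is swallowed, so bad data flows on silently."""
    try:
        return float(raw)
    except Exception:
        pass  # empty handler: the caller gets None and never learns why


def parse_amount(raw):
    """Same job, but the failure is logged and surfaced to the caller."""
    try:
        return float(raw)
    except (TypeError, ValueError) as exc:
        logger.warning("Invalid amount %r: %s", raw, exc)
        raise ValueError(f"invalid amount: {raw!r}") from exc
```

The first version looks harmless in review, which is exactly why this kind of debt accumulates: the cost only shows up later, when a `None` surfaces far from its origin.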
We can also include:
- vulnerabilities, as security debt, even if this is more debatable: like IT debt, they put business continuity at risk and can have serious consequences, financial and otherwise;
- and more generally anything that impacts performance, availability and user experience, because at some point it will cause serious problems (crisis task forces, loss of customers, drop in sales, etc.).
Finally, to be exhaustive, we must also include the energy debt, as the design of the solutions can strongly influence the target carbon footprint (e.g. upward and downward scalability, limitation of stored data, efficiency of algorithms, etc.).
If IT products were material, as in construction for example, many of the achievements would be frightening to see! But unfortunately, in this immaterial world, most of the debt is hidden in the hold and not perceptible from the command post.
Good debt or bad debt?
IT debt is surprisingly often the result of conscious decisions and rarely of a lack of skills. For reasons of business urgency (e.g. responding to a competing offer) or regulation (e.g. new legislation), companies often have no choice but to favour the 'quick & dirty' solution that IT teams deplore.
In this sense, we can speak of "good debt", since in the short term its harmful effects are largely outweighed by the benefit of having delivered the project quickly.
IT debt, however, goes to the dark side as soon as no plan for its remediation is established: it is swept under the rug, and its resolution is indefinitely postponed, as it is always considered to be less of a priority than "business as usual".
As long as I haven't hit the wall, I don't see the wall...
But after all, in what way is IT debt harmful?
Firstly, if we go back to the origin of the metaphor with financial debt, IT debt is paid in cash every month: changes become more and more expensive and time-consuming, tests become more and more cumbersome, and there are more and more incidents in production. Added to this are workaround procedures that rely heavily on manual actions ("the scoopers"), and a bad experience for customers and users (e.g. degraded performance and availability).
During an audit mission that I carried out on the core business information system of a large company, I estimated that the IT debt was responsible for an increase of 20 to 25% of the TCO, due to the accumulation of several factors: lower productivity, lack of quality and reliability, inefficiency of the run, increased on-call time.
Another more insidious consequence of IT debt is the exodus of talented people who can no longer cope with maintaining polymorphic monsters and accumulate the frustration of never being listened to. This results in even more IT debt.
And when the IT debt becomes unsustainable, long, costly and risky overhaul projects are launched, resulting in higher overall expenses than if the IT debt had been properly dealt with as it went along.
How did it come to this?
This IT debt syndrome, which can be found in the vast majority of companies, did not appear by chance. It has its origins in four factors:
- the pressure of "business as usual", as it is often a little more expensive and time consuming in the short term to be state of the art (although sometimes this is debatable...);
- it is not easy to make people understand the impact of IT debt, because it is the accumulation of a multitude of bad choices that weighs on the long term. Each individual choice often appears as a small "acceptable" deviation;
- business and IT people do not know how to communicate: architects and tech leads often struggle to explain IT debt to their business contacts, and they do not know how to "pick their battles". By fighting over every single deviation, they lose both credibility and stamina;
- IT debt is often neither clearly identified nor qualified in terms of impact.
The result is a form of vicious circle: the IT debt accumulates, its impacts are increasingly significant and the effort to deal with it becomes more and more important. There is then less and less latitude to absorb it, as changes cost more and more, and so the choice of the short term always prevails...
And this vicious circle leads to another, even more problematic one: the accumulation of IT debt leads to a lack of capacity on the part of IT teams to respond effectively to the expectations of the business, and we can even talk about a progressive misalignment between business and IT. This leads to a loss of trust that undermines the image of IT teams in the eyes of business units, and therefore makes it increasingly difficult to argue against or reduce IT debt.
Help, how can we get out of the debt hole?
There are two levels of response to this question:
- The ambitious answer: how to build a remediation plan for IT debt that has accumulated over many years?
- And the very ambitious answer: how can we put ourselves in a position to control our IT debt in the long term, with the implementation of a "zero debt" strategy?
In both cases, the prerequisite is to know how to quantify and qualify IT debt factually. And this is far from a simple exercise (otherwise we wouldn't be where we are today). At Sopra Steria Next, we have capitalised within our Enterprise Architecture practice on a tool-based approach to measuring and processing IT debt, known as "Scan & Plan", and supported by a "Business Capability Portfolio Management" industrial tool.
Without going into the nuts and bolts of the process, here are some of the founding principles:
- 360° assessment: the origin(s) of the IT debt must be identified and quantified across all the possible areas (see the list of debt types at the start of this article). The tools currently on the market focus mainly on measuring code quality and vulnerabilities. This is far from sufficient, since it does not cover other factors which are often much more harmful (technical architecture, data, skills, enterprise architecture, etc.);
- business orientation: the importance of an IT debt is weighed against its "degree of nuisance", the risk of doing nothing. And here, it is the business that has the last word. Who cares about IT debt on a component that provides a minor business service, that never evolves, and that has a stable usage cycle? Hence the importance of also quantifying the business value of the services affected by IT debt (e.g. criticality, number of users, impact on the business, frequency of evolution, growth forecast, etc.);
- communication and pedagogy: last but not least, convincing top management of the priorities for dealing with IT debt, which are often linked to technical issues that they do not understand. This relies on a great deal of communication, transparency and highlighting of the problems in relation to the potential consequences for the business (longer time to market, degraded customer experience, loss of turnover, inability to support growth forecasts, etc.).
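The "business orientation" principle can be sketched as a simple weighting. What follows is a hypothetical illustration, not the "Scan & Plan" method itself: the component names, the 0-to-5 scales and the scoring formula are all assumptions chosen to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    debt_severity: int     # 0 (clean) to 5 (heavily indebted), assumed scale
    business_value: int    # 0 (minor service) to 5 (critical), assumed scale
    change_frequency: int  # 0 (never evolves) to 5 (changes constantly)


def nuisance_score(c: Component) -> int:
    """Assumed formula: debt only matters where the business is exposed to it."""
    return c.debt_severity * (c.business_value + c.change_frequency)


def prioritise(components):
    """Return components sorted from most to least urgent to remediate."""
    return sorted(components, key=nuisance_score, reverse=True)


# Hypothetical portfolio, for illustration only
portfolio = [
    Component("legacy-archive", debt_severity=5, business_value=1, change_frequency=0),
    Component("order-entry", debt_severity=3, business_value=5, change_frequency=4),
    Component("intranet-news", debt_severity=2, business_value=1, change_frequency=1),
]

for c in prioritise(portfolio):
    print(f"{c.name}: {nuisance_score(c)}")
```

With these made-up figures, the heavily indebted but stable "legacy-archive" ranks well below the moderately indebted "order-entry", which is the point of letting business value weigh on the decision.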
The grail of the "zero debt" strategy
The "Scan & Plan" approach is proving to be very effective in developing, validating and launching remediation plans. It's not bad, let's say it allows you to reach the "green belt" (to use Lean Six Sigma terminology).
But the grail, the "black belt", is to put in place a real strategy for the continuous treatment of IT debt to aim for "zero debt".
We can look at what the leaders are doing in this area, for example Google's systemic approach, "Site Reliability Engineering" (SRE), which involves a strong commitment from top management. It is based on the definition of SLOs ("Service Level Objectives"), a sort of moral contract between the business and IT stakeholders, and the systematic measurement of these SLOs. Any deviation is charged against an "error budget", the amount of unreliability the SLO tolerates. As soon as that budget is used up, a feature freeze is imposed so that all efforts go into dealing with the offending debt. In short, we strike where the limit is exceeded.
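The error budget mechanism comes down to a little arithmetic. Here is a minimal sketch, assuming an availability SLO measured as a share of successful requests over a period; the function names, the figures and the "freeze when the budget is fully spent" policy are assumptions for illustration, not Google's implementation.

```python
def error_budget(slo: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates over the period."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1.0 - slo) * total_requests


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Share of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo, total_requests)
    return (budget - failed_requests) / budget


def feature_freeze(slo: float, total_requests: int, failed_requests: int) -> bool:
    """Assumed policy: freeze new features once the budget is fully consumed."""
    return budget_remaining(slo, total_requests, failed_requests) <= 0
```

For example, a 99.9% SLO over one million requests tolerates roughly 1,000 failures: 400 failures leave about 60% of the budget, while 1,200 failures would trigger the freeze.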
Without aiming for the level of Google, we can see that the two pillars of the 'zero debt' strategy are governance and measurement. This implies at least for a company:
- to establish a strong governance of IT debt, combining central governance and a network of referents in each of the projects. This is essential to ensure smooth communication and decision-making, both top-down and bottom-up. All this only works if this governance has sponsorship at the highest level of the company;
- to provide, through this governance, a counter-power in technological choices with regard to relative business priorities (short term/long term balance);
- to include a regular bandwidth in the workload plans for processing IT debt. This is exactly what the SAFe approach allows by institutionalising regular sprints dedicated exclusively to technical improvements (the "enabler stories");
- and finally, I was about to forget the most obvious one: to carry out a continuous measurement of IT debt, with a methodology and tooling of the same level as what we propose with "Scan & Plan".
In addition, modern architecture patterns, most of which have emerged in companies with a true "zero debt" strategy, allow for simplified IT debt management. I'm talking about modular (microservices, containers, etc.), decoupled (APIs, events, etc.) and automated (cloud, observability, etc.) architectures which, in addition to reducing the risk of IT debt, finally allow it to be segmented into "small units" that are easier to identify and process as they arise.
Epilogue
To deal with this problem of IT debt, which is so large and complex, you will have understood that you need players capable of reconciling technical issues with business challenges.
And this is the art of Enterprise Architects, built on their experience of terrain mined by IT debt. Indeed, the job of the Enterprise Architect has evolved considerably over the last few years: it now covers the full spectrum from business to technology, combines top-down and bottom-up approaches, and builds punchy points of view adapted to each type of stakeholder. This is the link that was missing to deal with IT debt efficiently.