
Implement a Simple Stability Plan to Produce Reliable Software

Challenges

Delivering reliable software amidst growing complexity in modern applications
Identifying and mitigating issues that cause instability in production systems
Balancing feature development with efforts to maintain system stability

Solutions

Implementing a straightforward stability plan to ensure consistent reliability
Using monitoring and diagnostics tools to detect and address issues early
Establishing best practices for proactive maintenance and error prevention

Benefits

Increased confidence in delivering stable and reliable software to users
Reduced downtime and faster resolution of critical issues
Enhanced team productivity by minimizing disruptions caused by instability

The second pillar of the Clear Measure Way is Achieving a Stability Plan. If you missed it, also check out the first pillar, Establishing Quality. Once the team has established quality in the code, it can work on getting that code into a stable production environment. This is the next stepping stone toward increasing your software delivery speed.

Most likely you have not just one new piece of software, but many existing apps in production. Several of these may have stability issues from time to time, and those issues usually show up as one or more of these common symptoms:

Sluggishness
Outages
Offline error messages
Frozen screens
Abnormal behavior
Bugs

Sometimes your engineers cannot reproduce these symptoms because they only occur in production.

When users report any of these symptoms, you have a production issue. Having good “language” around the symptoms gives you clarity in your oversight duties. Remember, you’re the software executive; you oversee all of this. You want to make sure appropriate measures are in place to track stability.

If the software is not stable, it’s not going to do its job. You need to ensure the stability of your software as it runs in production.

Why You Need a Software Stability Plan

Sometimes teams can be a little gun shy about production deployments. They might advocate for monthly deployments, or for after-hours deployment events with all hands on deck. None of this is technically required; the caution usually stems from an unpleasant experience during a past live update.

They’re gun shy because they’ve been shot before. After a deployment goes bad, developers can become hesitant. They can become wary and distrustful of the process because they consider it dangerous.

You need a stability plan because having a lot of undeployed software is costly. Undeployed software isn’t generating any return for the business. It also carries a growing risk, because every change that waits is an unproven change.

Now, think about other departments; all departments that manage throughput understand the power of limiting work in process (WIP).

Infrequent deployments queue up way too many changes that are waiting for a big, stressful, error-prone deployment event.

Ultimately, your two goals to achieve stability are (1) prevent production issues, and (2) minimize undeployed software. In other words, get the software out there. The software needs to yield the business a return on the investment you’ve made in it.

Measure production issues and software deployments on the team scorecard during weekly metric reporting. For example:

Number of deployments for the week
Number of production issues for the week, broken down by severity
MTTR: mean time to recovery (the average time to resolve an issue)
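
As a rough illustration, the three scorecard metrics above could be computed from a simple list of issue records. This is only a sketch: the `ProductionIssue` fields, the severity labels, and the `weekly_scorecard` function are hypothetical names chosen for this example, and a real team would pull the same data from its ticketing and deployment tools.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ProductionIssue:
    severity: str                           # labels such as "critical" or "minor" are assumptions
    opened_at: datetime
    resolved_at: Optional[datetime] = None  # None while the issue is still open


def weekly_scorecard(deployment_count: int, issues: List[ProductionIssue]) -> dict:
    """Summarize one week of stability metrics for the team scorecard."""
    by_severity = Counter(issue.severity for issue in issues)

    # MTTR only considers issues that have actually been resolved.
    resolved = [i for i in issues if i.resolved_at is not None]
    if resolved:
        total = sum((i.resolved_at - i.opened_at for i in resolved), timedelta())
        mttr = total / len(resolved)
    else:
        mttr = None

    return {
        "deployments": deployment_count,
        "issues_by_severity": dict(by_severity),
        "mttr": mttr,
    }


# Example week with made-up data.
week = [
    ProductionIssue("critical", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 10, 0)),
    ProductionIssue("minor", datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 17, 0)),
    ProductionIssue("minor", datetime(2024, 1, 11, 8, 0)),  # still open
]
scorecard = weekly_scorecard(12, week)
print(scorecard)
```

In this made-up week, the two resolved issues took one hour and three hours, so the MTTR is two hours. The open issue counts toward the severity breakdown but not toward MTTR.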

Stability Questions to Ask Your Team

You can ask your team questions to drive the right behavior when seeking to achieve your stability plan. For example:

What features have been changed and tested, and are ready for production?
Have you considered some way to prevent this type of issue from happening again?
What was the root cause of that production issue?
What should we strengthen about our environment, so that we’re able to resolve issues faster next time?

Minimum Standards for Achieving Your Stability Plan

Just like with quality, there’s a minimum set of practices every team should employ if you expect to run a stable software system in a production environment.

First, you want automated DevOps from day one of a new project. The goal is to eliminate manual monthly deployments. You want automation all the way through: code, build, test, and release candidate deployment.

Second, you want to have small releases. You want to deploy when something is ready no matter what time of day.

Third, you want automated runtime health checks. This is like your car’s built-in self-diagnostic system. It can notify you when there is an issue. Then a technician can plug in a diagnostic tool and read the error codes. Likewise, your software can have a built-in health check. You also want explicit secrets management, because security is such an ongoing risk.
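
To make the health check idea concrete, here is a minimal sketch of what such a self-diagnostic might look like. The `run_health_checks` function and the two example checks are hypothetical; real checks would ping an actual database, a queue, disk space, and so on, and the result would typically be exposed on an endpoint that a monitoring tool polls.

```python
from typing import Callable, Dict


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run every named check and report an overall status plus per-check results."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception as exc:
            # A check that crashes is reported as a failure instead of
            # taking the whole health report down with it.
            results[name] = f"error: {exc}"
    healthy = all(status == "ok" for status in results.values())
    return {"status": "healthy" if healthy else "unhealthy", "checks": results}


# Hypothetical checks with hard-coded outcomes, for illustration only.
def database_reachable() -> bool:
    return True


def payment_gateway_reachable() -> bool:
    raise TimeoutError("no response after 5s")


report = run_health_checks({
    "database": database_reachable,
    "payment_gateway": payment_gateway_reachable,
})
print(report)
```

Like the error codes in the car analogy, the per-check results tell the technician where to look first, while the overall status is what an alert would watch.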

That’s just the minimum. There are going to be other practices to prevent issues. When production issues arise (and they will from time to time), the following practices will help the team diagnose and solve them faster.

Stability Plan Practices

First, centralized logging, metrics, and traces based on OpenTelemetry. If you don’t have centralized telemetry, look into OpenTelemetry, the open standard that tool vendors widely support.
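
The sketch below shows the underlying idea of centralized logging in a small way: every log record is emitted in one consistent, machine-readable shape, with a shared trace ID so that related events can be found together later. It deliberately uses only Python's standard `logging` module as a stand-in and is not the OpenTelemetry SDK; the `JsonFormatter` class, the `build_logger` helper, the service name, and the field names are all assumptions made for illustration.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON line with a consistent set of fields."""

    def __init__(self, service_name: str):
        super().__init__()
        self.service_name = service_name

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "service": self.service_name,
            "level": record.levelname,
            "message": record.getMessage(),
            # trace_id is supplied by the caller through the `extra` argument.
            "trace_id": getattr(record, "trace_id", None),
        })


def build_logger(service_name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service_name))
    logger.handlers = [handler]
    return logger


stream = io.StringIO()  # stands in for the central log collector
log = build_logger("orders-api", stream)

trace_id = uuid.uuid4().hex  # one ID shared by every event in the same request
log.info("order received", extra={"trace_id": trace_id})
log.error("payment declined", extra={"trace_id": trace_id})

records = [json.loads(line) for line in stream.getvalue().splitlines()]
print(records)
```

Because both records carry the same trace ID, someone diagnosing the declined payment can pull up everything that happened in that request with a single search.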

Second, an APM (Application Performance Management) tool with a shared operations dashboard.

Third, a formal support desk tool with ticket tracking, anomaly alerts, and emergency alarms.

If some of this sounds familiar, it’s because many of these practices are the software parallel of how any factory or assembly line operates. When a production line has an issue, the staff quickly fixes it to prevent a factory shutdown. For more serious problems, emergency alarms sound, stopping the line and calling everyone’s attention to the problem. While the tools are different, the way of thinking is the same.
