Reducing Risk with Feature Flips

It goes without saying that software developers write code like fiends these days, especially since we developed methodologies to help streamline the entire process. And now that users realize we are that good at implementing new features, the demand for continuous delivery has never been so great!

But delivering updated software into production systems this often has a drawback: it introduces a lot of risk.

There are, of course, methods and means to reduce risk. One is to test the new build thoroughly before it is released. But that requires time… and your users, even the most cautious business users, do not like giving you too much of it.

Is detailed QA enough?

Have you ever heard a software team state: “Here is the new major build, the new database scripts, and we have done this three times already… nothing should go wrong… but just in case, please do a midnight deployment. And have all hands on standby.”

That should assure us that the team has tested the deployment and has provided as much backup as possible to make the next deployment succeed. But how does it sound when you ask these questions in an honest, candid way?

  • What happens if the new feature negatively impacts the rest of the system after the point of no return?
  • What are we to do should the updated feature start failing in the most unexpected manner?
  • What is our plan B should the roll-out of the new build hit a problem that never showed up during those trial runs?
  • How long would it take us to revert the system to plan B if something went wrong?
  • What if the new or altered feature misses the business objectives? (Again, after the point of no return)

As much as aircraft manufacturers go to great lengths to put their aircraft through rigorous testing and to document all the procedures to follow in extreme edge cases, nothing substitutes for fail-safe backup systems.

Deployments without the risk

Rather than replacing a legacy feature with a “new feature” with immediate effect, it is better to keep the devil you know alongside the new angel you just created. That angel may not have strong wings yet.

When you first earmark a new feature or an altered behavior, introduce a “feature markdown” or “feature flip” switch for it. Introduce the switch early, and deploy to production as often as possible. It does not matter if your builds contain features that are not fully complete: as long as the corresponding switch is off, your existing system will keep using the legacy devil that you have.
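To make the idea concrete, here is a minimal sketch in Python of a switch guarding a new code path. Everything in it is an assumption made for illustration: the FeatureSwitches class, the "new_shipping_rates" switch name, and both shipping functions are hypothetical and do not come from any particular library. The point it shows is that an unknown switch defaults to off, so half-finished code can ship to production without ever running.

```python
# Hypothetical names throughout: FeatureSwitches, "new_shipping_rates",
# and both shipping functions are illustrative, not from a real library.


class FeatureSwitches:
    """In-memory registry of feature switches; unknown switches are off."""

    def __init__(self, initial=None):
        self._states = dict(initial or {})

    def is_on(self, name):
        # Default to off so unfinished features stay dormant in production.
        return self._states.get(name, False)

    def flip(self, name, on):
        self._states[name] = bool(on)


def legacy_shipping_cost(weight_kg):
    # The "devil you know": existing, proven behavior.
    return 5.0 + 1.0 * weight_kg


def new_shipping_cost(weight_kg):
    # The "new angel": may still be incomplete while the switch is off.
    return 4.0 + 1.5 * weight_kg


def shipping_cost(switches, weight_kg):
    # Single decision point: the switch chooses which implementation runs.
    if switches.is_on("new_shipping_rates"):
        return new_shipping_cost(weight_kg)
    return legacy_shipping_cost(weight_kg)


switches = FeatureSwitches()
print(shipping_cost(switches, 10))  # switch is off by default: legacy path
```

Keeping the check in one place, here shipping_cost, means the rest of the system does not need to know that two implementations exist.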

After your feature is complete, and while the ink from the rubber stamp of QA is still drying, you may simply flip the switch and your new feature comes online… in real time!

If you encounter problems, simply flip the switch back to revert the behavior. Then figure out what the hell is going on, and bring your new feature back online once you have resolved the root cause.
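For the flip to work without a new deployment, the switch state has to live outside the build. One possible way to do that, sketched here under stated assumptions, is a small JSON file that the application re-reads on every check. The FileBackedSwitches class, the write_switches helper, the file name, and the "new_greeting" switch are all hypothetical; a real system might keep the state in a database or a configuration service instead.

```python
import json
import os
import tempfile


class FileBackedSwitches:
    """Reads switch states from a JSON file on every check (hypothetical)."""

    def __init__(self, path):
        self._path = path

    def is_on(self, name):
        try:
            with open(self._path, encoding="utf-8") as handle:
                states = json.load(handle)
        except (OSError, ValueError):
            # Missing or unreadable file: fail safe to the legacy behavior.
            return False
        return isinstance(states, dict) and states.get(name) is True


def write_switches(path, states):
    # Write to a temporary file, then rename it over the real one, so a
    # reader never sees a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(states, handle)
    os.replace(tmp_path, path)


def greeting(switches):
    if switches.is_on("new_greeting"):
        return "new greeting"
    return "legacy greeting"


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "switches.json")
switches = FileBackedSwitches(path)

print(greeting(switches))                      # no file yet: legacy path
write_switches(path, {"new_greeting": True})   # flip the switch on
print(greeting(switches))                      # new path, no redeploy
write_switches(path, {"new_greeting": False})  # problems? flip it back
print(greeting(switches))                      # legacy path again
```

Re-reading the file on every check keeps the sketch short but costs a disk read each time; caching the state for a few seconds would be a reasonable trade-off if the check sits on a hot path.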

The methodology of flipping switches when needed keeps your application adaptable: you can add and remove functionality at a moment’s notice. And you do not have to do it at midnight. Cool, is it not?

Challenges to resolve?

The story of just flipping switches can be met with perceived challenges and natural back-pressure from fellow developers.

One of the issues that may be raised is that database schemas need to be altered. Once they are altered, they become a peg in the ground that marks the point of no return. And altered features may well need migration scripts to be in place.

I do not believe that it has to be that way. Must there really be a point of no return because of the database? That, however, is a topic I would like to dedicate another blog post to at a later stage.
