Building a portable, scalable, reusable Deployment Pipeline for an arbitrarily complex environment (Part 1)

2015-12-18 - Reading time: 6 minutes #pipeline

The first of three posts about building an advanced deployment pipeline.

Back in May I did a presentation about “The search for the Holy Grail” at a DevOps meetup in Stockholm. The alternative name for the presentation could have been the title of this blog post, but that wouldn’t have allowed me to make gratuitous Monty Python references.

Let’s sort through the buzzwords and clarify exactly what we’re talking about:

  • Portable: meaning portable between cloud providers (AWS and Elastx’s OpenStack platform, for example)
  • Scalable: meaning that as more people/teams/components are added to the system, the mean time from commit/push to deployment in production should grow no worse than linearly
  • Reusable: meaning the software architecture of the Deployment Pipeline should keep the code generic and adaptable to other systems with minimal modification.

It’s fair to ask whether it is really possible, or even worthwhile, to build such a Deployment Pipeline.

Obviously my answer to that is a resounding “yes”! And the explanation is simple, really. In the almost 4 years I’ve been working with Continuous Delivery and DevOps, the basic problems that need solving have been essentially the same every time, in the following order of importance:

  1. Fix branching and merging so there is a clear and simple model with a clean master and feature branches, with only a single path to merge changes to master
  2. Set up CI and code reviews as a gateway through which all branches must pass before merging to master
  3. Introduce tools for configuration management and automated deployment as a means to reproduce the system
  4. Continue to hammer points 1, 2, and 3 into everyone’s heads while improving automated testing and configuration management until you can rebuild the system from scratch
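Points 1 and 2 boil down to a single rule that is easy to state in code. The following is only a minimal sketch of that rule, not how any particular CI or code review tool implements it; the `Branch` record and its field names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """Hypothetical record of a feature branch's state (illustration only)."""
    name: str
    ci_passed: bool
    review_approved: bool


def can_merge_to_master(branch: Branch) -> bool:
    """The single path to master: CI and code review must both be green."""
    return branch.ci_passed and branch.review_approved


ready = Branch("feature/login", ci_passed=True, review_approved=True)
unreviewed = Branch("feature/search", ci_passed=True, review_approved=False)
print(can_merge_to_master(ready), can_merge_to_master(unreviewed))
```

In practice this check lives in the review or CI tooling rather than in your own code; the point is only that there is exactly one gate and no way around it.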

So if it’s basically the same thing every time, then why not simplify the whole process and build a pipeline with interchangeable parts that will save you a whole load of work down the line?

That’s what I thought.

This is going to be a tools and workflow-focussed post, so I want to begin with an overview of the tools available to us to build our pipeline. At the end of the article I’ll explain how to look out for some of the anti-patterns I’ve encountered in these types of discussions.

Probably one of the most complex things in the DevOps space is keeping track of the explosion of tooling in the last few years. An interesting overview of this is XebiaLabs’ Periodic Table of DevOps Tools.

Elements in a periodic table, where everything sits in a neat little box, paint an overly simplistic picture, however, because the reality is that tools in the DevOps landscape require a Venn diagram, something more like this…

Unfortunately this diagram is only for illustrative purposes. I do think that it would be a worthy exercise to invest the time to properly classify the categories and the overlap between them (at the very least you’d probably be on the front page of Hacker News for a couple of days).

But that will have to wait for another day.

Let’s go through what I think are the critical elements of a Deployment Pipeline:

  1. Source Control (including code review)
  2. Orchestration (of cloud resources and service discovery)
  3. Configuration Management (for consistent and reproducible environments)
  4. Continuous Integration (including automated testing)
  5. Artifact repository (for storing and fetching build artifacts)
  6. Workflow Visualisation (a dashboard where “done” means released to production)
  7. Monitoring and metrics (to ensure both speed and quality are improving)

That is a big list of stuff to cover so we’re going to have to sharpen our knife and fork before we eat this elephant.
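One way to picture the “interchangeable parts” idea behind the list above is to treat each element as a stage with a common interface, so that a single stage can be swapped without touching the rest. The following is a rough sketch under that assumption only; `run_pipeline`, the stage functions, and the context keys are all hypothetical names, and the deploy stages are placeholders that do not call any real cloud API.

```python
from typing import Callable, Dict, List

# The context is a plain dictionary handed from one stage to the next.
Context = Dict[str, str]
Stage = Callable[[Context], Context]


def run_pipeline(stages: List[Stage], context: Context) -> Context:
    """Run each stage in order, passing the updated context along."""
    for stage in stages:
        context = stage(context)
    return context


def build(ctx: Context) -> Context:
    # Hypothetical build step: records an artifact name derived from the commit.
    return {**ctx, "artifact": "app-" + ctx["commit"] + ".tar.gz"}


def deploy_to_aws(ctx: Context) -> Context:
    # Placeholder only: a real stage would call the provider's API here.
    return {**ctx, "deployed_to": "aws"}


def deploy_to_openstack(ctx: Context) -> Context:
    # Same interface as deploy_to_aws, so the two are interchangeable.
    return {**ctx, "deployed_to": "openstack"}


aws_result = run_pipeline([build, deploy_to_aws], {"commit": "abc123"})
openstack_result = run_pipeline([build, deploy_to_openstack], {"commit": "abc123"})
print(aws_result["artifact"], aws_result["deployed_to"])
print(openstack_result["artifact"], openstack_result["deployed_to"])
```

Swapping the deploy stage changes the target while the build stage and the runner stay untouched, which is the kind of portability and reuse described at the top of the post.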

Before we get into specific tooling, I want to take a moment to note how others are tackling this problem. There are many degrees of freedom so in case you don’t find my approach useful, hopefully one of these other methods might be more suitable for your use case.

(I have no affiliation with any of the companies or services listed below.)

  1. The first project requiring mention is software-factory. It is based on the very sophisticated tooling used in the OpenStack project, where a lot of my ideas and inspiration come from (Elastx runs OpenStack). Software Factory gets full marks for scalability, however it is not portable because it’s tied to OpenStack. It is packaged in such a way as to make it reusable, but the tools are tightly integrated, so if you want to swap one of them for something in your existing toolchain, you’re going to have some work to do.
  2. Codeship have created a company around supplying “Deployment Pipelines as a Service”, which I think is a pretty good business model - it clearly demonstrates that there are a lot of similarities between one deployment pipeline and another. They also have a great technical blog with lots of good tips focussed on CI/CD and automation. They definitely earn points for flexibility and reusability as you can , but they are not very portable because you’re tied to AWS (or Heroku, which also runs on AWS).
  3. Distelli are a competitor to Codeship with another “DPaaS” that appears to offer more flexibility around endpoint portability. They look like they have a fairly flexible “plug and play” architecture, but I don’t know how scalable their solution is for building complex environments.
  4. I always find myself especially attracted to Travis CI and GitHub because of their sweet integration and flexibility, and because Travis CI is free for open-source projects. However, if you have a complex application you’re going to run into problems.

Those are just a few alternative approaches to solving this problem, and one of them might be more suitable depending on your situation.

If your business is primarily focussed on a CMS to run a website and you don’t really have to worry about more complexity than scaling and pushing content to a caching layer or CDN, then your use case may fit better into one of the options listed above. That is unless you’re doing something really crazy like orchestrating Docker containers with Kubernetes on Apache Mesos and running some Big Data business analytics on your Hadoop-Elasticsearch backend. And if that’s your use case then that is totally badass, but it would be pretty weird if you didn’t already have a deployment pipeline working pretty smoothly at that scale.

Another way to phrase the above is that if you’re building your services on a PaaS and you’re living in a happy place, then there’s probably no reason to make trouble for yourself. However, if you have to worry about IaaS, network segmentation, patching operating systems, a mixed environment with fixed hardware and virtualised services, data protection or legal compliance like PCI-DSS, EFPIA, or HIPAA, then you should continue reading.

In Part 2 we’ll get into tool selection with pros and cons…