Canonical’s Office of The CDO: A 5 Year Journey in DevOps
I’m often asked what being the Vice President of Cloud Development and Operations means, when introduced for a talk or meeting, or when someone happens to run by my LinkedIn profile or business card.
The office of the CDO has been around in Canonical for so long, I forget that the approach we’ve taken to IT and development is either foreign or relatively new to a lot of IT organizations, especially in the commonly thought of “enterprise” space. I was reminded of this when I gave a presentation at an OpenStack Developer Summit entitled “OpenStack in Production: The Good, the Bad, & the Ugly” a year ago in Portland, Oregon. Many in the audience were surprised by the fact that Canonical not only uses OpenStack in production, but uses our own tools, Juju and MAAS, created to manage these cloud deployments. Furthermore, some attendees were floored by how our IT and engineering teams actually worked well together to leverage these deployments in our production deployment of globally accessible and extensively used services.
Before going into what the CDO is today, I want to briefly cover how it came to be. The story of the CDO goes back to 2009, when our CEO, Jane Silber, and Founder, Mark Shuttleworth, were trying to figure out how our IT operations team and web services teams could work better…smarter together. At the same time our engineering teams had been experimenting with cloud technologies for about a year, going so far as to provide the ability to deploy a private cloud in our 9.04 release of Ubuntu Server.
It was clear to us then, that cloud computing would revolutionize the way in which IT departments and developers interact and deploy solutions, and if we were going to be serious players in this new ecosystem, we’d need to understand it at the core. The first step to streamlining our development and operations activities was to merge our IT team, who provided all global IT services to both Canonical and the Ubuntu community, with our Launchpad team, who developed, maintained, and serviced Launchpad.net, the core infrastructure for hosting and building Ubuntu. We then added our Online Services team, who drove our Ubuntu One related services, and this new organization was called Core DevOps…thus the CDO was born.
Roughly soon after the formation of the CDO, I was transitioning between roles within Canonical, going from acting CTO to Release Manager (10.10 on 10.10.10..perfection! 🙂 ), then landing in as our new manager for the Ubuntu Server and Security teams. Our server engineering efforts continued to become more and more focused on cloud, and we had also began working on a small, yet potentially revolutionary, internal project called Ensemble, which was focused on solving the operational challenges system administrators, solution architects, and developers would face in the cloud, when one went from managing 100s of machines and associated services to 1000s.
All of this led to a pivotal engineering meeting in Cape Town, South Africa early 2011, where management and technical leaders representing all parts of the CDO and Ubuntu Server engineering met with Mark Shuttleworth, along with the small team working on Project Ensemble, to determine the direction Canonical would take with our server product.
Until this moment in time, while we had been dabbling in cloud computing technologies with projects like our own cloud-init and the Amazon EC2 AMI Locator, Ubuntu Server was still playing second stage to Ubuntu for the desktop. While being derived from Debian (the world’s most widely deployed and dependable Linux web hosting server OS), certainly gave us credibility as a server OS, the truth was that most people thought of desktops when you mentioned Ubuntu the OS. Canonical’s engineering investments were still primarily client focused, and Ubuntu Server was nothing much more than new Debian releases at a predictable cadence, with a bit of cloud technology thrown in to test the waters. But this weeklong engineering sprint was where it all changed. After hours and hours of technical debates, presentations, demonstrations, and meetings, there were two major decisions made that week that would catapult Canonical and Ubuntu Server to the forefront of cloud computing as an operating system.
The first decision made was OpenStack was the way forward. The project was still in its early days, but it had already peaked many of our engineers’ interest, not only because it was being led by friends of Ubuntu and former colleagues of Canonical, Rick Clark, Thierry Carrez, and Soren Hansen, but the development methods, project organization, and community were derived from Ubuntu, and thus it was something we knew had potential to grow and sustain itself as an opensource project. While we still had to do our due diligence on the code, and discuss the decision at UDS, it was clear to many then that we’d inevitably go that direction.
The second decision made was that Project Ensemble would be our main technical contribution to cloud computing, and more importantly, the key differentiator we needed to break through as the operating system for the cloud. While many in our industry were still focused on scale-up, legacy enterprise computing and the associated tools and technologies for things like configuration and virtual machine management, we knew orchestrating services and managing the cloud were the challenges cloud adopters would need help with going forward. Project Ensemble was going to be our answer.
Fast forward a year to early 2012. Project Ensemble had been publicly unveiled as, Juju, the Ubuntu Server team had fully adopted OpenStack and plans for the hugely popular Ubuntu Cloud Archive were in the works, and my role had expanded to Director of Ubuntu Server, covering the engineering activities of multiple teams working on Ubuntu Server, OpenStack, and Juju. The CDO was still covering IT operations, Launchpad, and Online Services, but now we had started discussing plans to transition our own internal IT infrastructure over to an internal cloud computing model, essentially using the very same technologies we expected our users, and Canonical customers, to depend on. As part of the conversation on deploying cloud internally, our Ubuntu Server engineering teams started looking at tools to adopt that would provide our internal IT teams and the wider Ubuntu community the ability to deploy and manage large numbers of machines installed with Ubuntu Server. Originally, we landed on creating a tool based on Fedora’s Cobbler project, combined with Puppet scripts, and called it Ubuntu Orchestra. It was perfect for doing large-scale, coordinated installations of the OS and software, such as OpenStack, however it quickly became clear that doing this install was just the beginning…and unfortunately, the easy part. Managing and scaling the deployment was the hard part. While we had called it Orchestra, it wasn’t able to orchestrate much beyond machine and application install. Intelligently and automatically controlling the interconnected services of OpenStack or Hadoop in a way that allowed for growth and adaptability was the challenge. Furthermore, the ways in which you had to describe the deployments were restricted to Puppet and it’s descriptive scripting language and approach to configuration management…what about users wanting Chef?…or CF Engine?…or the next foobar configuration management tool to come about? If we only had a tool for orchestrating services that ran on bare metal, we’d be golden….and thus Metal as a Service (MAAS) was born.
MAAS was created for the sole purpose of providing Juju a way to orchestrate physical machines the same way Juju managed instances in the cloud. The easiest way to do this, was to create something that gave cloud deployment architects the tools needed to manage pools of servers like the cloud. Once we began this project, we quickly realized that it was good enough to even stand on its own, i.e. as a management tool for hardware, and so we expanded it to a full fledged project. MAAS expanded to having a verbose API and user-tested GUI, thereby making Juju, Ubuntu Server deployment, and Canonical’s Landscape product leverage the same tool for managing hardware…allowing all three to benefit from the learnings and experiences of having a shared codebase.
The CDO Evolves
In the middle of 2012, the current VP of CDO decided to seek new opportunities elsewhere. Senior management took this opportunity to look at the current organizational structure of Core DevOps, and adjust/adapt according to both what we had learned over the past 3 1/2 years and where we saw the evolution of IT and the server/cloud development heading. The decision was made to focus the CDO more on cloud/scale-out server technologies and aspects, thus the Online Services team was moved over to a more client focused engineering unit. This left Launchpad and internal IT in the CDO, however the decision was also made to move all server and cloud related project engineering teams and activities into the organization. The reasoning was pretty straight-forward, put all of server dev and ops into the same team to eliminate “us vs them” siloed conversations…streamline the feedback loop between engineering and internal users to accelerate both code quality and internal adoption. I took a career growth decision to apply for the chance to lead the CDO, and was fortunate enough to get it, and thus became the new Vice President of Core DevOps.
My first decision as new lead of the CDO was to change the name. It might seem trivial, but while I felt it was key to keep to our roots in DevOps, the name Core DevOps no longer applied to our organization because of the addition of so much more server and cloud/scale-out computing focused engineering. We had also decided to scale back internal feature development on Launchpad, focusing more on maintenance and reviewing/accepting outside contributions. Out of a pure desire to reduce the overhead that department name changes usually cause in a company, I decided to keep the acronym and go with Cloud and DevOps at first. However, then the name (and quite honestly the job title itself) seemed a little too vague…I mean what does VP of Cloud or VP of DevOps really mean? I felt like it would have been analogous to being the VP of Internet and Agile Development…heavy on buzzword and light on actual meaning. So I made a minor tweak to “Cloud Development and Operations“, and while arguably still abstract, it at least covered everything we did within the organization at high level.
At the end of 2012, we internally gathered representation of every team in the “new and improved” CDO for a week long strategy session on how we’d take advantage of the reorganization. We reviewed team layouts, workflows, interactions, tooling, processes, development models, and even which teams individuals were on. Our goal was to ensure we didn’t duplicate effort unnecessarily, share best practices, eliminate unnecessary processes, break down communication silos, and generally come together as one true team. The outcome resulted in some teams broken apart, some others newly formed, processes adapted, missions changed, and some people lost because they didn’t feel like they fit anymore.
Entering into 2013, the goal was to simply get work done:
- Work to deploy, expand, and transition developers and production-level services to our internal OpenStack clouds: CanoniStack and ProdStack.
- Work to make MAAS and Juju more functional, reliable, and scalable.
- Work to make Ubuntu Server better suited for OpenStack, more easily consumable in the public cloud, and faster to bring up for use in all scale-out focused hardware deployments
- Work to make Canonical’s Landscape product more relevant in the cloud space, while continuing to be true to its roots of server management.
All this work was in preparation for the 14.04 LTS release, i.e. the Trusty Tahr. Our feeling was (and still is) that this had to be the release when it all came together into a single integrated solution for use in *any* scale-out computing scenario…cloud…hyperscale…big data…high performance computing…etc. If a computing solution involved large numbers of computational machines (physical or virtual) and massively scalable workloads, we wanted Ubuntu Server to be the defacto OS of choice. By the end of last year, we had achieved a lot of the IT and engineering goals we set, and felt pretty good about ourselves. However, as a company we quickly discovered there was one thing we left out in our grand plan to better align and streamline our efforts around scale-out technologies….professional delivery and support of these technologies.
To be clear, Canonical had not forgotten about growing or developing our teams of engineers and architects responsible for delivering solutions and support to customers. We had just left them out of our “how can we do this better” thinking when aligning the CDO. We were initially focused on improving how we developed and deployed, and we were benefiting from the changes made. However, now as we began growing our scale-out computing customer base in hyperscale and cloud (both below and above), we began to see that same optimizations made between Dev and Ops, needed to be done with delivery. So in December of last year, we moved all hardware enablement and certification efforts for servers, along with technical support and cloud consultancy teams into the CDO. The goal was to strengthen the product feedback loop, remove more “us vs them” silos, and improve the response times to customer issues found in the field. We were basically becoming a global team of scale-out technology superheroes.
It’s been only 3 months since our server and cloud enablement and delivery/support teams have joined the CDO, and there are already signs of improvement in responsiveness to support issues and collaboration on technical design. I won’t lie and say it’s all been butterflies and roses, nor will I say we’re done and running like a smooth, well-oiled machine because you simply can’t do that in 3 months, but I know we’ll get there with time and focus.
So there you have it.
The Cloud Development and Operations organization in Canonical is now 5 years strong. We deliver global, 24×7 IT services to Canonical, our customers and Ubuntu community. We have engineering teams creating server, cloud, hyperscale, and scale-out software technologies and solutions to problems some have still yet to even consider. We deliver these technologies and provide customer support for Canonical across a wide range of products including Ubuntu Server and Cloud. This end-to-end integration of development, operations, and delivery is why Ubuntu Server 14.04 LTS, aka the Trusty Tahr, will be the most robust, technically innovative release of the Ubuntu for the server and cloud to date.