
OpenStack in Ubuntu Server 12.04 LTS

With the release of Ubuntu Server 12.04 LTS quickly approaching, the Ubuntu Server Team has been working extremely hard to ensure OpenStack Essex is of high quality and tightly integrated into Ubuntu Cloud.  As with prior Long Term Support releases, Canonical commits to maintaining Ubuntu Server 12.04 LTS for five years, which means users receive five years of maintenance for the OpenStack Essex packages we provide in main.  That said, we recognize that OpenStack is still a relatively young project moving at a tremendous rate of innovation, with features and fixes already planned for Folsom that some users require for their production deployments.  In the past, these users would have had to upgrade off the LTS in order to get maintenance for the OpenStack release they need on Ubuntu Server… thus forgoing the five years of maintenance they want and need for their production deployment.  We wholeheartedly believe there are situations where moving to the next release of Ubuntu (12.10, 13.04, etc.) for newer OpenStack releases works just fine, especially for test/dev deployments.  However, we also know there will be many situations where users cannot afford the risk and/or the cost of upgrading their entire cloud infrastructure just to get the benefits of a newer OpenStack release, and we need a solution that fits their needs.  After thinking about what users want and where most people expect OpenStack to go in terms of continued innovation and stability, we have decided to provide Ubuntu users with two options for maintenance and support in 12.04 LTS.

The first option is that users can stay with the shipped version of OpenStack (Essex) and remain with it for the full life of the LTS.  As per the Ubuntu LTS policy, we commit to maintaining and supporting the Essex release for five years.  The point releases will also ship the Essex version of OpenStack, along with any bug fixes or security updates made available since its release.

Introducing the Ubuntu Cloud Archive

The second option involves Canonical’s Ubuntu Cloud archive, which we are officially announcing today.  Users can elect to enable this archive and install newer releases of OpenStack (and their dependencies) as they become available, up through the next Ubuntu LTS release (presumably 14.04).  Bug processing and patch contributions will follow standard Ubuntu practice and policy where applicable.  Canonical commits to maintaining and supporting new OpenStack releases for Ubuntu Server 12.04 LTS in our Ubuntu Cloud archive for at least 18 months after they are released.  Canonical will stop introducing new releases of OpenStack for Ubuntu Server 12.04 LTS into the Ubuntu Cloud archive with the version shipped in the next Ubuntu Server LTS release (presumably 14.04).  We will maintain and support this last updated release of OpenStack in the Ubuntu Cloud archive for three years, i.e. until the end of the Ubuntu 12.04 LTS lifecycle.
In order to allow for relatively easy upgrades, while still adhering to Ubuntu processes and policy, we have elected to make archive.canonical.com the home of the Ubuntu Cloud archive.  We will enable update paths for each OpenStack release.

  • e.g. Enabling “precise-folsom” in the archive will provide access to all OpenStack Folsom packages built for Ubuntu Server 12.04 LTS (binary and source), any updated dependencies required, and bug/security fixes made after release.
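
As a rough sketch of what this might look like in practice (the exact archive URL and pocket naming are assumptions until the archive goes live), enabling the Folsom pocket could be as simple as:

    # Hypothetical sources entry for the Ubuntu Cloud archive; the final
    # URL and pocket layout may differ once the archive is published.
    sudo add-apt-repository "deb http://archive.canonical.com/ubuntu precise-folsom main"
    sudo apt-get update
    # Packages then install/upgrade to their Folsom versions as usual, e.g.:
    sudo apt-get install nova-compute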

As of now, we have no plans to build or host OpenStack packages for non-LTS releases of Ubuntu Server in the Ubuntu Cloud archive.  We have created the chart below to better explain the options.

Q&A

Why Not Use Stable Release Updates?

Ubuntu’s release policy states that once an Ubuntu release has been published, updates must follow a special procedure called a stable release update, or SRU, and are delivered via the -updates archive.  These updates are restricted to a specific set of characteristics:

  • severe regression bugs
  • security vulnerabilities (via the -security archive)
  • bugs causing loss of user data
  • “safe” application layer bugs
  • hardware enablement
  • partner archive updates

Exceptions to the SRU policy are possible.  However, for this to occur, the Ubuntu Technical Board must approve the exception, which must meet the Board’s guidelines:

  1. Updates to new upstream versions of packages must be forced or substantially impelled by changes in the external environment, i.e. changes must be outside anything that could reasonably be encapsulated in a stable release of Ubuntu. Changes internal to the operating system we ship (i.e. the Ubuntu archive), or simple bugs or new features, would not normally qualify.
  2. A new upstream version must be the best way to solve the problem.  For example, if a new upstream version includes a small protocol compatibility fix and a large set of user interface changes, then, without any judgement required as to the benefits of the user interface changes, we will normally prefer to backport the protocol compatibility fix to the version currently in Ubuntu.
  3. The upstream developers must be willing to work with Ubuntu.  A responsive upstream who understands Ubuntu’s requirements and is willing to work within them can make things very much easier for us.
  4. The upstream code must be well-tested (in terms of unit and system tests).  It must also be straightforward to run those tests on the actual packages proposed for deployment to Ubuntu users.
  5. Where possible, the package must have minimal interaction with other packages in Ubuntu.  Ensuring that there are no regressions in a library package that requires changes in several of its reverse-dependencies, for example, is significantly harder than ensuring that there are no regressions in a package with a straightforward standalone interface that can simply be tested in isolation.  We would not normally accept the former, but might consider the latter.

Once approved by the Tech Board, the exception must have a documented update policy, e.g. http://wiki.ubuntu.com/LandscapeUpdates.  Based on these guidelines and the core functionality OpenStack serves in Ubuntu Cloud, the Ubuntu Server team did not feel it was in the best interest of its users, or of Ubuntu in general, to pursue an SRU exception.

What about using Ubuntu Backports?

The Ubuntu Backports process (which excludes the kernel) provides a mechanism for releasing package updates for stable releases that deliver new features or functionality.  Changes were made to `apt` in Ubuntu 11.10 so that it now only installs packages from Backports when they are explicitly requested.  Prior to 11.10, `apt` would install everything from Backports once it was enabled, which led to packages being unintentionally upgraded to newer versions.  The primary drawbacks of using the Backports archive are that the Ubuntu Security team does not provide updates for it, it’s a bit of a hassle to enable per-package updates, and Canonical doesn’t traditionally offer support services for the packages hosted there.  Furthermore, with each new release of OpenStack, the other applications OpenStack depends on must also be at certain versions.  By having more than one version of OpenStack in the same Backports archive, we run a huge risk of backward-compatibility issues with these dependencies.
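
For example, with the 11.10-and-later `apt` behavior, pulling a single package from Backports is an explicit, per-package request (`<package>` is a placeholder):

    # Install one package from the precise backports pocket, leaving
    # everything else on the stable versions:
    sudo apt-get install -t precise-backports <package>
    # Equivalent per-package syntax:
    sudo apt-get install <package>/precise-backports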

How Will You Ensure Stability and Quality?

To ensure users have a safe and reliable upgrade path, we will establish a QA policy whereby all new versions and updated dependencies are required to pass a specific set of regression tests with a 100% success rate.  In addition:

  • Unit testing must cover a minimum set of functionality and APIs
  • System test scenarios must be executed for 24, 48 and 72 hours uninterrupted.
  • Package testing must cover: initial installation, upgrades from the previous OpenStack release, and upgrades from the previous LTS and non-LTS Ubuntu release.
  • All test failures must be documented as bugs in Launchpad, with regressions marked Fix Released before the packages are allowed to exit QA.
  • Test results are posted publicly and announced via a mailing list created specifically for this effort.

Only upon successfully exiting QA will packages be pushed into the Ubuntu Cloud archive.

What Happens With OpenStack Support and Maintenance in 14.04?

Good question.  The cycle could repeat itself; however, at this point Canonical is not making such a commitment.  If the rate of innovation and growth of the OpenStack project matures to the point where users no longer need the next release for its improved stability and/or quality, and instead just want it for a new feature, then we would likely return to our traditional LTS maintenance and support model.

Ubuntu Server is No Longer the Best OS for Cloud Computing.

Okay, so now that I have your attention… let me explain.

Over this past year and a half (maybe a little longer), I’ve seen Ubuntu Server explode in the number and types of deployments, specifically in areas involving cloud computing, but also in situations involving big data and ARM server deployments.  This has all occurred at a time when people and organizations are having to do more with less…less lab space…less power…fewer people, which of course all leads to the real desire of operating at lower financial cost.  I’ve come to the conclusion that when I said at the 11.10 UDS that we should focus on making Ubuntu Server the best OS for cloud computing, I was aiming too low.  It’s awesome that we’ve essentially done this with our OpenStack integration efforts for Ubuntu Cloud, but we can do more…we can do better.  I now believe that for 12.04 LTS and beyond, Ubuntu Server should drive towards being the best OS for scale-out computing.

Scale-Out is Better than Scale-Up

Scale-out computing is the next evolutionary step in enterprise server computing. It used to be that if you needed an enterprise-worthy server, you had to buy a machine with a bunch of memory, a high-end CPU configuration, and a lot of fast storage. You also needed to plan ahead to ensure what you purchased had enough open CPU and memory slots, as well as drive bays, so you could upgrade when demand required it.  When the capacity limit (CPU, memory, and/or storage) of this server was hit, you had to replace it with a newer, often more expensive one, again planning for upgrades down the road.  Finally, to ensure high availability, you had to have one or two more of these servers with the same configuration.  Companies like Google, Amazon, and Facebook then came along and recognized that they could use low-cost, commodity hardware to build “pizza box” servers to do the same job, instead of relying on expensive, mainframe-like servers that needed costly redundancy built into every deployment.  These organizations realized that they could rely on a lot of cheap, easy-to-find (and easy-to-replace) servers to effectively do the job of a few scaled-up, high-end (and high-cost) servers.  More work could be accomplished, with a reduced risk of failure, by exploiting the advantages a scale-out solution provided. If a machine were to die in a comparable scale-up configuration, it would be very costly in both time and money to repair or replace it.  The scale-out approach allowed them to use only what they needed and quickly/easily replace systems when they went down.

Fast forward to today, and we have an explosion of service and infrastructure applications, like Hadoop, Ceph, and OpenStack, architected and built for scale-out deployments.  We even have the Open Compute Project focused on designing servers, racks, and even datacenters to specifically meet the needs of scale-out computing.  It’s clear that scale-out computing is overtaking scale-up as the preferred approach to most of today’s computational challenges.

With Great Scale, Comes Great Management Complexity

It’s not all rainbows and unicorns though…scale-out comes with its own inherent problems.  There’s a great paper published by IBM Research called “Scale-up x Scale-out: A Case Study using Nutch/Lucene”, where the researchers set out to measure and compare the performance of a scale-up versus scale-out approach to running a combined Nutch/Lucene workload.  Nutch/Lucene is an open-source framework written in Java for implementing search applications, consisting of three major components: crawling, indexing, and query.  Their results indicated that “scale-out solutions have an indisputable performance and price/performance advantage over scale-up”, and that “even within a scale-up system, it was more effective to adopt a ‘scale-out-in-a-box’ approach than a pure scale-up to utilize its processors efficiently”, i.e. use virtualization technologies like KVM.  However, they also go on to conclude that

“scale-out systems are still in a significant disadvantage with respect to scale-up when it comes to systems management. Using the traditional concept of management cost being proportional to the number of images, it is clear that a scale-out solution will have a higher management cost than a scale-up one.”

These disadvantages are precisely what I see Ubuntu Server tackling over the next few years.  I believe that in Ubuntu Server 12.04 LTS, we have already started to address these issues in several specific ways.

Power Consumption

One obvious issue with scale-out computing is the need for space to store your servers and provide enough power to run/cool them.  We haven’t figured out how to shrink the size of your server through code, so we can’t help with the space constraints.  However, we have started to develop solutions that can help administrators use less power to run their deployments. For example, we created PowerNap, which is a configurable daemon that can bring a running server to a lower power state according to a set of configuration preferences and triggers.
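
If you want to try it, PowerNap is in the Ubuntu archive today; the option names mentioned below are illustrative rather than authoritative, so check the shipped configuration file for the real ones:

    # Install the PowerNap daemon from the Ubuntu archive
    sudo apt-get install powernap
    # Behavior is driven by /etc/powernap/config: you choose which activity
    # monitors to watch (e.g. console, network, running processes) and what
    # action to take (e.g. pm-suspend or poweroff) once the machine has been
    # idle past the configured threshold.
    sudoedit /etc/powernap/config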

As a company, Canonical also began investing in processor technologies focused on delivering a high rate of operations at low power consumption.  ARM has a long-standing history of providing processors that use very little power. For server applications, this meant you could drive processor density up and still keep power consumption relatively low. With this greater density, server manufacturers started to see opportunities for building very high-speed interconnects that allow these processors to share data and cooperate quickly and easily. ARM server technology companies such as Calxeda can now build computing grids that don’t require water cooling or an in-house backup generator the moment you turn them on.  With the Cortex-A9 and Cortex-A15 processors in particular, the performance gap between ARM processors and x86 is starting to shrink significantly.  We are getting closer to full 64-bit support in the coming ARMv8 processors, which will still retain the low-power and low-cost heritage of the ARM processor.  Enterprise server manufacturers are already planning to put ARM processors into very low-cost, very dense, and very robust systems to provide the kind of functionality, interconnectivity, and compute power that used to be possible only in mainframes.  Ubuntu Server 12.04 LTS will support ARM, specifically the hard-float compilation configuration (armhf).  With our pre-releases already receiving such good performance reviews, we are excited about the possibilities.  If you want to know more about what we’ve done with ARM for Ubuntu Server, I recommend you start with the great FAQ posted on our wiki.

Support Pricing

Traditional license and subscription support models are built for scale-up solutions, not scale-out.  These offerings price either by number of users or by number of cores per machine, which is within reason when deploying onto a small number of machines, i.e. under 100…maybe a bit higher depending on the size of the organization.  The base price gets you access to security updates and bug fixes, and you pay more to get more, i.e. someone on the phone, email support, custom fixes, etc.  This is still acceptable to most users in a scale-up model.

However, when the solution is scale-out, i.e. thousands of machines or more, this pricing gets way out of control.  Many of the license and subscription vendors have recently wised up to this and now offer cluster-based pricing, which isn’t necessarily cheap, but is certainly much less costly than the per-socket/CPU/user approach.  The idea is that you pay for the master or head node, and can then add as many slave nodes as you want for free.

Ubuntu Server provides security updates and maintenance for the life of the release…for free.  That means for an LTS release of Ubuntu Server, users get five years of free maintenance.  If you need someone to call or custom solutions, you can pay Canonical for that…but if you don’t, you pay nothing.  It doesn’t matter if you have a few machines or over a thousand; security updates and maintenance for the set of supported packages shipped in Ubuntu are free.

Services Management

Deploying interconnected services across a scale-out deployment is a PITA. After procuring the necessary hardware and finding lab space, you have to physically set up the machines, install the OS and required applications, and then configure and connect the various applications on each machine to provide the desired services. Once you’ve deployed the entire solution, upgrading or replacing the service applications, modifying the connections between them, scaling out to account for higher load, and/or writing custom scripts for re-deployment elsewhere requires even more time…and pain.

Juju is our answer to this problem.  It focuses on managing the services you need to deliver a single solution, above simply configuring the machines or cloud instances needed to run them.  It was specifically designed, and built from the ground up, for service orchestration. Through the use of charms, Juju provides you with shareable, re-usable, and repeatable expressions of DevOps best practices. You can use them unmodified, or easily change and connect them to fit your needs. Deploying a charm is similar to installing a package on Ubuntu: ask for it and it’s there, remove it and it’s completely gone.  We’ve dramatically improved Juju for Ubuntu Server 12.04 LTS, from integrating our charm collection into the client (removing the need for bzr branches) to rolling out a load of new charms for all the services you need…and probably some you didn’t know you wanted.  As my good friend Jorge Castro says, the Juju Charm Store Will Change the Way You Use Ubuntu Server.
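
To make that concrete, here’s what standing up a typical two-service stack looks like with the 12.04-era Juju client, with charm names resolved straight from the charm store:

    # Stand up the environment, deploy two charms, and relate them
    juju bootstrap
    juju deploy mysql
    juju deploy wordpress
    juju add-relation wordpress mysql   # wire the blog to its database
    juju expose wordpress               # open it up to outside traffic
    juju status                         # watch the deployment converge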

Deployment Tools

In terms of deployment, we recognized this hole in our offering last cycle and rolled out Orchestra as a first step, to see what the uptake would be.  Orchestra wasn’t an actual tool or product, but a meta-package pointing to existing technologies, like cobbler, already in our archive.  We simply ensured the tools we recommended worked together, so that in 11.10 you could deploy Ubuntu Server across a cluster of machines easily.

After 11.10 was released, we realized we could extend the idea from simple multi-node OS install and deployment to a more complex offering of multi-node service install and deployment.  This effort would require us to do more than just integrate existing projects, so we decided to create our own project, called MAAS (Metal as a Service), which would be tied into Juju, our service orchestration tool.

Ubuntu 12.04 LTS will include Canonical’s MAAS solution, making it trivial to deploy services such as OpenStack, Hadoop, and Cloud Foundry on your servers. Nodes can be allocated directly to a managed service, or simply have Ubuntu installed for manual configuration and setup.  MAAS lets you treat farms of servers as a malleable resource for allocation to specific problems, and re-allocation on a dynamic basis.  Using a pretty slick user interface, administrators can connect, commission, and deploy physical servers in record time, re-allocate nodes between services dynamically, keep them all up to date, and in due course retire them from use.
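
As a sketch of how MAAS and Juju fit together (the server URL and OAuth key below are placeholders, and the exact provider keys may differ in the final release):

    # Install MAAS on the management node
    sudo apt-get install maas
    # Point Juju at the MAAS provider via ~/.juju/environments.yaml;
    # the values below are placeholders, not a working configuration.
    cat > ~/.juju/environments.yaml <<EOF
    environments:
      maas:
        type: maas
        maas-server: 'http://maas.example.com/MAAS'
        maas-oauth: '<your-MAAS-API-key>'
        default-series: precise
    EOF
    # From here, the usual juju bootstrap/deploy workflow runs on bare metal
    juju bootstrap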

We’ve Come a Long Way, But

There’s a lot more we need to do.  What if the MAAS commissioning process included hardware configuration, for example RAID setup and firmware updates?  What if you could deploy and orchestrate your services by mouse click or touch…never touching a keyboard?  What if your services were allocated to machines based on power footprint?  What if your bare metal deployment could also be aware of the Canonical hardware certification database for systems and components, allowing you to quickly identify systems that are fully certified or might have potentially problematic components?  What if your services auto-scaled based on load without you having to be involved?  What if you could have a true hybrid cloud solution, bursting up to the public cloud(s) of your choosing without ever having to rewrite or rearchitect your services?  These questions are just some of the challenges we look to take on over the next few releases, and if any of this interests you…I encourage you to join us.