Saturday, December 03, 2011

Will Amazon Support Linux Containers?

Early on, Amazon EC2 was recognized as the leading IaaS provider because of its ability to easily provision new virtual machines in a variety of configurations (size, speed, attachments, etc.). Virtual machines are a powerful yet simple tool for engineers, but they come at a price: a performance hit. At MomentumSI, we've been pondering whether Amazon will ever support Linux Containers in their cloud.


When asked, "Will Amazon Support Linux Containers?" Raj comments, "Would love it. We may see a type of instance which allows containers on it. You will have to take the whole machine and not just a container on it. That way AWS will not have to worry about maintaining the host OS. Given the complexities, I think it will be a lower priority for Amazon, and as it may be financially counterproductive, they may never do it."



Tom comments, "I doubt it. While I'm one of the biggest proponents of Linux containers, if not *the* biggest, the business reasoning still lags the technical reasoning. Intel, for instance, would *hate* such a move. Why? They spent a ton of money on virtualization at the chip level, which becomes a non-issue in containers (no hardware gets shared at the metal; rather, it's all one kernel for all containers). So, while it would be a great thing to see, the business market simply doesn't support this at this point, other than for folks like Pixar or other compute-heavy shops.

What I *would* bet on is that AWS internally switches to some container-based systems. For instance, Elastic MapReduce is far better off in a container world than in a VM world. Easier to maintain, direct access to 'CPU speed' and no need to virtualize access to disks -- it's all just there (even iSCSI ends up better in containers -- no 'VM to hypervisor' network translations)."


Amazon will likely be forced into one of three positions: 
1. Delivering sub-optimal platform performance on VMs (current state)
2. Supporting Linux Containers behind the scenes but not giving customers access to them. 
3. Delivering Linux Containers to customers and dealing with a whole new set of technical headaches. 


I'm more optimistic than my counterparts on the likelihood of #3. My reasons are simple: First, Amazon has consistently done whatever it takes to satisfy customer needs. Second, I think they'll need to do it to remain competitive with companies like Rackspace. As developers move from "needing a VM" to "needing a platform" (database, app server, etc.), Amazon will be pressed to expose a higher-performance layer to platform developers. One thing my associates and I agreed on is that we will not likely see containers in 2012... perhaps 2013?

Tuesday, November 22, 2011

Is Cloud Foundry a PaaS?

I've been asking some people in the industry a real simple question, "Is Cloud Foundry a Platform as a Service"?

The obvious answer would seem to be "yes" - after all, VMware told us it's a PaaS.

That should be the end of it, right? For some reason, when I hear "as-a-Service", I expect a "service" - as in Service Oriented. I don't think that's too much to ask. For example, when Amazon released their relational data service, they offered me a published service interface:
https://rds.amazonaws.com/doc/2010-07-28/AmazonRDSv4.wsdl

I know there are people who hate SOAP, WS-*, WSDL, etc. - that's cool, to each their own. If you prefer, use the RESTful API: http://docs.amazonwebservices.com/AmazonRDS/latest/APIReference/

Note that the service interface IS NOT the same as the interface of the underlying component (MySQL, Oracle, etc.), as those are exposed separately.

Back to my question - is Cloud Foundry a PaaS?

If so, can someone point me to the WSDLs, RESTful interfaces, etc.?

Will those interfaces be submitted to DMTF, OASIS or another standards body?

Alternatively, is it merely a platform substrate that ties together multiple server-side technologies (similar to JBoss or WebSphere)?

Will cultural pushback kill private clouds?

Derrick Harris asks the question, "Will cultural pushback kill private clouds?" His questioning stems from a piece by Lydia Leong, in which she notes that many enterprises have fat management structures and aren't organized like the leaner cloud providers.

I tend to agree with the premise that the enterprise will have difficulties in adopting private cloud, but not for the reasons the authors noted. The IaaS & PaaS software is available. Vendors are now offering to manage your private cloud in an outsourced manner. More often than not, companies are educated on cloud and "get it". They have one group of people who create, extend and support the cloud(s), and another group who use it to create business solutions. It's a simple consumer & provider relationship.

Traditionally, there are three ways things get done in Enterprise IT:
1. The CIO says "get'er done" (and writes a check)
2. A smart business/IT person uses program funds to sneak in a new technology (and shows success)
3. Geeks on the floor just go and do it.

With the number of downloads of open source stacks like OpenStack and Eucalyptus, it is apparent that model #3 is getting some traction. My gut tells me that the #2 guys are just pushing their stuff to the public cloud (begging forgiveness rather than asking permission). On #1, many CIOs are hopeful that they can just extend their VMware play - while more aggressive CIOs are looking to the next generation of cloud vendors to provide something that matches public cloud features more directly.

There are adoption issues in the enterprise. However, they're the same old reasons. Fat org-charts aren't going away, and they will not be the life or death of private cloud. In my opinion, we need CIOs to make bold statements on switching to an internal/external cloud operating model. Transformation isn't easy. And telling the CIO that they need to fire a bunch of managers in order to look more like a cloud provider is silly advice and a complete non-starter.

Friday, August 12, 2011

Measuring Availability of Cloud Systems

The analysts at Saugatuck Technology recently wrote a note on "Cloud IT Failures Emphasize Need for Expectation Management". One comment caught my attention:

"Recall that the availability of a group of components is the product of all of the individual component availabilities. For example, the overall availability of 5 components, each with 99 percent availability, is: 0.99 X 0.99 X 0.99 X 0.99 X 0.99 = 95 percent."

I understand their math - but it strikes me as odd that they would use this thinking when discussing cloud computing. In cloud environments, components are often deployed as virtualized n+1 highly available pairs. If one is down, the other takes over. In a non-cloud world, this architecture is typically reserved for the most critical components (e.g., load balancers or other single points of failure). It's also common to create a complete replica of the environment in a disaster recovery area (e.g., AWS availability zones). In theory, this leads to very high up-time.

Let me put this another way... I currently have 2 cars in my driveway. Let's say each of them has 99% up-time. If one car doesn't start, I'll try the other car. If neither car starts, I'll most likely walk over to my neighbor's house and ask to borrow one of their two cars (my DR plan). You can picture the math... in the 1% of cases where car A fails, there's a 99% chance that car B will succeed, and so on. However, experience with both cars and computing tells us that this math doesn't work either. For instance, if car A didn't start because it was 20 degrees below zero outside, there's a good chance that car B won't start - and for that matter, my neighbor's cars won't start either. Structural or natural problems tend to infect the whole population at once.

I wish I could show you the new math for calculating availability in cloud systems - but it's beyond my pay grade. What I know is that the old math isn't accurate. Anyone have suggestions on a more modern approach?
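
To show the flavor of what a better model might look like, here's a rough sketch in Python. The redundant-pair math is standard probability; the 20% correlation figure is a made-up illustration of the frozen-cars problem, not measured data.

# Rough sketch: three availability models for the same 5-component system.
def serial(avail, n):
    """Old math: n components in series; all must be up."""
    return avail ** n

def redundant_pair(avail):
    """n+1 pair: down only if both members fail independently."""
    return 1 - (1 - avail) ** 2

def correlated_pair(avail, rho):
    """Pair whose failures are correlated (the frozen-cars problem):
    P(both fail) = p^2 + rho * p * (1 - p), where p = 1 - avail."""
    p = 1 - avail
    return 1 - (p * p + rho * p * (1 - p))

a = 0.99
print("serial, 5 components:        %.4f" % serial(a, 5))                        # ~0.9510
print("5 redundant pairs in series: %.4f" % serial(redundant_pair(a), 5))        # ~0.9995
print("same, with 20%% correlation: %.4f" % serial(correlated_pair(a, 0.2), 5))  # ~0.9896

Even a modest failure correlation wipes out most of the benefit the naive redundancy math promises - which matches the driveway experience above.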

Thursday, August 11, 2011

OpenShift: Is it really PaaS?

Red Hat recently announced an upgraded version of OpenShift with exciting new features, including support for Java EE6, Membase, MongoDB and more. See details at:

As I dug through the descriptions, I found myself with more questions than answers. When you say Membase or MongoDB are available as part of the PaaS, what does this really mean? For example:
  • They're pre-installed in clustered or replicated manner?
  • They're monitored out of the box?
  • Will it auto-scale based on the monitoring data and predefined thresholds? (both up and down?)
  • They have a data backup / restore facility as part of the as-a-service offering?
  • The backup / restore are as-a-service?
  • The backup / restore use a job scheduling system that's available as-a-service?
  • The backup / restore use an object storage system that has cross data center replication?
Ok, you get the idea. Let me be clear - I'm not suggesting that OpenShift does or doesn't do these things. Arguments can be made that, in some cases, it doesn't need to. My point is that several new "PaaS offerings" are coming to market and they smell like the same-ole-sh!t. If nothing else, the product marketing teams will need to do a better job of explaining what they currently have. Old architects need details.

It's no secret that I'm a fan of Amazon's approach of releasing their full APIs (AWS Query, WSDL, Java & Ruby APIs, etc.) along with some great documentation. They've built a layered architecture whereby the upper layers (PaaS) leverage lower layers (Automation & IaaS) to do things like monitoring, deployment & configuration of both the platforms and the infrastructure elements (block storage, virtual compute, etc.). The bar has been set for what makes something PaaS - and going forward, products will be measured against it. It's ok if your offering doesn't do all the sophisticated things you find in AWS - but it's better to be up front about it. Old architects will understand.

Tuesday, April 26, 2011

Private Cloud Provisioning Templates

One of the primary benefits of a cloud computing environment is the increased automation. The Provisioning Service is perhaps the core mechanism to deliver this. To better understand the kinds of things we might orchestrate, take a look at the following template. You'll notice that it takes on the same format as Amazon's CloudFormation. This example launches a load balancer as part of our LB-aaS solution for a Eucalyptus cloud:

{
  "ToughTemplateFormatVersion" : "2011-03-01",

  "Description" : "Launch Load Balancer instance and install LB software.",

  "Parameters" : {
    "AvailabilityZone" : {
      "Description" : "AvailabilityZone in which an instance should be created",
      "Type" : "String"
    },
    "AccountId" : {
      "Description" : "Account Id",
      "Type" : "String"
    },
    "LoadBalancerName" : {
      "Description" : "Load Balancer Name",
      "Type" : "String"
    }
  },

  "Mappings" : {
    "AvailabilityZoneMap" : {
      "msicluster" : {
        "SecurityGroups" : "default",
        "ImageId" : "emi-FF070BFE",
        "KeyName" : "rarora",
        "EKI" : "eki-3A4A0D5A",
        "ERI" : "eri-B2C7101A",
        "InstanceType" : "c1.medium",
        "UserData" : "80"
      }
    }
  },

  "Resources" : {
    "LoadBalancerLaunchConfig" : {
      "Type" : "TOUGH::LaunchConfiguration",
      "Properties" : {
        "AccountId" : { "Ref" : "AccountId" },
        "SecurityGroups" : { "Fn::FindInMap" : [ "AvailabilityZoneMap", { "Ref" : "AvailabilityZone" }, "SecurityGroups" ] },
        "ImageId" : { "Fn::FindInMap" : [ "AvailabilityZoneMap", { "Ref" : "AvailabilityZone" }, "ImageId" ] },
        "KeyName" : { "Fn::FindInMap" : [ "AvailabilityZoneMap", { "Ref" : "AvailabilityZone" }, "KeyName" ] },
        "InstanceType" : { "Fn::FindInMap" : [ "AvailabilityZoneMap", { "Ref" : "AvailabilityZone" }, "InstanceType" ] },
        "EKI" : { "Fn::FindInMap" : [ "AvailabilityZoneMap", { "Ref" : "AvailabilityZone" }, "EKI" ] },
        "ERI" : { "Fn::FindInMap" : [ "AvailabilityZoneMap", { "Ref" : "AvailabilityZone" }, "ERI" ] }
      }
    },
    "LoadBalancerInstance" : {
      "Type" : "TOUGH::EUCA::LaunchInstance",
      "Properties" : {
        "AccountId" : { "Ref" : "AccountId" },
        "AvailabilityZone" : { "Ref" : "AvailabilityZone" },
        "LaunchConfig" : { "Ref" : "LoadBalancerLaunchConfig" },
        "Setup" : { }
      }
    },
    "RegisterLoadBalancerInstance" : {
      "Type" : "TOUGH::ElasticLoadBalancing::RegisterLoadBalancerInstance",
      "Properties" : {
        "AccountId" : { "Ref" : "AccountId" },
        "LoadBalancerName" : { "Ref" : "LoadBalancerName" },
        "Instance" : { "Ref" : "LoadBalancerInstance" }
      }
    },
    "Setup" : {
      "Type" : "TOUGH::EUCA::Parallel",
      "Operations" : {
        "TrackLoadBalancerInstance" : {
          "Type" : "TOUGH::EUCA::TrackInstance",
          "Name" : "LoadBalancerInstance",
          "Properties" : {
            "AccountId" : { "Ref" : "AccountId" },
            "InstanceId" : { "Fn::GetAtt" : [ "LoadBalancerInstance", "InstanceId" ] }
          }
        },
        "InstallLoadBalancerSoftware" : {
          "Type" : "TOUGH::ElasticLoadBalancing::InstallLoadBalancerSoftware",
          "Properties" : {
            "AccountId" : { "Ref" : "AccountId" },
            "IP" : { "Fn::GetAtt" : [ "LoadBalancerInstance", "PublicIp" ] }
          }
        }
      }
    }
  },

  "Outputs" : {
    "PublicIP" : {
      "Description" : "PublicIP address of the LoadBalancer",
      "Value" : { "Fn::GetAtt" : [ "LoadBalancerInstance", "PublicIp" ] }
    }
  }
}

The JSON format can be a bit difficult to read if you're not familiar with it. Amazon and others now have UIs that facilitate the creation of these templates. In this example, there are a few items worth noting:
1. The template accepts input variables and returns information at the end of execution
2. The orchestration automates a series of tasks (launches a bare image, installs LB software, tracks the progress, configures the software, registers the newly launched instance, etc.)
3. The template treats cloud concepts (availability zones, cloud services, etc.) as first-class elements of the syntax.

Keep in mind that the orchestration scripts can be multiple levels deep. This example was a simple one just to launch a load balancer. A more complicated orchestration would initiate multiple orchestration templates.

In the coming months, we'll be releasing a series of templates designed to orchestrate the provisioning of many common applications. These provisioning templates will fully leverage the power of the cloud (auto-scale, auto-recover, auto-snapshot, auto-balance, etc.)

Sunday, April 24, 2011

Private Cloud Provisioning & Configuration

Cloud provisioning has focused on the rapid acquisition and initialization of a new server, disk or some other piece of infrastructure. Provisioning a single piece of infrastructure is now quite easy; provisioning an entire set is much more complicated. In addition to the setup of each individual piece of equipment, it's necessary to understand the dependencies between elements. In some cases, certain infrastructure components must be launched before others, or configuration data from one item needs to be fed into a third element. Getting it all right is difficult, and getting it wrong is a major cause of system failures. An approach to solving the problem is to consider the Deployment Fidelity, that is, the degree to which a deployment is able to fully describe its architecture and configuration in a digitally precise manner.

Historically, application architects have used Word documents and Visio diagrams to depict the relationship between their software modules and the hardware infrastructure that would host them. Deployment Fidelity deals with accurately describing a set of computing resources and their relationship to each other. Organizations that embrace high fidelity will digitally describe their software and hardware topology: what type of hardware, operating systems, memory, infrastructure services, platform services, etc. and pass the digital description to the cloud provisioner for execution. The business value is two-fold. First, the high fidelity description reduces the chances of manual error, especially during hand-off. Second, the automation of the provisioning task reduces the deployment time and associated costs (e.g., sysadmins running individual scripts, testers waiting for new environments, etc.)





To increase the Deployment Fidelity, the relationships between elements must be captured. For instance, if an application server uses a relational database, the link between the two is recorded and configuration variables (such as IP addresses) are noted. If the server has an outage, a replacement can be auto-launched with the same configuration information. As the complexity of an application increases (load balancers, web servers, app servers, multiple databases, message queues, pub/sub, etc.) the need to keep a digital description becomes extremely important in order to reduce the chance of errors during deployment.
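
To make this concrete, here's a minimal sketch (in Python) of what a digital topology description and its derived launch order might look like. The element names and the 'depends_on' attribute are illustrative only - not the actual Tough template syntax.

# Minimal sketch: a machine-readable topology plus a dependency-respecting
# launch order. Names are illustrative, not the Tough template syntax.
topology = {
    "database":      {"depends_on": []},
    "app_server":    {"depends_on": ["database"]},       # needs the DB's address
    "web_server":    {"depends_on": ["app_server"]},
    "load_balancer": {"depends_on": ["web_server"]},
}

def launch_order(topo):
    order, seen = [], set()
    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in topo[name]["depends_on"]:
            visit(dep)          # launch dependencies first
        order.append(name)
    for name in topo:
        visit(name)
    return order

print(launch_order(topology))
# ['database', 'app_server', 'web_server', 'load_balancer']

Because the description is data rather than a Visio diagram, a provisioner can execute it directly, and a failed element can be relaunched with the exact same configuration.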

From an organizational perspective, there are two highlights:
1. The deployment architect can describe their proposed solution with complete fidelity - no misinterpretation. In addition, if there is an issue, changes to the architecture can be captured in version control, just as if it were another piece of software code.
2. The sysadmin or release engineer can take the provisioning script and easily create a new environment (i.e., replicating Dev to Test, etc.)

Today, MomentumSI is announcing the release of two new services that orchestrate the provisioning of complex application topologies and then provide the configuration information:
The Tough Provisioning Service provides functionality equivalent to Amazon's CloudFormation and is API/syntax compatible with their offering.

The Tough Configuration Service integrates the most popular configuration management systems into the private cloud. Use your choice of Chef or Puppet to create configuration scripts and then expose them as enterprise-grade services (secure access, multiple-node delivery, guaranteed transmission, closed-loop feedback, etc.)

Our solution brings this functionality to your private cloud by complementing your existing investment in VMware or Eucalyptus.

For more information, see Tough Solutions.

Tuesday, April 05, 2011

Are Enterprise Architects Intimidated by the Cloud?


EAs are often the champions of large change initiatives that span multiple business units. If they're not on board, we've got problems.

Here's why I ask the question:
1. It's my perception (perhaps incorrect) that EA leadership typically doesn't come from a background in infrastructure architecture. The EAs who tend to get promoted usually have a background in business or application architecture, and these people are often hesitant to enter deep discussions on CPU power consumption, DNS propagation, VLAN decisions, storage protocols, hypervisor trade-offs, etc.

2. Most people agree that the cloud can be viewed as a series of layers. You can attack it from the top (SaaS) or the bottom (IaaS). Quite frankly, there isn't *that much* architecture in SaaS (other than the secure connection and integration). That leaves IaaS as the starting point - which takes me back to point #1. IaaS intimidates the EA team, meaning that they're relying on the I.T. data center operations team (and localized infrastructure architects) to define the foundational IaaS layers which will serve PaaS, Dev/Test, disaster recovery, Hadoop clusters, etc.

Any truth here? Leave a comment (moderated) or send me an email either way: jschneider AT MomentumSI DOT com

Monday, April 04, 2011

Cloud.com offers Amazon API

The most recent version of Cloud.com is now offering a 'bridge' to the core AWS EC2 services:

"CloudBridge provides a compatibility layer for the CloudStack cloud computing software that allows tools designed for Amazon Web Services to be used with CloudStack.

The CloudBridge is a server process that runs as an adjunct to the CloudStack. The CloudBridge provides an Amazon EC2 compatible API via both SOAP and REST web services."
The functions they support include:

Addresses: AllocateAddress, AssociateAddress, DescribeAddresses, DisassociateAddress, ReleaseAddress

Availability Zones: DescribeAvailabilityZones

Images: CreateImage, DeregisterImage, DescribeImages, RegisterImage

Image Attributes: DescribeImageAttribute, ModifyImageAttribute, ResetImageAttribute

Instances: DescribeInstances, RunInstances, RebootInstances, StartInstances, StopInstances, TerminateInstances

Instance Attributes: DescribeInstanceAttribute

Keypairs: CreateKeyPair, DeleteKeyPair, DescribeKeyPairs, ImportKeyPair

Passwords: GetPasswordData

Security Groups: AuthorizeSecurityGroupIngress, CreateSecurityGroup, DeleteSecurityGroup, DescribeSecurityGroups, RevokeSecurityGroupIngress

Snapshots: CreateSnapshot, DeleteSnapshot, DescribeSnapshots

Volumes: AttachVolume, CreateVolume, DeleteVolume, DescribeVolumes, DetachVolume

Although this list represents the core features of EC2, it doesn't yet cover the upper layers (CloudWatch, Auto Scaling, etc.) or the PaaS offerings (SNS, SQS, etc.). Regardless, I'm excited to see more emphasis being placed on supporting the AWS standard. It's easy for people to say that IaaS standards don't matter. However, if you're the guy building software on top of IaaS, they matter a WHOLE lot.

Cloud.com is a solid piece of software that has achieved success in the service provider market. To date, they haven't pushed too hard into the enterprise. Their decision to embrace the AWS API is a good one - and it's complemented by their decision to use pieces of OpenStack in their software where appropriate. This idea seems to be getting more traction. I'm hearing more and more people talking about OpenStack like it's a drawer that you reach into and grab the components you want - rather than a holistic platform. I'm not sure if that's what the OpenStack team was shooting for, but it's interesting to see guys like Cloud.com being open to leveraging the bits and pieces that they find useful.


Saturday, April 02, 2011

The commoditization of scalability

Last week, I had an interesting discussion with a product owner at an ISV. We discussed his offering; it was core plumbing-middleware-kind-of-stuff. When I asked how he differentiated his offering from others on the market, the answer was that it scales better. Our discussion moved from what he was doing to what I was up to, and without trying to be coy I said, "We enable the commoditization of scalability". What I mean by this is that we help our customers adopt public and private clouds that know how to auto-scale applications (and much more).

Of course, ISVs have always used non-functional attributes like availability, scalability and security as competitive differentiators in their offerings. These capabilities are now being provided as features of the IaaS fabric. ISVs will need to redesign their next generation of products on top of cloud infrastructures like Amazon, Eucalyptus, vCloud Director, Cloud.com, OpenStack and Nimbula. It will no longer be acceptable for an ISV to march into a customer and demand a block of servers to run their proprietary clusters; they will be expected to allocate compute resources from the common IaaS pool. In addition, ISVs will need to differentiate on attributes other than those provided by the IaaS fabric.

This change will affect the corporate I.T. software development department as well. I've witnessed several I.T. groups attempt to design highly scalable architectures. Usually, the personnel aren't trained to perform this kind of work, and either the project fails or delivery costs run very high. I believe that the I.T. departments that invest in IaaS will be able to significantly reduce the cost to design, deploy and operate highly scalable systems. It might be premature to declare the commoditization of scalability, but I truly believe we are witnessing the most significant step toward that goal in my 20-year career.

Wednesday, March 16, 2011

Providing Cloud Service Tiers

In the early days of cloud computing, emphasis was placed on 'one size fits all'. However, as our delivery capabilities have increased, we're now able to deliver more product variations, in which some products provide the same function (e.g., storage) but deliver better performance, availability, recovery, etc., and are priced higher. I.T. must assume that some applications are business-critical while others are not. Forcing users to pay for the same class of service across the spectrum is not a viable option. We've spent a good deal of time analyzing various cloud configurations, and can now deliver tiered classes of service in our private clouds.

Reviewing the trials, tribulations and successes of implementing cloud solutions, one can separate tiers of cloud services into two categories: 1) higher-throughput elastic networking; or 2) higher-throughput storage. We leave a third (more CPU) out of this discussion because it generally boils down to 'more machines,' whereas storage and networking span all machines.

Higher network throughput raises complex issues regarding how one structures networks – VLAN or L2 isolation, shared segments and others. Those complexities, and related costs, increase dramatically when adding multi-speed NICs and switches, for instance 10GBase-T, NIC teaming and other such facilities. We will delve into all of that in a future post.

Tiered Storage on Private Cloud

Where tiered storage classes are at issue, cost and complexity are not such dramatic barriers, unless we include a mix of network and storage (i.e., iSCSI storage tiers). For the sake of simplicity, let's ignore that mix and break the areas of tiered interest into: 1) elastic block storage (“EBS”); 2) cached virtual machine images; and 3) running virtual machine (“VM”) images. In the MomentumSI private cloud, we've implemented multiple tiers of storage service by adding solid-state drives (SSDs) to each of these areas, but doing so requires matching the nature of the storage usage with the location of the physical drives.

Consider implementing EBS via higher-speed SSDs. Because EBS volumes are exposed over network channels so they remain attachable to various VMs, a lower-speed network would likely mask the dramatic speed improvements normally associated with SSDs unless a very high-speed network carries the drive signaling and data. Whether one uses ATA over Ethernet (AoE), iSCSI, NFS or other models to project storage across the VM fabric, even standard SATA II drives under load can saturate a one-gigabit Ethernet segment. However, by exposing EBS volumes on their own 10GbE network segments, EBS traffic stands a much better chance of not overloading the network. For instance, at MSI we create a second tier of EBS service by mounting SSDs on the mount points under which volumes will exist - e.g., /var/lib/eucalyptus/volumes, by default, on a Eucalyptus storage controller. Doing so gives users of EBS volumes the option of paying more for 'faster drives.'

While EBS gives users of cloud storage a higher tier of user storage, cloud operations also represent a point of optimization, and thus of tiered service. The goal is to optimize the creation of images, and to spin them up faster. Two particular operations generate significant disk activity in a cloud implementation: caching VM images on hypervisor mount points, and copying those cached images into the run space for each new instance. Consider Eucalyptus, which stores copies of kernels, ramdisks (initrd) and Eucalyptus Machine Image (“EMI”) files on a (usually) local drive at the Node Controllers (“NC”). One could also store EMIs on iSCSI, AoE or NFS storage, but the same discussion as that regarding EBS applies (pair fast networking with fast drives). The key to the EMI cache is not so much fast writes as rapid reads. For each running instance of an EMI (i.e., a VM), the NC creates a copy of the cached EMI, and uses that copy to spin up the VM. Therefore, what we desire is very fast reads from the EMI cache, with very fast writes to the running EMI store. Clearly that does not happen if the same drive spindle and head carry both operations.

In our labs, we use two drives to support the higher-speed cloud tier operations: one for the cache and one for the running VM store. However, to get a Eucalyptus NC, for instance, to use those drives in the most optimal fashion, we must direct the reads and writes to different disks - one drive (disk1) dedicated to the cache, and one drive (disk2) dedicated to writing/running VM images. Continuing with Eucalyptus as the example setup (though other cloud controllers show similar traits), the NC will, by default, store the EMI cache and VM images on the same drive -- precisely what we don't want for higher tiers of service.

By default, Eucalyptus NCs store running VMs on the mount point /usr/local/eucalyptus/???, where ??? represents a cloud user name. The NC also stores cached EMI files on /usr/local/eucalyptus/eucalyptus/cache -- clearly within the same directory tree. Therefore, unless one mounts another drive (partition, AoE or iSCSI drive, etc.) on /usr/local/eucalyptus/eucalyptus/cache, the NC will create all running images by copying from the EMI cache to the run-space area (/usr/local/eucalyptus/???) on the same drive. That causes significant delays in creating and spinning up VMs. The simple solution: mount one SSD on /usr/local/eucalyptus, and then mount a second SSD on /usr/local/eucalyptus/eucalyptus/cache. A cluster of Eucalyptus NCs could share the entire SSD 'cache' drive by exposing it as an NFS mount that all NCs mount at /usr/local/eucalyptus/eucalyptus/cache. Consider, though, that the cloud may write an EMI to the cache due to a request to start a new VM on one node controller, yet another NC might attempt to read that EMI before the cached write completes, due to a second request to spin up that EMI (not an uncommon scenario). There exist a number of ways to solve that problem.
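
As a quick sanity check, a few lines of Python can confirm that the cache and the run space actually landed on separate devices. The paths are the Eucalyptus defaults discussed above; this is just an illustrative check, not part of any product.

# Sketch: confirm the EMI cache and the VM run space live on different
# backing devices on a node controller. Paths are Eucalyptus defaults.
import os

RUN_SPACE = "/usr/local/eucalyptus"                    # running VM images
EMI_CACHE = "/usr/local/eucalyptus/eucalyptus/cache"   # cached EMI files

def device_of(path):
    return os.stat(path).st_dev   # st_dev identifies the backing device

if device_of(RUN_SPACE) == device_of(EMI_CACHE):
    print("WARNING: cache reads and run-space writes share one device")
else:
    print("cache and run space are on separate devices -- good")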

The gist here: by placing SSDs at strategic points in a cloud, we can create two forms of higher-tiered storage service: 1) higher-speed EBS volumes; and 2) faster spin-up times. Both create valid billing points, and both can exist together or separately in different hypervisor clusters. This capability is available now via our Eucalyptus Consulting Services and will soon be available for vCloud Director.

Next up – VLAN, L2, and others for tiered network services.

Monday, March 14, 2011

Auto Scaling as a Service

The Tough Auto Scaling Service is our offering to enable the automated scaling of an application tier at runtime. System data collected by a monitoring service provides the intelligence to provision or deprovision resources according to SLAs. Out of the box, our service uses our own Tough Monitoring Service; however, since we follow the de facto standard (Amazon Web Services), you can plug in any implementation that is AWS compatible.

The auto scaling service works by defining an 'auto scaling group'. This identifies the kind of service that will shrink or expand based on system load. The most common use case for auto scaling is the Web tier, where additional Web servers are added on the fly to respond to heavy loads. Auto scaling can also be used on stateful tiers, but extra attention must be paid to managing the state replication mechanisms (clustering, etc.)

As new servers are provisioned in response to load, they can be added to a dynamically programmable load balancer. This enables inbound application traffic to be evenly divided across the array of virtual servers identified in an auto scaling group. Conversely, when the load returns to normal levels, the virtual servers are taken out of the load-balanced pool, allowing a graceful shutdown. To enable this scenario, we're using our Tough Load Balancing Service, but once again, customers can use any AWS-compatible load balancer to perform this operation.
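
To make the mechanics concrete, here's a rough sketch of one evaluation pass of an auto scaling group. The 'monitor', 'cloud' and 'elb' objects are stand-ins for whatever AWS-compatible monitoring, compute and load balancing clients you plug in; the thresholds and names are illustrative, not our actual implementation.

# Sketch of one evaluation pass over an auto scaling group. The client
# objects are placeholders for AWS-compatible services, not a real library.
SCALE_UP_CPU, SCALE_DOWN_CPU = 75.0, 20.0   # SLA-derived thresholds
MIN_SIZE, MAX_SIZE = 2, 10

def evaluate(group, monitor, cloud, elb):
    cpu = monitor.average_cpu(group.instances, minutes=5)
    if cpu > SCALE_UP_CPU and len(group.instances) < MAX_SIZE:
        instance = cloud.run_instance(group.launch_config)
        elb.register(group.load_balancer, instance)    # join the pool
        group.instances.append(instance)
    elif cpu < SCALE_DOWN_CPU and len(group.instances) > MIN_SIZE:
        instance = group.instances.pop()
        elb.deregister(group.load_balancer, instance)  # drain traffic first
        cloud.terminate_instance(instance)             # graceful shutdown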

One of the key concepts of cloud computing is 'elasticity'; another is 'automation'. The Auto Scaling Service brings these two concepts together and applies them to the compute side of the world to provide three key benefits:
1. Increased success rates on Service Level Agreements - the system auto-scales to meet SLAs
2. Higher utilization rates - unused virtual servers are released back to the pool
3. Reduced operating costs - predefined policies automate activities that previously would have been human-intensive tasks.

Combined, these three benefits make auto scaling a critical component of any private / hybrid cloud environment. It's also worth pointing out that the auto scaling service is a fundamental building block to enable other scalable services such as Platform Services (PaaS).

Wednesday, March 09, 2011

Non-Invasive Cloud Monitoring as a Service

The Tough Cloud Monitoring solution is our next-generation offering targeting virtualized workloads, as well as PaaS services, housed in either traditional data centers or private cloud environments.

By monitoring, we mean 'health and performance' monitoring of infrastructure and platforms. Our service provides the traditional statistical information: CPU utilization, disk I/O, network traffic, etc. This raises the question, "why does the world need yet another monitoring solution?" Quite frankly, we were surprised that there weren't better options available on the market. So, once again, we started from scratch with a new design center:

1. Make it massively scalable and highly available
Some of our customers currently have thousands of virtualized workloads operating, and it is clear that the next generation of service providers will have tens of thousands running. Our design needed to scale easily, both in data collection and in burst storage. We bit the bullet and designed a solution from scratch with Apache Cassandra at the core. This enabled us to leverage its built-in cross-data-center peer replication schemes and dynamic partitioning. In addition, Cassandra was a good fit for us because it was designed to accept very fast (stream-oriented) writes of data.
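
To illustrate the write path, here's roughly what a time-series metric write looks like against Cassandra via pycassa. The keyspace, column family and row-key scheme below are assumptions for the example, not our actual schema.

# Illustrative sketch: one row per instance/metric/day, one column per
# sample, so a time-range read is a single row slice. Schema is assumed.
import time
import pycassa

pool = pycassa.ConnectionPool('Monitoring', server_list=['127.0.0.1:9160'])
samples = pycassa.ColumnFamily(pool, 'MetricSamples')

def record(instance_id, metric, value):
    row_key = '%s:%s:%s' % (instance_id, metric, time.strftime('%Y%m%d'))
    samples.insert(row_key, {str(int(time.time())): str(value)})

record('i-3f2a01bc', 'CPUUtilization', 42.5)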

2. The monitors should be non-invasive and agent free
Being non-invasive is always a good goal; it makes it easier to collect data on targets without having to install additional software on the machine (which can be a real problem when you already have lots of machines running in production). Knock on wood, but so far, we've been able to deliver all of our monitors completely out-of-band. No need to install Ganglia, collectd, etc. on hundreds or thousands of boxes...
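
As one illustration of the out-of-band idea, guest statistics can be collected by asking the hypervisor rather than the guest. The sketch below assumes KVM hosts reachable via libvirt; our actual collectors may use different channels.

# Sketch: agentless collection by querying the hypervisor, not the guest.
import libvirt

conn = libvirt.openReadOnly('qemu:///system')
for dom_id in conn.listDomainsID():              # running guests only
    dom = conn.lookupByID(dom_id)
    state, max_mem, mem, ncpu, cpu_time = dom.info()
    print('%s: %d vCPUs, %d KiB memory, %d ns CPU time'
          % (dom.name(), ncpu, mem, cpu_time))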

3. The monitors should support a standard, service oriented API
In building our early private clouds, we were surprised to see that most system monitoring tools were "closed" systems. They collected the data but didn't make it easily available to other systems; they were designed to deliver information to humans in HTML. This was a non-starter for us, since the new world is about achieving higher levels of system automation (not human tasking). Naturally, we went with the de facto standard: Amazon Web Services and the CloudWatch API. Our solution delivers full compatibility with CloudWatch from a WSDL, AWS Query and command-line perspective. This makes it really easy for the monitoring data to be consumed by other services, like Auto Scale.
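
For example, a CloudWatch-style AWS Query request against a compatible endpoint looks like this. The endpoint URL is a placeholder, and request signing is omitted for brevity.

# Sketch: building a GetMetricStatistics request in the CloudWatch dialect.
import urllib

ENDPOINT = 'https://monitoring.cloud.example.com/'   # placeholder endpoint

params = {
    'Action':              'GetMetricStatistics',
    'Version':             '2010-08-01',
    'Namespace':           'AWS/EC2',
    'MetricName':          'CPUUtilization',
    'Statistics.member.1': 'Average',
    'Period':              '300',
    'StartTime':           '2011-03-09T10:00:00Z',
    'EndTime':             '2011-03-09T11:00:00Z',
}
print(ENDPOINT + '?' + urllib.urlencode(params))

Any client library that already speaks CloudWatch can be pointed at the alternate endpoint without code-level changes.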

4. Use a consistent model for IaaS and PaaS
By supporting the AWS service interface model, we inherited this feature. Just as CloudWatch monitors services like their Elastic Load Balancer and Relational Data Services, we'll be providing similar support for internal PaaS platforms.

We believe that we have achieved all of our design goals. The Tough Cloud Monitor is available today for traditional data centers, private clouds or service providers.

Tuesday, March 08, 2011

Tough Load Balancing as a Service

Last week, MomentumSI announced the availability of our Tough Load Balancing Service along with a Cloud Monitoring and Auto Scaling solution.

The concept of load balancing has been around for decades - so nothing too new there. However, applying the 'as a Service' model to load balancing remains a fairly new concept, especially in the traditional data center. Public cloud providers like Amazon have offered a similar function for the last couple of years and have seen significant interest in their offering. We believe LB-aaS offers an equivalent productivity boost to traditional data centers, private cloud customers or service providers who want to extend their current offerings.

Our design goals for the solution were fairly simple - and we believe we met each of them:

1. Don't interfere with the capability of the underlying load balancer
The LB-aaS solution wraps traditional load balancers (currently, software-based only) to enable rapid provisioning, life-cycle management, configuration and high-availability pairing. All of these functions run outside of the ingress/egress path of the data, which means you do not incur additional latency in the actual balancing. Also, our design enables us to snap in various load balancer implementations. Our current solution binds to HAProxy, and to Pound for SSL termination. Based on customer demand, we anticipate adding additional providers (e.g., F5, Zeus, etc.). Our goal is to nail the "as-a-Service" aspect of the problem and to be able to easily swap in the right load balancer implementation for our customers' specific needs.

2. Make life easier for the user
I was recently at one of my enterprise customers, speaking with an I.T. program manager. She commented that her team was in a holding pattern while they ordered a new load balancer for their application. Her best guess was that it would take about 5 weeks to get through their internal procurement cycle, and then another 2-3 weeks for the I.T. operations people to get around to installing, configuring and testing it. When I told her about our LB-aaS solution (2-3 minutes to provision and another whopping 5-10 minutes to configure), she just started laughing... and made a comment about necessity being the mother of invention.

3. Deliver an open API
Delivering an open API was an easy decision for us. We went with the Amazon Web Services Elastic Load Balancing API. We maintained compatibility with their WSDL, as well as providing command-line capabilities and support for the AWS Query protocol. As the ecosystem around AWS continues to grow, we want companies to be able to immediately plug into our software without code-level changes.
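
The same wire-protocol idea from our monitoring service applies here. A hypothetical example, pointing an Elastic Load Balancing-style request at a private endpoint (placeholder URL, signing again omitted):

# Sketch: registering an instance via the ELB Query dialect.
import urllib

ENDPOINT = 'https://elb.cloud.example.com/'          # placeholder endpoint

params = {
    'Action':                        'RegisterInstancesWithLoadBalancer',
    'Version':                       '2010-07-01',
    'LoadBalancerName':              'web-tier',
    'Instances.member.1.InstanceId': 'i-3f2a01bc',
}
print(ENDPOINT + '?' + urllib.urlencode(params))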

4. Don't cause pain down the road
We've seen some companies put software-based load balancers into their VM image templates. We see this as last year's stop-gap solution. The lack of device-specific life-cycle management leads to configuration drift, and the absence of a service-oriented interface means you can't use the load balancer as part of an integrated solution pattern (like auto-scale). Let's face it, the world is moving to an 'as a service' model for some good reasons.

Again, the Tough Load Balancing Service is available today and can easily work in current data centers, private clouds or service providers.

Wednesday, March 02, 2011

Separating IaaS into Two Layers

For some time now, I've been watching cloud architects consider their strategy for deploying wide-scale Infrastructure-as-a-Service (IaaS). Many of my friends are quick to draw the standard Gartner cloud stack (SaaS, PaaS followed by IaaS). And although I think this is a simple way to look at the layers, it can be dangerous if that's where the conversation ends.

I'd like to suggest that we consider at least two distinct IaaS layers:



Some people call the first layer "Hardware-as-a-Service". It primarily focuses on the 'virtualization' of hardware, enabling better manipulation by the upper layers. This was the core proposition of the original EC2. There are some great vendors in this space, like Eucalyptus, Cloud.com and VMware. Cool projects are also emerging out of OpenStack, which many of the aforementioned companies hope to adopt and extend.

The second layer is the 'automation layer'. It focuses on providing convenience mechanisms around layer 1 services. This includes everything from making multi-step human tasks more easily accomplished through orchestrations, to closed-loop systems akin to those described in autonomic computing. The core elements delivered in layer 2 include self-inspection, self-healing, self-protection and resource optimization. These are some pretty powerful concepts - so powerful, in fact, that it often makes sense for consuming technologies to bind to layer 2, rather than directly to layer 1.

We're starting to see this layered approach unveil itself at Amazon. Services like Elastic Beanstalk focus on integrating many of the lower-layer building blocks into an easy-to-consume bundle, while also delivering several of the autonomic properties. It's pretty cool stuff. But it's only cool if you actually use it. I loved that Amazon started off real low in the stack (EC2 servers) and worked their way up. It was fundamentally the right way to rethink the problem. The downside is that many engineers are now overly comfortable using the original atomic elements when they should be looking harder at the new convenience layers (e.g., CloudFormation, Elastic Beanstalk, etc.)

The announcement we made yesterday regarding custom implementations of Amazon CloudWatch, Elastic Load Balancing and Auto Scaling for private clouds demonstrates our commitment to this approach. We're also big believers in industry standards. In my younger (and more naive) days, I would have preached 'open standards' over 'industry standards', but I've sat in on too many industry conference calls listening to vendors with agendas bicker over standards, only to wait years for a solution designed by committee. When it comes to cloud standards, I'll gladly let those younger (or more patient) than I fight those fights. Until then, we're backing the de facto standard, AWS. And to those who say "a standardized API isn't important", I'll have to kindly disagree ;-)

Friday, February 25, 2011

Amazon CloudFormation Exceeds Expectations

Today, Amazon released their latest offering, CloudFormation. Simply put, CloudFormation is the service we've all been waiting for. The entire topology of an application can be described, including the images, storage, security, load balancing, auto scaling, databases, messaging and more. It's the glue that holds it all together.

CloudFormation provides some UI screens that allow developers & release engineers to easily describe the makeup of their applications. Under the covers, the description is turned into a structured template. This template can then be sent to the AWS provisioning engine, which understands the dependency chain and launches each component in the proper order.

In standard AWS fashion, the template descriptions are available for developers to review or to create from scratch. Once a template is created, CloudFormation provides an API that developers can call, which takes the template as input and executes it. It's great to see Amazon continue down the path of not only providing UIs but also making the functions available as services.
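
For example, a stack can be created by sending a template to the CloudFormation Query API. Here's a sketch with a stub template - the AMI ID is a placeholder, and request signing is omitted for brevity.

# Sketch: creating a stack through the CloudFormation Query API.
import json
import urllib

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "Web": {
            "Type": "AWS::EC2::Instance",
            "Properties": {"ImageId": "ami-12345678",   # placeholder AMI
                           "InstanceType": "m1.small"}
        }
    }
}

params = {
    'Action':       'CreateStack',
    'Version':      '2010-05-15',
    'StackName':    'demo-stack',
    'TemplateBody': json.dumps(template),
}
print('https://cloudformation.us-east-1.amazonaws.com/?'
      + urllib.urlencode(params))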

At MomentumSI, we've been anticipating the launch of this service. It pulls together all of the piece-parts that Amazon has been developing over the years. Finally, a picture can be painted of how Amazon can be used for complete application solutions. I tip my hat.