The SmallPayroll.ca application has been migrated to the Amazon cloud and can now add and remove servers on its own, depending on load. At any given time, neither the number nor the IP addresses of the active servers can be predicted, which makes connecting to them a challenge. This is one way the cloud environment differs from a traditional data center.
The dynamic nature of the cloud also makes application deployment difficult. Your list of servers will be different between deployments, so how do you update the application? For that matter, how do you monitor your servers for faults?
This isn't your normal data center
In a normal data center you can name your computers whatever you want, give them IP addresses that suit you, and, if you want, go and look at your servers to make sure they're still there. Maybe you keep a spreadsheet to track your servers, maybe you have software for the job, or maybe the information lives in your head or in a text file. Do you have configuration management in place to make sure your configurations stay consistent?
The cloud environment is much different from a traditional data center because you're ceding control of many functions. You can't predict IP addresses, or even ensure that two servers will be on the same subnet. If you progress to automatic scaling of resources, all the hard work you put into manual configuration might be lost when a new node is launched. Scripts that rely on having 20 web servers with predictable names won't work in the cloud.
Fortunately a bit of discipline can work around these problems, and even improve your uptime in the physical data center!
People tend to spend a great deal of time worrying about what to name their servers and how to come up with a sensible IP addressing scheme. EC2 instances come up with a fairly random IP address and a name based on that address. You can certainly rename a server, but doing so often requires knowledge of the rest of the environment. For example, to call a server "webprd42", you have to know that the last server launched was "webprd41".
The better solution is not to rely on names or IP addresses at all, and to build your software so that these names don't matter.
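For example, rather than baking an address into a script, an instance can discover its own addresses at boot from the EC2 metadata service (the same service used in Listing 11 later in this article). A minimal sketch:

#!/bin/bash
# Ask the EC2 metadata service for this instance's addresses instead of
# relying on a predictable name or address.
PUBLIC_IP=`/usr/bin/curl -s http://169.254.169.254/2007-01-19/meta-data/public-ipv4`
LOCAL_IP=`/usr/bin/curl -s http://169.254.169.254/2007-01-19/meta-data/local-ipv4`
echo "This instance is $PUBLIC_IP (public) / $LOCAL_IP (private)"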
In a physical environment you can usually get away with making manual configuration changes to servers. When servers are launched automatically, manual changes won't be applied. You can rebundle your AMI after each change, but that does not solve the problem of pushing updates to the servers that are already running. Fortunately, there are excellent software packages, such as Puppet and Cfengine, that can automate these changes for you.
Deploying application changes is another aspect of configuration management, and one that deserves separate treatment. Generic configuration management tools can do the job, but using them to reproduce the specific steps of an application deployment, and to manage migrations and configuration rollbacks, is difficult. The Rails community has produced tools, such as Capistrano, to handle the task of application deployment.
It is helpful to look at configuration management as two separate problems. The first is how to manage the server, from the installation of software packages to the configuration of various daemons. The second is how to deploy new versions of software in a controlled manner.
It is very important to know what your servers are doing. CPU usage, disk space, memory, and network traffic are vital components to monitor. Daemons running on your system, including the application itself, may expose other metrics worth watching. For example, watching application response time and the number of connections to the web server and application server can warn you of problems before they become failures.
There are many tools available to monitor servers and to graph the results. The challenge is how to begin monitoring new servers as they come online, and how to stop monitoring them when they are taken offline.
Patterns as applied to cloud architecture
There are three general patterns that emerge when you look at how to manage a dynamic environment such as Amazon EC2.
- Client poll - Each managed server periodically queries a central server for work. With this pattern you don't need to know the addresses of all your servers, but the clients operate on their own schedule, so you can't control the timing of the polls.
- Server push - This pattern first queries the cloud provider's API to find the current list of servers, then a central server contacts each server to do the work. This is slower and requires that the management tool understand the dynamic nature of the environment, but has the benefit of allowing you to synchronize updates.
- Client registration - As each server comes online, it registers itself with a central server; before the server is terminated, it deregisters itself. This method is more complex, but it lets you use non-cloud-aware tools in a cloud environment.
Client polling for configuration management
This pattern is easy to implement. A client polls a well-known server for instructions on a predetermined schedule; if the server has nothing for the client to do, it says so. The downside is that instructions can be picked up only when the client polls, so an urgent change must wait until the next poll.
An excellent use for polling is configuration management of the server. The Puppet package from Reductive Labs is a popular configuration management tool. A process called the Puppetmaster runs on a central server. Clients run the Puppet daemon, which polls the Puppetmaster for the appropriate configuration manifests. These manifests specify the desired end state of a particular component, such as "make sure that the NTP daemon is installed and running". Puppet reads the manifests and corrects any problems it finds.
Your distribution may come with Puppet, or you can quickly install it with gem install puppet facter.
Puppet implements a security system that complicates matters: clients must have a signed key to talk to the Puppetmaster. You can tell the Puppetmaster to sign keys automatically for any client that connects, but that would allow anyone to download your configuration files. An alternative is to bypass the Puppetmaster entirely: distribute the manifests yourself and run the Puppet tools locally.
The sequence of events to have the client run the Puppet manifests is as follows:
1. Download an updated copy of the manifests and any associated files from the server.
2. Run Puppet against the manifest.
For step 1, the tool of choice is rsync, which downloads only the files that have changed. For step 2, the puppet command (part of the Puppet installation) executes the manifest.
There are two caveats to the above approach:
- The server must accept the client's SSH public key. This key can be distributed in the AMI (a minimal sketch of this follows the list).
- Any configuration files specified in the manifest must be copied along with the manifest. Puppet's built-in file server also requires certificates, so that file transfer method can't be used here.
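Baking the key into the AMI might look like the following sketch; the account name, key type, and paths are illustrative assumptions:

# On the client image, before rebundling the AMI: generate a key pair
# with no passphrase so the rsync in Listing 4 can run unattended.
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa

# On the puppet server: allow that key to log in by appending the
# client's public key to the authorized_keys file.
cat id_rsa.pub >> ~/.ssh/authorized_keys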
The sample manifest will ensure that the client has the correct Network Time Protocol (NTP) configuration. This involves making sure the software is installed, the configuration file is correct, and the daemon is running. Listing 1 shows the top-level manifest.
Listing 1. The top level manifest
import "classes/*" node default { include ntpclient } |
Listing 1 first imports all the files in the classes directory; each file contains information about a single component. All nodes then include the ntpclient class, which is defined in Listing 2.
Listing 2. The ntpclient class
class ntpclient {
    package { ntp:
        ensure => installed
    }
    service { ntpd:
        ensure    => true,
        enable    => true,
        subscribe => [ Package[ntp], File["ntp.conf"] ],
    }
    file { "ntp.conf":
        mode   => 644,
        owner  => root,
        group  => root,
        path   => "/etc/ntp.conf",
        source => "/var/puppetstage/files/etc/ntp.conf",
        before => Service["ntpd"]
    }
}
A detailed look at the Puppet language is outside the scope of this article, but at a high level Listing 2 defines a class called ntpclient that is composed of a package called ntp, a service called ntpd, and a file in /etc called ntp.conf. If the ntp package is not installed, Puppet will use the appropriate tool, such as yum or apt-get, to install it. If the service is not running or is missing from the startup scripts, Puppet will fix it. If ntp.conf differs from the copy in /var/puppetstage/files/etc, the file will be updated. The before and subscribe parameters make sure that the daemon is restarted if the configuration changes.
The server stores the manifests and files in /var/puppetdist, and clients copy that tree to /var/puppetstage. The layout of the directory tree is shown in Listing 3.
Listing 3. Contents of /var/puppetdist
/var/puppetdist/
|-- files
|   `-- etc
|       `-- ntp.conf
`-- manifests
    |-- classes
    |   `-- ntp.conf
    `-- site.pp
Finally, Listing 4 shows the client script that synchronizes the files and runs the manifest.
Listing 4. Client code to synchronize and run the manifest
#!/bin/bash
/usr/bin/rsync -avz puppetserver:/var/puppetdist/ /var/puppetstage/ --delete
/usr/bin/puppet /var/puppetstage/manifests/site.pp
Listing 4, when run periodically from cron, will pick up any changes to the manifests and apply them to the cloud server. If the server's configuration somehow gets changed, Puppet will take steps to bring the server back into compliance.
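For example, a cron entry such as the following would poll every half hour; the path to the script is an assumption, and the schedule can be whatever suits your environment:

# /etc/crontab - run the script from Listing 4 twice an hour
*/30 * * * * root /usr/local/bin/puppet-sync.sh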
Server push for application deployment
Configuration updates on servers rarely require synchronization between servers; if a package needs upgrading, a half-hour window is usually enough. For application updates, however, you want to roll out your changes all at once, and you want control over the timing. A popular tool for accomplishing this is Capistrano: you write a script in Capistrano's domain-specific language (DSL) and use it to run various tasks. Listing 5 shows a minimal Capistrano script that pushes an application to a known set of servers:
Listing 5. A simple Capistrano script
set :application, "payroll" set :repository, "https://svn.smallpayroll.ca/svn/app/trunk/" set :user, 'payroll' set :home, '/home/payroll' set :deploy_to, "#{home}" set :rails_env, "production" role :db, "174.129.174.213", :primary => true role :web, "174.129.174.213", "184.73.3.169" |
Most of the lines in Listing 5 set variables that alter the default behavior of Capistrano, which is to SSH into all the servers and use a source code management tool to check out a copy of the application. The last two lines define the servers in use, in particular the database and web servers. These roles are known to Capistrano (and you can define additional roles for your own purposes).
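With the recipe in place, a deployment is kicked off from the command line using Capistrano's standard tasks, and Capistrano connects to every server listed in the roles:

$ cap deploy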
The problem with Listing 5 is that the servers must be predefined. Fortunately, you can have Capistrano determine the list of servers at runtime using the Amazon Web Services API. First, run gem install amazon-ec2 to install a library that implements the API.
Next, modify your Capistrano recipe (deploy.rb) as shown in Listing 6.
Listing 6. Modifying Capistrano to dynamically load the list of servers at runtime
# Put this at the beginning of your deploy.rb
require 'AWS'

# Change your role :web definition to this
role(:web) { my_instances }

# This goes at the bottom of the recipe
def my_instances
  @ec2 = AWS::EC2::Base.new(
    :access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
    :secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'])
  servers = @ec2.describe_instances.reservationSet.item.collect do |itemgroup|
    itemgroup.instancesSet.item.collect { |item| item.ipAddress }
  end
  servers.flatten
end
Listing 6 changes the web role from a static definition to a dynamic list of servers returned by the my_instances function. The function uses the EC2 API DescribeInstances call to return a list of servers. The API returns data in a format that groups instances launched together under the same reservation identifier. The outer collect loop iterates over these reservation groups, and the inner collect loop iterates over the servers within each reservation group. The result is an array of arrays, which is flattened into a one-dimensional array of server IP addresses and returned to the caller.
Fortunately, Capistrano provides a way to operate on a dynamic list of servers. If it did not provide such hooks, you would need to take another approach, such as the client registration pattern described next.
Registering with a management server
For applications that don't easily accommodate a dynamic list of servers, you can work around the problem by having the cloud server register itself with the management application. This generally takes one of two forms:
- The cloud server connects to another server and runs a script, which updates the management application directly.
- The cloud server drops a file with some metadata in a common place, such as S3, where other scripts look for it to rebuild their configuration files (see the sketch following this list).
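The second form might look like the following sketch, assuming the s3cmd tool is installed and configured, and using a hypothetical bucket name:

#!/bin/bash
# Record this server's public IP address in a shared S3 bucket so that
# other scripts can rebuild their configuration files from the bucket.
MYIP=`/usr/bin/curl -s http://169.254.169.254/2007-01-19/meta-data/public-ipv4`
echo $MYIP > /tmp/$MYIP
s3cmd put /tmp/$MYIP s3://example-server-registry/active/$MYIP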
Cacti is a popular performance management tool. Cacti can graph various metrics through SNMP or scripts, and combine these graphs into dashboards or meta-graphs. The limitation of Cacti is that each managed server must be configured in the Cacti web interface or through command-line scripts. In this example, the cloud server connects back to the Cacti server and configures itself.
Cacti is based on a system of templates, which makes mass changes to graphs much easier. All the command-line tools operate on template identifiers, though, so you must first figure out which identifiers to use. Listing 7 shows how to find the host template, which pre-populates some data elements for you.
Listing 7. Listing the host templates
$ php -q /var/lib/cacti/cli/add_device.php --list-host-templates
Valid Host Templates: (id, name)
0    None
1    Generic SNMP-enabled Host
3    ucd/net SNMP Host
4    Karlnet Wireless Bridge
5    Cisco Router
6    Netware 4/5 Server
7    Windows 2000/XP Host
8    Local Linux Machine
Template number 3 is for a host running the Net-SNMP daemon, which is available with most Linux distributions. Using this specific daemon, rather than a more generic one, makes some Linux-specific counters easy to monitor.
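If your AMI doesn't already run Net-SNMP, installing and enabling it is straightforward. A sketch for a Red Hat style distribution follows; package and service names may differ on other distributions:

# Install and enable the Net-SNMP daemon so Cacti can poll this host
yum -y install net-snmp
chkconfig snmpd on
service snmpd start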
Knowing that you are using host template 3, you can list the available graphs, as shown in Listing 8.
Listing 8. Listing the graph templates
$ php -q /var/lib/cacti/cli/add_graphs.php --list-graph-templates \
    --host-template-id=3
Known Graph Templates: (id, name)
4     ucd/net - CPU Usage
11    ucd/net - Load Average
13    ucd/net - Memory Usage
The three graphs in Listing 8 are what you get with the default Cacti distribution. Many more can be added: leave off the --host-template-id option to see them, or import graph templates from sources on the Internet.
Listing 9 shows how to add a new device, and then a CPU graph.
Listing 9. Adding a new device with a graph
$ php -q /var/lib/cacti/cli/add_device.php --description="EC2-1.2.3.4" \
    --ip=1.2.3.4 --template=3
Adding EC2-1.2.3.4 (1.2.3.4) as "ucd/net SNMP Host" using SNMP v1 with community "public"
Success - new device-id: (5)

$ php -q /var/lib/cacti/cli/add_graphs.php --host-id=5 --graph-type=cg \
    --graph-template-id=4
Graph Added - graph-id: (6) - data-source-ids: (11, 12, 13)
Listing 9 first adds a host with the IP address 1.2.3.4. The device id returned is 5, which is then used to add a graph for CPU usage (graph type cg, template 4). The results are the id of the new graph and the ids of the data sources that are now being monitored.
It is now fairly easy to script the procedure in Listing 9. Listing 10 shows such a script.
Listing 10. add_to_cacti.sh
#!/bin/bash
IP=$1

# Add a new device and parse the output to only return the id
DEVICEID=`php -q /var/lib/cacti/cli/add_device.php --description="EC2-$IP" \
    --ip=$IP --template=3 | grep device-id | sed 's/[^0-9]//g'`

# CPU graph
php -q /var/lib/cacti/cli/add_graphs.php --host-id=$DEVICEID --graph-type=cg \
    --graph-template-id=4
The first parameter to the script is saved in a variable called IP. The add_device.php script is run with this IP address, and its output is filtered by grep down to the line containing the device id, then piped through sed to strip out everything but the number. The result is saved in a variable called DEVICEID.
With the device id stored, adding a graph is as simple as calling the add_graphs.php script. Note that the CPU graph is the simplest case; some other types of graphs require more parameters.
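For example, the load average graph from Listing 8 could be added with one more call at the end of add_to_cacti.sh. This is a sketch; as noted above, other graph types may need extra options:

# Load average graph (template 11 from Listing 8)
php -q /var/lib/cacti/cli/add_graphs.php --host-id=$DEVICEID --graph-type=cg \
    --graph-template-id=11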
With the add_to_cacti.sh script on the Cacti server, all it takes is for the cloud server to run it. Listing 11 shows how to call the script.
Listing 11. Calling the cacti script from the cloud server
#!/bin/bash
MYIP=`/usr/bin/curl -s http://169.254.169.254/2007-01-19/meta-data/public-ipv4`
ssh cacti@cacti.example.com "/usr/local/bin/add_to_cacti.sh $MYIP"
Listing 11 first queries the EC2 metadata service for the server's public IP address, and then runs the registration command remotely on the Cacti server.
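To complete the pattern, the script should run each time a server comes online. One possible approach, assuming the script from Listing 11 is saved as /usr/local/bin/register_with_cacti.sh, is to call it from /etc/rc.local:

# /etc/rc.local - register with the Cacti server at every boot
/usr/local/bin/register_with_cacti.sh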
Conclusion
This series has followed the migration of an application from a single server to the Amazon Web Services cloud. Improvements were made incrementally to take advantage of the EC2 offerings, from launching new servers to using load balancers. This final article looked at managing a dynamic cloud environment and offered some patterns you can apply.
Given the low cost of entry to cloud resources, you should take a look and try a practice migration. Even if you decide not to run your application in the cloud in production, you will learn a lot about what can be done there, and you may well improve your systems management skills.
Learn
- LPI exam 301 prep, Topic 306: Capacity planning explains in detail how to monitor systems and measure results.
- The S3 Cookbook is a PDF from Leanpub that explains how to use S3 with Ruby. The book works through about 60 problems and shows how to solve each one with code.
Get products and technologies
- Now that you've got multiple AMIs inside S3, you might want to prune some old ones. S3 File Manager is a web-based file manager that rivals the features of many standalone applications and browser plugins. If you delete an AMI, don't forget to ec2-deregister it.
- Capistrano is a popular deployment package that acts in a manner similar to Rake.
- Cfengine is the most popular configuration management tool for Unix. It is lightweight and can operate on a large number of machines.
- Cacti is a network graphing tool built around RRDTool. You can graph almost anything imaginable. If it's in your data center, there's a good chance that someone has already written a plugin to graph it.
- Puppet is a configuration management tool, written in Ruby, built to overcome some limitations of Cfengine. If you're looking for a good way to start, Pulling Strings with Puppet is a book that I enjoyed.