Infrastructure as a Service (IaaS) is a great concept. You use computing resources, you pay for them. You want more computing power, you pay more. The downside of this model is that you're working with computers that you're never going to see, nor do you really know much about them. Once you get over that, there's a lot to be gained by using IaaS.
Because the IaaS model is so different from the traditional model of buying servers, the way you manage your virtual computers changes. So does the way you run your application in the cloud. Things you once took for granted, such as negligible latency between servers, are no longer a given.
This series will follow the migration of a web application from a single physical server to the Amazon Elastic Compute Cloud (EC2). Along the way you'll learn how to adapt your application to the cloud environment, and how to take advantage of the features that the cloud has to offer. To start, you'll see a straight migration from one physical server to a cloud server.
Amazon's EC2 product lets anyone with a credit card pay for servers by the hour, turning them on and off through an application programming interface (API). You have a variety of server types to choose from, depending on whether memory, disk, or CPU power is your primary concern, along with a suite of add-ons from persistent disks to load balancers. You only pay for what you use, which makes the Amazon services a compelling option for your application.
Alongside the EC2 offering are other services that give you, among other things, payment processing, databases, and message queuing. You will be using the Simple Storage Service (S3), which gives you access to disk space on a pay-per-use basis.
The web application that this series will use for examples is a payroll service called SmallPayroll.ca, written with the Ruby on Rails framework and a PostgreSQL back end. It is typical of many web applications; it has a database tier, an application tier, and a set of static files like cascading style sheets (CSS) and JavaScript. Users navigate various forms to input and manipulate data, and they generate reports.
The various components in use are:
- nginx: the front end web server for static files, and balancer to the middle tier
- mongrel: the application server itself
- ruby: the language in which the application is written
- gems: several third-party plugins and libraries, used for everything from database encryption to application-level monitoring
- PostgreSQL: the SQL database engine
Usage of the site has exceeded the capacity of the single server that now houses it. Therefore, a migration to a new environment is in order, and this is a prime opportunity to move to the cloud.
A straight move from one server to a small number of cloud-based servers would not take advantage of what the cloud can do, nor would it make for exciting reading. Some improvements will be made along the way, some of which are only possible in a cloud environment:
- Increased reliability: Because you can choose the size of server to run in the cloud, it is possible to run multiple, smaller, servers for redundancy.
- Capacity for both scale-up and scale-down: Servers are added incrementally to the pool as the service grows. However, the number of servers can also be increased to accommodate short-term spikes in traffic, or decreased during periodic lulls.
- Cloud storage: Backups of the application data will be made to S3, eliminating the need for tape storage.
- Automation: Everything in the Amazon environment, from the servers, to the storage, to the load balancers, can be automated. Less time managing an application means more time for other, more productive things.
These improvements will be made incrementally throughout this series.
Testing and migration strategies
When you deploy an application for the first time, you generally have the luxury of being able to test and tweak without the burden of production traffic. By contrast, a migration has the added element of users who are placing a load on the site. Once the new environment takes production traffic, the users will be expecting that everything is working properly.
A migration does not necessarily mean zero downtime. A migration is much easier if you are allowed to take the service offline for a period of time. You will use this outage window to perform final data synchronizations and allow any network changes to stabilize. The window should not be used to do the initial deployment to the new environment; that is, the new environment should be in an operational state before the migration starts. With this in mind, the key points are synchronization of data between the environments and network changes.
As you prepare to figure out your migration strategy, it helps to begin with a walk-through of your current environment. Answer the following questions:
- What software do I use on my servers to run the application?
- What software do I use on my servers to manage and monitor the application and server resources?
- Where is all the user data kept? Databases? Files?
- Are static assets, like images, CSS, and JavaScript, stored somewhere else?
- What touchpoints into other systems does the application need?
- Have I backed everything up recently?
In general, notifying your users is a good thing, even if you don't anticipate any downtime. In the case of the SmallPayroll.ca application, users tend to use the site at a consistent interval, corresponding to their two-week payroll cycle, so two weeks' notice is a reasonable period. Sites like Google AdWords, the administrative interface for the Google advertising platform, give about a week's notice. If your website is more of a news site, where users would not be as disrupted if it were down for an hour, you may choose to give notification the day of the outage.
The form of notification also varies depending on the nature of your site and how you currently communicate with your customers. For SmallPayroll.ca, a prominent message when the user logs in will be enough, for example: "The system will be unavailable between 12:01am and 1am Eastern time, June 17, 2010. Everything entered prior to this will still be saved. For more information click here."
This message provides the three key pieces of information that the user needs to know:
- When the outage will happen, including timezone.
- Reassurance that their data will be safe.
- Contact details for further information.
If possible, avoid using 12:00 am or 12:00 pm, as well as the term "midnight". These tend to confuse people: many are not sure whether midnight on June 17 refers to very early morning (12:01am) or very late at night (11:59pm), and similarly whether noon means 12am or 12pm. It's much easier to add a minute and make the time abundantly clear.
Your details may be different, especially if you anticipate partial functionality during the outage. If you decide that you are only going to place the notice up during the outage (such as for a news site), the same information will still be helpful. The author's favorite site outage screen was along the lines of "The site is down for maintenance, back up around 3pm EST. Play this game of Asteroids while you're waiting!"
Don't neglect your internal users, either. If you have account representatives, you will want to give them notice in case their clients ask any questions.
The Domain Name System (DNS) takes care of translating a name like www.example.com into an IP address, like 192.0.32.10. Your computer connects to IP addresses, so this translation is very important. When migrating from one environment to another, you are almost guaranteed to be using a different IP address (the exception would be if you're staying in the same physical building).
Computers cache the name to IP mapping for a certain period of time, known as the Time To Live (TTL), to reduce overall response time. When you make the switch from one environment to another, and therefore from one IP address to another, people who have the DNS entry cached will continue to try and use the old environment. The DNS entry for the application and its associated TTL must be managed carefully.
TTLs are normally between one hour and one day. In preparation for a migration, though, you would want the TTL set to something short, such as 5 minutes. This change must be made at least one TTL period before you intend to change the address, because computers get the TTL along with the name-to-IP mapping. For example, if the TTL for www.example.com was set to 86,400 seconds (1 day), you would need to reset the TTL to 5 minutes at least 1 day before the migration.
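If you want to confirm what TTL is currently being handed out for your record, dig shows it in the second column of the answer section. A quick check, using the article's example name and a hypothetical authoritative name server, looks something like this:

dig +noall +answer www.example.com
www.example.com.   86400   IN   A   192.0.32.10      # 86,400 seconds -- the TTL has not been lowered yet
dig +noall +answer www.example.com @ns1.example.com   # ask the authoritative server to see the configured value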
Decoupling the old and new environments
It is essential that you fully test your new environment before migrating. All testing should happen in isolation from the production environment, preferably with a snapshot of production data so that you can better exercise the new environment.
Doing a full test with a snapshot of production data serves two purposes. The first is that you are more likely to spot errors when using real-world data, because it is more unpredictable than the test data used during development. Real-world data may refer to files that you forgot to copy over, or require certain configurations that were missed during your walk-through.
The second reason to use production data is that you can practice your migration at the same time as you load data. You should be able to prove out most aspects of your migration plan, except for the actual switch of environments.
Even though you will be mocking up your new environment as if it were production, only one environment can be associated with the hostname of the application. The easiest way to get around this is to make a DNS override in your hosts file. In Unix, this is at /etc/hosts, and in Windows it is in c:\windows\system32\drivers\etc\hosts. Simply follow the format of the existing lines and add an entry pointing your application's hostname to its future IP address. Don't forget to do the same for any image servers or anything else that you will be moving. You will probably have to restart your browser, but after that you will be able to enter your production URL and be taken to your new environment instead.
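For example, if the new environment will eventually live at 184.73.43.141 (the address used later in this article; substitute your own), the override might look like this:

# temporary /etc/hosts override for testing -- remove or comment out after the migration
184.73.43.141   app.smallpayroll.ca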
The EC2 service allows you to pay for a virtual machine by the hour. Amazon offers several different types of machines, and classifies them by their CPU, memory, and disk profiles. Amazon measures memory and disk in terms of GB, and CPU in terms of EC2 Compute Units (ECU), where 1 ECU is roughly a 1.0-1.2GHz Opteron or Xeon processor (2007 era). For example, the standard small instance gives you 1.7GB of memory, 160GB of disk, and 1 ECU of CPU. At the time of this writing, the biggest machine is the High Memory Quadruple Extra Large that has 68.4GB of memory, 1.7TB of disk, and 26 ECUs, which are split across 8 virtual cores. The prices range from 8.5 cents/hour for the smallest, to $2.40/hour for the biggest.
An EC2 instance begins life as an Amazon Machine Image (AMI), which is a template that is used to build any number of virtual machines. Amazon publishes some AMIs and you can make your own and share them with others. Some of these user created AMIs are free, and some incur an hourly charge on top of the Amazon hourly charge. For example, IBM publishes several paid AMIs that let you pay for licensing on an hourly basis.
When you want to boot a virtual machine you choose the machine type and an Amazon Machine Image (AMI). The AMI is stored in S3, and is copied to the root partition of your virtual machine when you launch the instance. The root partition is always 10GB. The storage space associated with the machine type is called the instance storage or ephemeral storage, and is presented to your virtual machine as a separate disk drive.
The storage is called ephemeral because when you shut down your instance, the information is gone forever. You are required to back up your own data periodically to protect against loss. This also means that if the physical host running your instance crashes, your instance is shut down and the ephemeral disk is lost.
All AMIs are assigned an identifier by Amazon, such as ami-0bbd5462. Amazon provides some public AMIs, and other people have made their own AMIs public. You can choose to start with a public one and make your own modifications, or you can start from scratch. Any time you make changes to the root filesystem of an AMI you can save it as a new AMI, which is called rebundling.
In this series, you will be starting off with a publicly available CentOS image, though there is no reason you can't choose a different one. It is wise to spend some time looking through any image you use to make sure there are no extra accounts and that the packages are updated. It is also possible to roll your own AMI from scratch, but that is outside the scope of this article.
All of the functionality necessary to start, stop, and use the EC2 cloud is available using a web service. Amazon publishes the specifications for the web services, and also provides a set of command line tools. You should download these tools before proceeding. I also encourage you to look at the quick start guide to get your environment set up, which will save you a lot of typing.
You authenticate to the API using security credentials, which are found under the Account link within the AWS Management Console (see Resources). You will need your X.509 certificate files and your access keys. Keep these safe! Anyone with them could use AWS resources and incur charges on your behalf.
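The command line tools read your credentials and their install location from a handful of environment variables. A typical setup, assuming you unpacked the tools into /opt/ec2-api-tools and keep your keys under ~/.aws (both paths are just examples), looks like this:

export JAVA_HOME=/usr/java/default            # the tools are Java based; point at your JRE
export EC2_HOME=/opt/ec2-api-tools            # wherever you unpacked the API tools
export PATH=$PATH:$EC2_HOME/bin
export EC2_PRIVATE_KEY=~/.aws/pk-mykey.pem    # X.509 private key downloaded from the Account page
export EC2_CERT=~/.aws/cert-mycert.pem        # matching X.509 certificate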
Before you launch your first instance
Before you launch your first instance you must generate SSH keys to authenticate to your new instance, and set up the virtual firewall to protect your instance.
Listing 1 shows the use of the ec2-add-keypair command to generate an SSH keypair.
Listing 1. Generating an SSH keypair
[sean@sergeant:~]$ ec2-add-keypair main
KEYPAIR main 40:88:59:b1:c5:bc:05:a1:5e:7c:61:23:5f:bc:dd:fe:75:f0:48:01
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAu8cTsq84bHLVhDG3n/fe9FGz0fs0j/FwZiDDovwfpxA/lijaedg6lA7KBzvn
...
-----END RSA PRIVATE KEY-----
[sean@sergeant:~]$ ec2-describe-keypairs
KEYPAIR main 40:88:59:b1:c5:bc:05:a1:5e:7c:61:23:5f:bc:dd:fe:75:f0:48:01
The first command tells Amazon to generate a keypair with the name main. The first line of the result gives the hash of the key. The rest of the output is an unencrypted PEM private key. You must store this somewhere, for example in ~/.ssh/main.pem. Amazon retains the public portion of the key, which will be made available to the virtual machines you launch.
The second command, ec2-describe-keypairs, asks Amazon for the current list of keypairs. The result is the name of the keypair, followed by the hash.
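Whatever file you paste the private key into, make sure the permissions are restrictive, or ssh will refuse to use the key:

chmod 600 ~/.ssh/main.pem    # ssh ignores private keys that other users can read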
Each instance is protected by a virtual firewall that initially allows nothing in. EC2 calls these security groups and provides API calls and commands to manipulate them. You will look at these more closely when the time comes. In the meantime, Listing 2 shows how to display your current groups.
Listing 2. Displaying the current security groups
[sean@sergeant:~]$ ec2-describe-group
GROUP   223110335193    default default group
Listing 2 shows a group called default, with a description of "default group". The userid associated with the group is 223110335193. There are no rules in this group. If there were, they would be described below the group with the word PERMISSION in the left column.
Preparing the cloud environment
The first step is to prepare the cloud environment to test the application. Initially, the new environment will mimic the current production environment.
First, launch the AMI which has an ID of ami-10b55379. Listing 3 shows the AMI being launched, and the status being checked.
Listing 3. Launching the CentOS AMI
[sean@sergeant:~]$ ec2-run-instances ami-10b55379 -k main
RESERVATION   r-750fff1e   223110335193   default
INSTANCE   i-75aaf41e   ami-10b55379   pending   main   0   m1.small   2010-05-15T02:02:57+0000   us-east-1a   aki-3038da59   ari-3238da5b   monitoring-disabled   instance-store
[sean@sergeant:~]$ ec2-describe-instances i-75aaf41e
RESERVATION   r-750fff1e   223110335193   default
INSTANCE   i-75aaf41e   ami-10b55379   pending   main   0   E3D48CEE   m1.small   2010-05-15T02:02:57+0000   us-east-1a   aki-3038da59   ari-3238da5b   monitoring-disabled   instance-store
[sean@sergeant:~]$ ec2-describe-instances i-75aaf41e
RESERVATION   r-750fff1e   223110335193   default
INSTANCE   i-75aaf41e   ami-10b55379   ec2-184-73-43-141.compute-1.amazonaws.com   domU-12-31-39-00-64-71.compute-1.internal   running   main   0   E3D48CEE   m1.small   2010-05-15T02:02:57+0000   us-east-1a   aki-3038da59   ari-3238da5b   monitoring-disabled   184.73.43.141   10.254.107.127   instance-store
The first command launches the instance using the ami-10b55379 AMI, and specifies that the keypair generated in Listing 1 is to be used to authenticate to the machine. The command returns several pieces of information; the most important is the instance identifier (i-75aaf41e), which is the identity of the machine in the EC2 cloud. The second command, ec2-describe-instances, lists all the running instances. In Listing 3, the instance identifier has been passed on the command line to show information about only that instance. The state of the instance is listed as pending, which means that the instance is still being started; a large AMI can take 5-10 minutes just to start. Running the same command some time later shows that the state is running, and that the external IP address 184.73.43.141 has been assigned. The internal IP address that starts with 10 is useful for talking within the EC2 cloud, but not now.
You can then SSH into the server using the key you generated earlier. But first, you must allow SSH (22/TCP) in. Listing 4 shows how to authorize the connection and log in to your new server.
Listing 4. Connecting to the instance
[sean@sergeant:~]$ ec2-authorize default -p 22 -s $MYIP/32
...
[sean@sergeant:~]$ ssh -i ~/.ssh/main.pem root@184.73.43.141
The authenticity of host '184.73.43.141 (184.73.43.141)' can't be established.
RSA key fingerprint is af:c2:1e:93:3c:16:76:6b:c1:be:47:d5:81:82:89:80.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '184.73.43.141' (RSA) to the list of known hosts.
...
The first command allows port 22 (TCP is the default option) from a source of my IP address. The /32 means that only the host is allowed, not the entire network. The ssh command connects to the server using the private key.
CentOS comes with a dated Ruby version, so you will install Ruby Enterprise Edition (REE), a high-performance Ruby interpreter that's compatible with the current 1.8.7 branch of Ruby. Despite the expensive-sounding name, the software is open source. Listing 5 shows how to install REE.
Listing 5. Installing REE
# rpm -e ruby ruby-libs
# yum -y install gcc-c++ zlib-devel openssl-devel readline-devel
...
Complete!
# wget http://rubyforge.org/frs/download.php/71096/ruby-enterprise-1.8.7-2010.02.tar.gz
...
# tar -xzf ruby-enterprise-1.8.7-2010.02.tar.gz
# ruby-enterprise-1.8.7-2010.02/installer -a /opt/ree
The first two commands from Listing 5 remove the default Ruby installation and install a C compiler and a few necessary development packages. wget downloads the current REE tarball, which is then unpacked by tar. Finally, the last command runs the installer with an option to accept all defaults and place the results in /opt/ree. The installer is smart enough to tell you the commands you have to run if you're missing some packages, so look closely at the output if the installation isn't working.
After Ruby is installed, add the bin directory to your path with export PATH="/opt/ree/bin:$PATH", which can be placed in the system-wide /etc/bashrc or the .bashrc within your home directory.
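A quick way to apply the change, make it permanent, and confirm which interpreter you are now running (appending to /etc/bashrc is just one option; use your own shell startup file if you prefer):

export PATH="/opt/ree/bin:$PATH"
echo 'export PATH="/opt/ree/bin:$PATH"' >> /etc/bashrc   # make it stick for future shells
ruby -v    # should now identify itself as Ruby Enterprise Edition
gem -v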
The PostgreSQL server is part of the CentOS distribution, so all that needs to be done is to install it with the yum utility. Listing 6 shows how to install PostgreSQL and make sure it will start on boot.
Listing 6. Installing PostgreSQL
# yum -y install postgresql-server postgresql-devel
...
Installed: postgresql-devel.i386 0:8.1.21-1.el5_5.1 postgresql-server.i386 0:8.1.21-1.el5_5.1
Dependency Installed: postgresql.i386 0:8.1.21-1.el5_5.1 postgresql-libs.i386 0:8.1.21-1.el5_5.1
Complete!
# chkconfig postgresql on
The yum command installs packages from a repository; in Listing 6 you are installing the PostgreSQL server component and development libraries. This will automatically pull in the core database utilities and any other needed packages. You will not need the development package yet, but when it comes time to integrate Rails and PostgreSQL, you will need the libraries inside postgresql-devel.
By default the database stores its files in /var/lib/pgsql/data which is part of the root filesystem. You will move this directory to the instance storage on /mnt as shown in Listing 7.
Listing 7. Moving the PostgreSQL data store to /mnt
# mv /var/lib/pgsql/data /mnt
# ln -s /mnt/data /var/lib/pgsql/data
# service postgresql start
After entering the commands in Listing 7, Postgres is running out of /mnt.
Next you must enable password logins for the payroll_prod database (which will be created in the next step). By default, Postgres does not use passwords; it uses an internal identification system. Simply add host "payroll_prod" all 127.0.0.1/32 md5 to the top of /var/lib/pgsql/data/pg_hba.conf, and then run su - postgres -c 'pg_ctl reload' to make the change take effect. With this configuration, normal logins to PostgreSQL don't need a password (which is why the reload command didn't need one), but any access to the payroll database will.
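You can make that edit with any text editor; the equivalent shell commands (a sketch that keeps a backup of the original file) are:

cd /var/lib/pgsql/data
cp pg_hba.conf pg_hba.conf.orig
( echo 'host "payroll_prod" all 127.0.0.1/32 md5'; cat pg_hba.conf.orig ) > pg_hba.conf
su - postgres -c 'pg_ctl reload'    # re-read pg_hba.conf without restarting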
The final step is to set up the Rails database from the command line. Run su - postgres -c psql and follow along in Listing 8.
Listing 8. Creating the user and database
postgres=# create user payroll with password 'secret';
CREATE ROLE
postgres=# create database payroll_prod;
CREATE DATABASE
postgres=# grant all privileges on database payroll_prod to payroll;
GRANT
And with that, your database is created.
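The Rails application will also need to know about this database. If the copy of the application you deploy does not already carry a production entry in config/database.yml, something along these lines matches the user and database created above (a sketch; the path assumes the application lives in /home/payroll/current):

cat > /home/payroll/current/config/database.yml <<'EOF'
production:
  adapter: postgresql
  host: 127.0.0.1        # connect over TCP so the md5 rule in pg_hba.conf applies
  database: payroll_prod
  username: payroll
  password: secret
EOF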
For testing, you should grab a database dump of your production environment from a certain point in time so that you have something to test with. The SmallPayroll application stores data both in the database and on the filesystem. The database will be dumped using the pg_dump command that comes with PostgreSQL, and the filesystem data will be copied with rsync. The database will have to be wiped and re-transferred for the migration because of the nature of database dumps, but the filesystem data only needs to transfer new and changed files, because rsync can detect when a file hasn't changed. Thus the testing part of the plan helps speed up the migration, because most of the data will already be there.
The fastest way to copy the database is to run pg_dump payroll_prod | gzip -c > /tmp/dbbackup.gz on your production machine, copy dbbackup.gz to the cloud server, and then run zcat dbbackup.gz | psql payroll_prod. This simply creates a compressed dump of the database from one server, and then replays all the transactions on the other server.
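Put together, the whole round trip looks something like this (the address is the cloud instance from Listing 3, and you may need to ec2-authorize SSH from your production server's address first):

# on the production server
pg_dump payroll_prod | gzip -c > /tmp/dbbackup.gz
scp -i ~/.ssh/main.pem /tmp/dbbackup.gz root@184.73.43.141:/tmp/

# on the cloud server, load the dump as the postgres user
su - postgres -c 'zcat /tmp/dbbackup.gz | psql payroll_prod'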
rsync is just as simple. From your production server, run rsync -avz -e "ssh -i .ssh/main.pem" /var/uploads/ root@174.129.138.83:/var/uploads/. This copies everything in /var/uploads from the current production server to the new server. If you run it again, only the changed files are copied over, saving you time on later synchronizations.
Since you are copying the database over, you do not have to apply your Rails migrations first. Rails will believe the database is up to date because you already copied over the schema_migrations table.
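Once the application code is in place (see the next section), you can confirm that Rails agrees the schema is current by checking the schema version and comparing it with production (assuming the standard Rails 2.x rake tasks):

cd /home/payroll/current
RAILS_ENV=production rake db:version    # should print the same schema version as production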
Deploying the Rails application
At this point you have the base server set up, but not your application. You must install some basic gems, along with any gems your application requires, before your application will run. Listing 9 shows the commands to update your gems. Note that you must be in the root of your Rails application, so copy it over to your server first.
Listing 9. Updating rubygems and installing your gems
# gem update --system
Updating RubyGems
Nothing to update
# gem install rails mongrel mongrel-cluster postgres
Successfully installed rails-2.3.8
Building native extensions.  This could take a while...
Successfully installed gem_plugin-0.2.3
Successfully installed daemons-1.1.0
Successfully installed cgi_multipart_eof_fix-2.5.0
Successfully installed mongrel-1.1.5
Successfully installed mongrel_cluster-1.0.5
Building native extensions.  This could take a while...
Successfully installed postgres-0.7.9.2008.01.28
7 gems installed
...
# rake gems:install
(in /home/payroll)
gem install haml
Successfully installed haml-3.0.12
1 gem installed
Installing ri documentation for haml-3.0.12...
Installing RDoc documentation for haml-3.0.12...
gem install money
...
The first command makes sure that rubygems itself is up to date. The second command installs some helpful gems:
- rails: The Ruby on Rails framework.
- postgres: The database driver that will let you use PostgreSQL with ActiveRecord.
- mongrel: An application server used to host Rails applications.
- mongrel_cluster: Some utilities to let you start and stop groups of mongrels at the same time.
The last command runs a Rails task to install all the extra gems required by your application. If you didn't use the config.gem directive in your config/environment.rb file, then you may have to install your extra gems by hand using the gem install gemname command.
Try to start your application with the RAILS_ENV=production script/console command. If this succeeds, stop it and then launch your pack of mongrels with mongrel_rails cluster::start -C /home/payroll/current/config/mongrel_cluster.yml. If the first command doesn't succeed, you will get enough error messages to find the problem, which is usually a missing gem or file. This is also a good opportunity to go back and put in any missing config.gem directives so you don't forget the gem in the future.
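The cluster configuration file referenced above describes how many mongrels to run and on which ports. If you need to create one, a minimal sketch looks like this (paths, ports, and user are assumptions; the ports must match what the front-end web server will proxy to):

cat > /home/payroll/current/config/mongrel_cluster.yml <<'EOF'
---
cwd: /home/payroll/current
environment: production
port: "8100"        # first mongrel; with servers: 2 the second listens on 8101
servers: 2
pid_file: tmp/pids/mongrel.pid
user: payroll
group: payroll
EOF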
Installing a front end web server
Nginx is the web server of choice for many virtual environments. It has low overhead and is good at proxying connections to a backend service like mongrel. Listing 10 shows how to install nginx.
Listing 10. Installing nginx
# rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
...
# yum install nginx
...
Running Transaction
  Installing: nginx                     [1/1]

Installed: nginx.i386 0:0.6.39-4.el5
Complete!
# chkconfig nginx on
Listing 10 installs the Extra Packages for Enterprise Linux (EPEL) repository, then installs nginx and makes sure it will come up on boot. Listing 11 is a very simple configuration file that can be placed in /etc/nginx/conf.d/rails.conf.
Listing 11. An nginx configuration for a Rails application
# Two mongrels, balanced based on least connections
upstream mongrel-payroll {
  fair;
  server 127.0.0.1:8100;
  server 127.0.0.1:8101;
}

server {
  listen 80;
  server_name app.smallpayroll.ca;
  root /home/payroll/current/public;
  gzip_static on;
  access_log /var/log/nginx/app.smallpayroll.ca_log main;
  error_page 404 /404.html;

  location / {
    # Because we're proxying, set some environment variables indicating this
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect false;
    proxy_max_temp_file_size 0;

    # Serve static files out of Root (eg public)
    if (-f $request_filename) {
      break;
    }

    # Handle page cached actions by looking for the appropriately named file
    if (-f $request_filename.html) {
      rewrite (.*) $1.html;
      break;
    }

    # Send all other requests to mongrel
    if (!-f $request_filename) {
      proxy_pass http://mongrel-payroll;
      break;
    }
  }

  error_page 500 502 503 504 /500.html;
  location = /500.html {
    root /home/payroll/current/public;
  }
}
Listing 11 shows a fairly typical nginx configuration with some elements thrown in to handle Rails page caching and sending dynamic requests to an upstream mongrel. Other mapping of URLs to filenames could be done here if needed.
With the configuration in place, service nginx start will start the web server.
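Remember that so far the security group only allows SSH (added in Listing 4). To reach nginx from your browser, you also need to open the web port, using the same command pattern as before:

ec2-authorize default -p 80 -s $MYIP/32    # allow HTTP, again only from your own address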
For testing, it would be very helpful to be able to refer to your cloud instance using the regular domain name of your application, because you want to ensure that you're using your test site and not the production site. This is accomplished with a local DNS override. In Windows, edit c:\windows\system32\drivers\etc\hosts, and in Unix edit /etc/hosts. Add a line like x.x.x.x app.smallpayroll.ca, where x.x.x.x is the IP of your cloud server and app.smallpayroll.ca is the name of your application. Restart your browser, and browse to your website. You will be using the cloud version of your application now. (Don't forget to comment out the line you just added when you want to go back to the production version!)
At this point you should be able to test that the cloud version of your application works just as well as the production version, and to fix any problems you find. Make careful note of whatever you find, as you'll want to script it in case you launch a second server.
Since you're using the cloud version of your application, you can delete and restore your database without any users complaining.
The last step is to rebundle your AMI. Any time you start a new instance, you lose everything in /mnt, and your root partition is reset to whatever is in the AMI. There's nothing you can do yet about the /mnt problem, but rebundling makes sure that your AMI is just the way you left it.
If the AMI you are starting from does not have the AMI tools, you can install them with rpm -i --nodeps http://s3.amazonaws.com/ec2-downloads/ec2-ami-tools.noarch.rpm.
Bundling an AMI is a three-step process:
- Create the image on the instance itself
- Upload the image to S3
- Register the AMI
Before proceeding, you should shut down your mongrel and PostgreSQL processes, just to make sure any open files are handled correctly. You must also copy your X.509 keys, found in the Amazon Console, to /mnt on your server. Listing 12 shows the first two steps of bundling, which are done on the virtual machine itself.
Listing 12. Bundling the AMI
# ec2-bundle-vol -d /mnt -e /mnt --privatekey /mnt/pk-mykey.pem \
  --cert /mnt/cert-mycert.pem --user 223110335193 -p centos-ertw
Please specify a value for arch [i386]:
Copying / into the image file /mnt/centos-ertw...
...
Generating digests for each part...
Digests generated.
Creating bundle manifest...
ec2-bundle-vol complete.
# ec2-upload-bundle -b ertw.com -m /mnt/centos-ertw.manifest.xml \
  --secret-key MYSECRETKEY --access-key MYACCESSKEY
Creating bucket...
Uploading bundled image parts to the S3 bucket ertw.com ...
...
Uploaded centos-ertw.part.37
Uploading manifest ...
Uploaded manifest.
Bundle upload completed.
The first command generates the bundle, specifying that /mnt is to be ignored and that the bundle will go in /mnt (the -e and -d options, respectively). The --privatekey, --cert, and --user options point to your security credentials and AWS user ID, which are all found in the account settings of your AWS console. The last option, -p, lets you name this AMI to differentiate it from others.
The first command will run for about 10 minutes, depending on how full your root partition is.
The second command from Listing 12 uploads the bundle to S3. The -b option specifies a bucket name, which will be created if it doesn't exist. The -m option points to the manifest file created in the last step. The last two options are your S3 credentials, which are found right next to your X.509 credentials in the AWS console. Just remember that X.509 credentials are used for EC2 operations, while S3 uses text keys.
Finally, run ec2-register ertw.com/centos-ertw.manifest.xml to register the AMI, and you will see the AMI identifier to use from now on. Note that the ec2-register command is not distributed with the AMI tools, so it's easiest to run it from the machine you used to launch the original AMI. You could also install the EC2 API tools on your EC2 instance.
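As a sanity check, you can launch an instance of the newly registered image and make sure it boots the way you left it (the AMI identifier below is a placeholder for whatever ec2-register printed):

ec2-run-instances ami-xxxxxxxx -k main    # substitute the identifier returned by ec2-register
ec2-describe-instances                    # wait for "running", then ssh in and look around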
Now that you've got your cloud environment running, the migration itself should be rather simple. You've verified that everything works; all that remains is to resynchronize the data and cut over in an orderly fashion.
Some time before the migration, make sure to lower the TTL of your domain name records to 5 minutes. You should also develop a checklist of the steps you will take to move everything over, the tests you want to run to verify everything is working, and the procedure to back out of the change if necessary.
Make sure your users are notified of the migration!
Just before your migration time, take another look at your cloud environment to make sure it is ready to be synchronized and accept production traffic.
The first task in the migration is to disable the current production site or put it in read-only mode, depending on the nature of the site. Since most of SmallPayroll's requests involve writing to the database or filesystem, the site will be disabled. The Capistrano deployment gem includes a task, cap deploy:web:disable, which puts a maintenance page on the site informing users that the site is down for maintenance.
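In Capistrano 2, the task also honors REASON and UNTIL environment variables to fill in the maintenance page, for example:

cap deploy:web:disable REASON="a scheduled migration" UNTIL="1:00am Eastern time"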
Stop the application services on the cloud environment in preparation for the data migration by killing your mongrel processes.
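With mongrel_cluster this is a single command, mirroring the start command used earlier:

mongrel_rails cluster::stop -C /home/payroll/current/config/mongrel_cluster.yml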
Next, copy your database over the same way you did for testing. Re-run the rsync if necessary. Restart the application servers with mongrel_rails cluster::start -C /home/payroll/current/config/mongrel_cluster.yml.

Make sure your hosts file is pointing to the cloud environment and perform some smoke tests. Make sure users can log in and browse the site.
If your smoke tests pass, then you can change your DNS records to point to your cloud environment. At this point I find it helpful to keep a tail -f running on the web server's log file to watch for people coming in to the site.
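With the nginx configuration from Listing 11, that is simply:

tail -f /var/log/nginx/app.smallpayroll.ca_log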
Chances are your local DNS server still has the old information cached for the next 5 minutes. You can verify this with the dig command, as shown in Listing 13.
Listing 13. Verifying the DNS server is caching the query
# dig app.smallpayroll.ca @172.16.0.23

; <<>> DiG 9.3.4 <<>> app.smallpayroll.ca @172.16.0.23
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38838
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 13, ADDITIONAL: 0

;; QUESTION SECTION:
;app.smallpayroll.ca.           IN      A

;; ANSWER SECTION:
app.smallpayroll.ca.    251     IN      A       69.164.205.185
...
In the answer section you can see that there are 251 seconds until the entry expires. It is important that you use a tool like dig, host, or nslookup to verify DNS, because your hosts file is overriding DNS for the moment; ping would use whatever is in the hosts file.
Perform your final acceptance testing while you wait for DNS to propagate.
You have successfully migrated an application to the cloud! The basic procedure was:
- Set up the new environment
- Test with a copy of production data
- Turn off the old environment
- Copy production data over to the new environment
- Move DNS to point to the new environment
Despite being "in the cloud", the application is probably worse off than it was before. Consider the following:
- The application is still running on one server.
- If the server crashes, all the data is lost.
- You have less control over performance than you do on a physical server.
- The machine and application are not locked down.
In the next article you'll learn how to overcome these problems and start building a more robust environment for your application.
Learn
- In the cloud area on developerWorks, get the resources you need to develop and deploy applications in the cloud and keep on top of recent cloud developments.
- Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM products and tools as well as IT industry trends.
- Request instance metadata from your EC2 instance to get information about the instance, from the SSH keys it should use to user specified information.
- Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners, to advanced functionality for experienced developers.
- Start your cloud adventure by looking around the AWS Management Console.
- If you're going to work with EC2 then you should familiarize yourself with the various guides that Amazon provides.
- Learn about the IBM AMIs from Amazon's perspective and from IBM's perspective.
Get products and technologies
- Ruby Enterprise Edition is a high performance Ruby implementation that can be used by itself, or along with Phusion Passenger to integrate with Apache or nginx. Either way, you get access to faster memory management and improved garbage collection.
- Sign up for the IBM Industry Application Platform AMI for Development Use to get started with various IBM products in the cloud. Remember you have to go through a checkout process, but you're not going to be charged anything until you use it. You can also use ami-90ed0ff9, which is only IBM WAS.
- The Amazon EC2 API tools are used to communicate with the Amazon API to launch and terminate instances, and rebundle new ones. These tools are periodically updated as new features are introduced to EC2, so it's worth checking back after product announcements for updates to this page. You will need at least the 2009-05-15 update, because you'll be using some of the load balancing features later.
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.
Discuss
- Participate in the discussion forum.
- Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.