The first article in the series followed the migration of a single physical server to a single physical cloud server. Despite all the work involved, the application was no better off, largely because the move introduced more single points of failure.
Even on a single physical server you can get redundant power supplies, error correcting RAM, redundant disks, and copious monitoring of pre-fault indicators. On a cloud server, you don't know what you've got, or more to the point, what you have access to. The cloud servers are generally reliable, but it would be smart to take extra precautions, especially since Amazon provides extra services to enhance reliability.
When deploying to a cloud environment it is wise to assume that you could lose a virtual instance at any point. This is not to say that cloud services are unreliable; only that the types of failures you might run into are unlike what you're used to in a physical environment. Therefore you should push intelligence into your application to deal with communications loss and to scale across multiple servers. This type of thinking will build a better application no matter what type of environment you are building for.
In this article you will learn how to improve on the database's ephemeral storage using Elastic Block Storage. You'll further improve your data protection by setting up backups. You'll protect against application server loss by load balancing across multiple instances, and you'll learn how to recover from various failures.
Figure 1 shows the current architecture of the application.
Figure 1. The current architecture
Everything is on one EC2 instance. The front end web server, nginx, proxies requests to multiple mongrel instances or serves static files itself. The mongrels access a PostgreSQL database on the same host.
Instance storage is the biggest difference between EC2 and virtualization technologies like VMware® and Xen®. Recall that an EC2 instance gives you a fixed 10GB root partition and an instance disk that's sized depending on the type of instance launched. The root partition is cloned from the AMI on boot and the instance store is empty. When you shut down your server your instance store is lost.
Amazon's initial position was to tell people to back up their servers frequently to the Simple Storage Service (S3). If your server crashed then you would have other servers pick up the load, or you'd get your data from S3. Eventually Amazon came out with Elastic Block Storage (EBS), a service that provides persistent disks. If your server crashes then you can reattach the EBS volume to another server. Amazon even built a snapshot mechanism to ease backups.
The prime problem with the database server in the SmallPayroll application is that it is a single point of failure. There are two general approaches to fixing this: one is to build two database servers that can take over for each other; the other is to reduce the potential downtime to something reasonable. The first option has the least downtime but is complex; the second is much more practical in this situation. If the database server should crash, a new instance will be started to replace it, and EBS takes care of keeping the data safe. The total time to launch a new database server and repoint the clients should be under 10 minutes from when the fault is noticed.
As a final bonus, EBS storage has higher I/O capacity than the instance storage does.
The steps to using EBS are as follows:
- Create the volume with the ec2-create-volume command.
- Attach the volume to a running instance with the ec2-attach-volume command.
- Create a filesystem on the volume.
- Mount the filesystem to a directory.
Setting up EBS for the first time
The first step to setting up EBS is to tell Amazon you want to create a volume. You need to know two things: the size of your volume (in gigabytes) and the availability zone you want to use it in. The availability zone is something that Amazon came up with to describe the location of the server. Zones starting with us-east are in Northern Virginia and are collectively called the us-east region. There are three such zones in the us-east region at this time: us-east-1a, us-east-1b, and us-east-1c. Each availability zone is designed to be isolated from failures in other availability zones. Zones in the same region are still close to each other and therefore have low latency between them.
One restriction of EBS is that the volume can only be mounted in the availability zone in which it was created. There are ways to move it, but you must create your volumes in the same availability zone as your server.
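If you're not sure which zones your account can use, the EC2 API tools include the ec2-describe-availability-zones command. The run below is a sketch; the exact output columns may vary with the version of the tools:

$ ec2-describe-availability-zones
AVAILABILITYZONE  us-east-1a  available  us-east-1
AVAILABILITYZONE  us-east-1b  available  us-east-1
AVAILABILITYZONE  us-east-1c  available  us-east-1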
Run ec2-create-volume -s 20 -z us-east-1a to create a 20GB volume in the us-east-1a zone. If you don't know where your server is, the ec2-describe-instances command will tell you. You can use the same -z parameter with ec2-run-instances to specify where your server is to be launched. Listing 1 shows this command and its output.
Listing 1. Creating the EBS volume
$ ec2-create-volume -s 20 -z us-east-1a
VOLUME  vol-c8791ca1  20  us-east-1a  creating  2010-07-01T02:52:52+0000
The output of Listing 1 shows that the volume is being created, and that the volume ID is vol-c8791ca1. Knowing this, you can attach the volume to a running EC2 instance if you know the instance identifier of the server and the device name you want the server to see the volume as. Run ec2-attach-volume vol-c8791ca1 -i i-fd15e097 -d /dev/sdj to attach this newly created volume to instance i-fd15e097 as /dev/sdj. Remember that you can find your instance identifier through the ec2-describe-instances command, and see the list of volumes with ec2-describe-volumes.
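To verify that the attachment succeeded, you can pass the volume ID to ec2-describe-volumes. The output below is only a sketch with fields abbreviated, but you should see the volume reported as in-use along with an ATTACHMENT line naming the instance and device:

$ ec2-describe-volumes vol-c8791ca1
VOLUME      vol-c8791ca1  20  us-east-1a  in-use  ...
ATTACHMENT  vol-c8791ca1  i-fd15e097  /dev/sdj  attached  ...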
Your virtual server now has a disk called /dev/sdj that looks to it like a normal hard drive. As with any disk, you need to create a filesystem on the raw disk. You have several choices depending on your needs:
- Create a standard ext3 filesystem. This is simple.
- Create an XFS filesystem. This will allow you to freeze the filesystem to take a snapshot for backup purposes.
- Layer the Logical Volume Manager (LVM) between the disk and the filesystem. This would allow you to extend the EBS volume at a later time.
- Use Linux software RAID to stripe multiple EBS volumes, and put either XFS or ext3 on top of the RAID set. This will give even higher disk performance (a sketch of this option follows the list).
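For illustration, here is a sketch of that software-RAID option. It assumes two additional EBS volumes have already been created and attached as /dev/sdk and /dev/sdl; the device names and the two-volume stripe are arbitrary choices, not something the article prescribes.

# mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdk /dev/sdl
# mkfs.xfs /dev/md0
# mkdir /ebsvol
# mount /dev/md0 /ebsvol

mdadm assembles the two volumes into a single striped device, /dev/md0, which is then formatted and mounted exactly as a single EBS volume would be.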
Even though RAID and LVM provide interesting features, XFS is the simplest option for a relatively small EBS volume. You will be able to use the freezing features of XFS along with the EBS snapshots to take consistent backups. Listing 2 shows how to create an XFS filesystem and mount it on the host.
Listing 2. Creating and mounting the XFS filesystem
# mkfs.xfs /dev/sdj
meta-data=/dev/sdj               isize=256    agcount=8, agsize=32768 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mkdir /ebsvol
# mount /dev/sdj /ebsvol
Listing 2 runs the mkfs.xfs command to format /dev/sdj. (Run yum install -y xfsprogs if you do not have the mkfs.xfs command.) The output of this command describes the parameters of the filesystem; as long as there are no errors in the output, it can be ignored.
The last two commands in Listing 2 create a mount point called /ebsvol and then mount the filesystem on the mount point.
The filesystem is now usable. Any files under /ebsvol will persist even when the server is down.
You have an EBS volume mounted on /ebsvol and need to move the PostgreSQL data over. The most direct way is to copy over the existing datastore and fix things up with a symlink. While this would work, a cleaner option is to move the data to the EBS volume and then use a bind mount to clone it back onto /var/lib/pgsql. Listing 3 shows this procedure.
Listing 3. Moving the PostgreSQL data to the EBS volume
# service postgresql stop
# mv /var/lib/pgsql /ebsvol
# mkdir /var/lib/pgsql
# chown postgres:postgres /var/lib/pgsql
# mount /ebsvol/pgsql /var/lib/pgsql -o bind
# service postgresql start
The sequence of commands in Listing 3 is as follows:
- Stop the PostgreSQL daemon to ensure data consistency.
- Move the whole directory tree to the EBS store.
- Recreate the PostgreSQL directory.
- Reset the ownership of the PostgreSQL directory.
- Mount /ebsvol/pgsql on top of /var/lib/pgsql using mount's bind option.
- Restart the database.
The bind option to mount clones the first directory onto the second. Changes to one appear on the other; after all, it's the same blocks on the same disk. Using bind differs from mounting the same device twice in that you can mount subdirectories instead of the whole filesystem.
If your server crashes, all you have to do is:
- Start a new instance using your AMI.
- Attach the EBS volume to the instance with ec2-attach-volume.
- Mount the EBS device on /ebsvol.
- Perform the last four commands from Listing 3.
As long as you've bundled your AMI recently, your database system will be up to date.
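Put together, the recovery might look something like the sketch below. The AMI and volume IDs are the ones used earlier in this article, the replacement instance ID is a placeholder, and mkdir -p is used so the commands are safe to re-run if the directories already exist on the new instance; substitute your own values.

$ ec2-run-instances ami-147f977d -k main -z us-east-1a
$ ec2-attach-volume vol-c8791ca1 -i i-REPLACEMENT -d /dev/sdj

Then, as root on the new instance:

# mkdir -p /ebsvol
# mount /dev/sdj /ebsvol
# mkdir -p /var/lib/pgsql
# chown postgres:postgres /var/lib/pgsql
# mount /ebsvol/pgsql /var/lib/pgsql -o bind
# service postgresql start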
Addressing the application server
One of the benefits that cloud computing promotes is that you have easy access to server capacity. Right now the SmallPayroll.ca environment has both the database and application server on the same virtual instance, which is the same as prior to the migration to Amazon EC2. The next step will be to separate the application server from the database server.
The term scaling is generally associated with capacity. If an application is said to scale, then the application can be grown to handle the load of more users. If this scaling is performed by adding more servers, it's said to scale horizontally. If you're replacing a server with a larger server to handle the load, then the application scales vertically.
Horizontal and vertical scaling can be used in combination with each other. It may be easier to outrun database capacity problems with bigger servers and faster disks, and to spread computations across more servers. Being able to scale horizontally or vertically is mostly a function of application design. Some applications cannot be spread across multiple computers, and some operations take a certain amount of time no matter how fast the computer is. Furthermore, some applications may scale horizontally only to a certain point, after which a bottleneck drives the marginal gain of adding another server to nothing.
When one spreads the application across multiple servers, the problem arises of how to distribute incoming requests. The device most often used to do this is a load balancer, which is an appliance that accepts requests from the outside world and hands them off to the next available application server. Because this is not an intensive task, a single device can handle a large number of connections, or the function can be handled in software.
The Amazon EC2 service provides a cloud load balancer, called the Elastic Load Balancer (ELB), that is adequate for most purposes. It distributes requests, can be reconfigured to add or remove servers through an API, and will perform health checks on the backend servers.
An alternative to using ELB would be to run your own load balancing software on an EC2 instance, such as HAProxy or Varnish. This would be more complex than using ELB, but would provide a higher level of control over your own traffic. ELB is more than adequate for an application like SmallPayroll.ca.
Figure 2 shows the new design of the SmallPayroll.ca application.
Figure 2. The SmallPayroll.ca application with separate application servers
Incoming requests land on the ELB and are sent to one of two servers. The servers themselves run nginx which handles static requests and proxies any dynamic requests to mongrel instances. The mongrels attach to a single database server.
In the event that one of the application servers becomes incapacitated, the ELB will redirect all traffic to the other server.
To build the separate application servers you will need to launch two more instances. You can use the same AMI as before because it should have all the necessary software. It is possible to launch more than one instance at a time. Listing 4 shows two instances being launched with one command.
Listing 4. Launching two instances at once
$ ec2-run-instances ami-147f977d -k main -z us-east-1a \
  -d 'role=web,db=10.201.207.180' -n 2
RESERVATION  r-9cc240f7  223410055806  default
INSTANCE  i-81ee2eeb  ami-147f977d  pending  main ...
INSTANCE  i-87ee2eed  ami-147f977d  pending  main ...
The ec2-run-instances command is similar to the commands used in the past. The availability zone is chosen with -z us-east-1a because the database server is in that zone; at the moment, it is desirable to keep the database server and application servers in the same availability zone to reduce latency and bandwidth charges.
The -d and -n parameters, however, are new. -n 2 simply tells Amazon to launch two instances, which is confirmed in the output. -d allows you to pass information to the instance. Listing 5, taken from one of the new instances, shows how to retrieve this information.
Listing 5. Retrieving instance metadata
[root@domU-12-31-39-0C-C5-B2 ~]# DATA=`curl -s http://169.254.169.254/latest/user-data`
[root@domU-12-31-39-0C-C5-B2 ~]# echo $DATA
role=web,db=10.201.207.180
The curl command retrieves a web page from the EC2 metadata service containing the user data. This is similar to how the server retrieved its SSH key in the previous article in this series.
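The article doesn't dictate how an instance should consume this string. One possible approach, sketched below, is a small boot script that splits the comma-separated key=value pairs so that later configuration steps know the server's role and the database address; the variable names are arbitrary.

#!/bin/sh
# Fetch the user data that was passed with -d at launch time
DATA=`curl -s http://169.254.169.254/latest/user-data`
# Split the comma-separated key=value pairs into shell variables
for pair in `echo $DATA | tr ',' ' '`; do
  key=${pair%%=*}
  value=${pair#*=}
  case $key in
    role) ROLE=$value ;;
    db)   DB_HOST=$value ;;
  esac
done
echo "This instance is a $ROLE server using database $DB_HOST"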
Configuring the application servers
There is not much to do on the application servers, because the AMI they were cloned from is already capable of running the application against a local database. A Rails application reads its database configuration from config/database.yml, which tells the application which database server to use. By default the application will connect to localhost.
First, create a hostname alias by adding an entry to /etc/hosts. For example, the entry 10.201.207.180 dbserver aliases the name dbserver to the address 10.201.207.180. It is important to use the private address of the database server, which is the address assigned to eth0, instead of the public address you connect to. Traffic between the private addresses of EC2 instances in the same availability zone is free, while traffic from one EC2 instance to the public address of another instance is billable.
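For example, as root on each application server (the address is the database server's private IP from this example):

# echo "10.201.207.180 dbserver" >> /etc/hosts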
Next, edit config/database.yml to point your application at the alias you just created. Listing 6 shows such a configuration.
Listing 6. Pointing the application to the database server
production:
  adapter: postgresql
  encoding: utf8
  database: payroll_prod
  pool: 5
  username: payroll
  password: secret
  host: dbserver
You should be able to launch your Rails application and connect to it over the public IP address of the application server. If you get an error, check the following:
- Is PostgreSQL listening on all interfaces? postgresql.conf must have a line like listen_addresses = '*' (see the sketch after this list).
- Does pg_hba.conf allow 10/8 addresses to connect using MD5 authentication?
- Does your Amazon security group allow the connection to the database server?
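For reference, here is a hedged sketch of the PostgreSQL settings from that checklist. The file locations shown are the distribution defaults and may differ on your AMI:

# /var/lib/pgsql/data/postgresql.conf: listen on all interfaces
listen_addresses = '*'

# /var/lib/pgsql/data/pg_hba.conf: allow the private 10.0.0.0/8 network with MD5 authentication
host    all    all    10.0.0.0/8    md5

Restart PostgreSQL after changing these files, and remember that the database server's security group also needs to allow inbound connections on port 5432 from the application servers.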
The Elastic Load Balancer is a fairly simple load balancer. Incoming requests are directed to an available server in the pool. The ELB can do some basic health checking of the web servers to avoid sending requests to servers that are down. ELB also has some basic affinity mechanisms that let you keep users on the same backend server. More advanced features, such as redirecting based on the URL, are not currently supported.
Configuring an ELB is a three-step process.
- Create the LB instance.
- Define your health checks.
- Configure DNS to point to the ELB name.
Listing 7 shows the first two steps in action.
Listing 7. Configuring an elastic load balancer
$ elb-create-lb smallpayroll-http \
  --listener "lb-port=80,instance-port=80,protocol=HTTP" \
  --availability-zones us-east-1a
DNS_NAME  DNS_NAME
DNS_NAME  smallpayroll-http-706414765.us-east-1.elb.amazonaws.com
$ elb-configure-healthcheck smallpayroll-http --target "HTTP:80/" \
  --interval 30 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2
HEALTH_CHECK  TARGET    INTERVAL  TIMEOUT  HEALTHY_THRESHOLD  UNHEALTHY_THRESHOLD
HEALTH_CHECK  HTTP:80/  30        3        2                  2
Listing 7 shows two commands. The first command, elb-create-lb, creates the load balancer. The first parameter is the name of the load balancer, which needs to be unique only to you. The --listener parameter dictates that the public-facing port is 80, that it is to be connected to port 80 on the instance, and that the protocol in use is HTTP. The output of this command is a DNS name, in this case smallpayroll-http-706414765.us-east-1.elb.amazonaws.com. Unlike most load balancers, you are not given a public IP address to connect to; Amazon assigns its own IP addresses and you connect through a DNS alias.
The second command, elb-configure-healthcheck, first references the name of the load balancer, and then specifies that the health check will be performed with the HTTP protocol on port 80, using the root URL. It is also possible to write a separate controller and action to handle the checks, such as /status, but in this case the root URL provides enough assurance that the application is running properly.
The second line of parameters specifies, in order, the following:
- --interval 30: Test every 30 seconds.
- --timeout 3: Wait 3 seconds for a response before failing the test.
- --unhealthy-threshold 2: Two consecutive failed tests will mark the server as out of service.
- --healthy-threshold 2: A failed service will require two consecutive successful checks before the server is brought back into the pool.
The next step is to attach instances to the load balancer. You can add and remove instances at will.
Listing 8. Adding two instances to the load balancer
$ elb-register-instances-with-lb smallpayroll-http --instances i-87f232ed,i-85f232ef
INSTANCE_ID  INSTANCE_ID
INSTANCE_ID  i-85f232ef
INSTANCE_ID  i-87f232ed
$ elb-describe-instance-health smallpayroll-http --headers
INSTANCE_ID  INSTANCE_ID  STATE      DESCRIPTION  REASON-CODE
INSTANCE_ID  i-85f232ef   InService  N/A          N/A
INSTANCE_ID  i-87f232ed   InService  N/A          N/A
Listing 8 first shows two instances being added to the smallpayroll-http load balancer. Run the elb-describe-instance-health command to see the status of each server in the pool; InService means the instance is able to handle requests through the load balancer.
Finally, browse to the DNS name of your load balancer, and you should see the application working across two servers. To make the load balancer work for the real DNS name of your application, you will have to change your application's DNS record from an A record to a CNAME pointing at the DNS name of the load balancer. See the resources section for more details about the DNS requirements, including some caveats. While the DNS method is cumbersome, it allows you to handle orders of magnitude more requests than you could by building a load balancer on an EC2 instance.
The DNS change can happen at any time, because it causes no disruption of service.
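As a hypothetical illustration, the change in a BIND-style zone file might look like the lines below. The record name, TTL, and the original IP address are made up; the load balancer name is the one returned by elb-create-lb in Listing 7. Note that a zone apex (smallpayroll.ca itself) cannot be a CNAME, which is one of the caveats covered in the resources.

; before: an A record pointing at a single EC2 address
www.smallpayroll.ca.   300   IN   A      203.0.113.10
; after: a CNAME pointing at the load balancer
www.smallpayroll.ca.   300   IN   CNAME  smallpayroll-http-706414765.us-east-1.elb.amazonaws.com.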
The application is now spread across two nodes, and the database server can be started from scratch in less than half an hour. This is good for availability, but it doesn't help if an administrator accidentally destroys critical data or if the EBS volume fails. Fortunately, there are solutions to address these problems.
EBS provides a snapshot feature that stores a copy of the volume in S3. More precisely, an EBS snapshot stores the differences since the last snapshot. A database complicates the matter because it caches some disk writes, which may result in an inconsistent snapshot. Therefore you must make sure that everything is on disk and in a consistent state. The order of operations for the backup is:
- Tell PostgreSQL to enter backup mode.
- Freeze the filesystem.
- Request a snapshot from Amazon.
- Unfreeze the filesystem.
- Tell PostgreSQL that the backup is over.
Even though this procedure takes only a minute or two on the server, Amazon continues spooling the snapshot to S3 in the background. Changes made after step 3 will not be reflected in the snapshot, however. Listing 9 shows a script that implements this procedure.
Listing 9. Backing up the database
#!/bin/sh
export EC2_HOME=/usr/local/
export JAVA_HOME=/usr
export EC2_CERT="/root/.ec2/cert.pem"
export EC2_PRIVATE_KEY="/root/.ec2/pk.pem"
echo "select pg_start_backup('snapshot')" | su - postgres -c psql
/usr/sbin/xfs_freeze -f /ebsvol/
/usr/local/bin/ec2-create-snapshot vol-93f77ffa --description "`date`"
/usr/sbin/xfs_freeze -u /ebsvol/
echo "select pg_stop_backup()" | su - postgres -c psql
You can verify the status of the snapshot with the ec2-describe-snapshots command, as shown in Listing 10.
Listing 10. Showing the EBS snapshots (condensed)
$ ec2-describe-snapshots --headers
          SnapshotId     VolumeId      Status     StartTime
SNAPSHOT  snap-298cb741  vol-93f77ffa  completed  2010-06-29T02:50:55
SNAPSHOT  snap-a2b959c9  vol-93f77ffa  completed  2010-07-13T15:14:54
Listing 10 shows two completed snapshots, along with their times.
You should automate the creation of snapshots by running Listing 9 from cron. You should also periodically prune your list of snapshots with the ec2-delete-snapshot command.
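A minimal sketch of the automation, assuming the script from Listing 9 has been saved as /usr/local/bin/ebs-backup.sh (the name and path are arbitrary):

# /etc/cron.d/ebs-backup: snapshot the database volume every night at 2:00 AM
0 2 * * * root /usr/local/bin/ebs-backup.sh

Pruning can stay manual for now: note the snapshot IDs reported by ec2-describe-snapshots and remove the ones you no longer need with ec2-delete-snapshot <snapshot id>.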
If your EBS volume fails, or you need to restore old data from EBS, you will need to restore from your last snapshot. The procedure to restore an EBS volume is almost identical to creating a new one. Listing 11 shows how to restore the last snapshot from Listing 10.
Listing 11. Restoring an EBS snapshot
$ ec2-create-volume --snapshot snap-a2b959c9 -z us-east-1a -s 20
VOLUME  vol-d06b06b9  20  snap-a2b959c9  us-east-1a  creating
You may then mount this volume on any instance to restore your data.
Backing up and restoring files
A simple way to back up files from your servers is to copy them into S3 or to make them part of your stock AMI. The latter is more effective for binaries and software packages, while copying to S3 is better suited to user data. The S3Sync project provides some command-line S3 tools, along with a handy rsync-like utility.
Download the S3Sync utilities (see the resources section for the link). Listing 12 shows how to create a bucket for backups, and how to upload files to S3.
Listing 12. Backing up your data to S3
$ s3cmd.rb createbucket smallpayroll-backup
$ s3cmd.rb listbuckets
ertw.com
smallpayroll-backup
$ s3sync.rb -r /var/uploads smallpayroll-backup:`date +%Y-%m-%d`
$ s3cmd.rb list smallpayroll-backup:2010-07-12
--------------------
2010-07-12/uploads
2010-07-12/uploads/file1.txt
Listing 12 starts by creating a bucket called smallpayroll-backup. You may safely store backups from different times in the same bucket, so this step need only be done once. The second command verifies that the bucket was created; you can see the newly created bucket along with the ertw.com bucket where the AMIs are stored.
The s3sync.rb command recursively copies the /var/uploads directory into the backup bucket, prefixing all the files with the current date. The final command shows all the files inside that bucket.
Restoring files is just as simple. You can either run s3sync with the source and destination parameters reversed, or retrieve an individual file through another tool like the S3 File Manager.
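A sketch of the first approach, pulling the July 12 backup back down into a scratch directory (the destination path is arbitrary):

$ s3sync.rb -r smallpayroll-backup:2010-07-12/uploads /var/restore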
The SmallPayroll application is running in the cloud, and is better designed for future growth. Even though the mean time between failures of the hardware hasn't changed, the backups and scripts put in place mean that the data is safe and that the environment can be quickly rebuilt if needed.
Most of the original shortcomings of a straight migration to the cloud have been addressed. There is little visibility into the health of the environment, however, and it would be helpful to be able to scale server resources in response to demand. These issues will be addressed in the next two articles.
Learn
- The RightScale guys have given an in-depth look at EBS, which gives more information about snapshots and availability zones.
- You can attach multiple EBS volumes to one server. The MySQL Performance Blog has an excellent comparison of using RAID to increase EBS performance that is relevant to other databases, too.
- PostgreSQL online backups are invaluable, but they take some understanding to ensure consistency.
- An EC2 instance has several pieces of instance metadata that can help the instance learn about its environment. Browse through this chapter of the EC2 documentation to get some ideas of what can be done.
- If you're trying to use the ELB to balance the root of your domain, such as example.com (also called the root apex), this discussion of ELB and CNAMEs to the root apex will highlight the problems and some workarounds.
Get products and technologies
- Now that you've got multiple AMIs inside S3, you might want to prune some old ones. S3 File Manager is a web-based file manager that rivals the features of many standalone applications or browser plugins. If you delete an AMI, don't forget to ec2-deregister it.
- S3Sync is a helpful tool for copying files to and from S3 and manipulating your buckets.
- The S3 File Manager is better than anything else out there for navigating your S3 buckets, and it doesn't even involve installing software. With a good browser, you can even drag and drop files from your desktop to S3.
Discuss
- Participate in the discussion forum.
- Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.