The first article in the series followed the migration of a single physical server to a single physical cloud server. Despite all the work involved, the application was no better off, largely because the move introduced more single points of failure.
Even on a single physical server you can get redundant power supplies, error correcting RAM, redundant disks, and copious monitoring of pre-fault indicators. On a cloud server, you don't know what you've got, or more to the point, what you have access to. The cloud servers are generally reliable, but it would be smart to take extra precautions, especially since Amazon provides extra services to enhance reliability.
When deploying to a cloud environment it is wise to assume that you could lose a virtual instance at any point. This is not to say that cloud services are unreliable; only that the types of failures you might run into are unlike what you're used to in a physical environment. Therefore you should push intelligence into your application to deal with communications loss and to scale across multiple servers. This type of thinking will build a better application no matter what type of environment you are building for.
In this article you will learn how to improve on the database's ephemeral storage using Elastic Block Storage. You'll further improve your data protection by setting up backups. You'll protect against application server loss by load balancing across multiple instances, and you'll learn how to recover from various failures.
Figure 1 shows the current architecture of the application.
Figure 1. The current architecture
Everything is on one EC2 instance. The front end web server, nginx, proxies requests to multiple mongrel instances or serves static files itself. The mongrels access a PostgreSQL database on the same host.
Instance storage is the biggest difference between EC2 and virtualization technologies like VMware® and Xen®. Recall that an EC2 instance gives you a fixed 10GB root partition and an instance disk that's sized depending on the type of instance launched. The root partition is cloned from the AMI on boot and the instance store is empty. When you shut down your server your instance store is lost.
Amazon's initial position was to tell people to back up their servers frequently to the Simple Storage Service (S3). If your server crashed then you would have other servers pick up the load, or you'd get your data from S3. Eventually Amazon came out with Elastic Block Storage (EBS), a service that provides persistent disks. If your server crashes then you can reattach the EBS volume to another server. Amazon even built a snapshot mechanism to ease backups.
The prime problem with the database server in the SmallPayroll application is that it is a single point of failure. There are two general approaches to fixing this: one is to build two database servers that can take over for each other; the other is to reduce the potential downtime to something reasonable. The first option has the least downtime but is complex; the second is much more practical in this situation. If the database server should crash, a new instance will be started to replace it, and EBS takes care of keeping the data safe. The total time to launch a new database server and repoint the clients should be under 10 minutes from when the fault is noticed.
As a final bonus, EBS storage has higher I/O capacity than the instance storage does.
The steps to using EBS are as follows:
- Create the volume with the ec2-create-volume command.
- Attach the volume to a running instance with the ec2-attach-volume command.
- Create a filesystem on the volume.
- Mount the filesystem to a directory.
Setting up EBS for the first time
The first step to setting up EBS is to tell Amazon you want to create a volume. You need to know two things: the size of your volume (in gigabytes) and the availability zone you want to use it in. The availability zone is something that Amazon came up with to describe the location of the server. Zones starting with us-east are in Northern Virginia and are collectively called the us-east region. There are three such zones in the us-east region at this time: us-east-1a, us-east-1b, and us-east-1c. Each availability zone is designed to be isolated from failures in other availability zones. Zones in the same region are still close to each other and therefore have low latency between them.
One restriction of EBS is that the volume can only be mounted in the availability zone in which it was created. There are ways to move it, but you must create your volumes in the same availability zone as your server.
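If you're not sure which zones your account can use, the EC2 API tools include the ec2-describe-availability-zones command. The run below is a sketch; the exact output columns may vary with the version of the tools:

$ ec2-describe-availability-zones
AVAILABILITYZONE  us-east-1a  available  us-east-1
AVAILABILITYZONE  us-east-1b  available  us-east-1
AVAILABILITYZONE  us-east-1c  available  us-east-1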
Run ec2-create-volume -s 20 -z us-east-1a to create a 20GB volume in the us-east-1a zone. If you don't know where your server is, the ec2-describe-instances command will tell you. You can use the same -z parameter with ec2-run-instances to specify where your server is to be launched. Listing 1 shows this command and its output.
Listing 1. Creating the EBS volume
$ ec2-create-volume -s 20 -z us-east-1a
VOLUME  vol-c8791ca1  20  us-east-1a  creating  2010-07-01T02:52:52+0000
The output of Listing 1 shows that the volume is being created, and that the volume ID is vol-c8791ca1. Knowing this, you can attach the volume to a running EC2 instance if you know the instance identifier of the server and the device name you want the server to see the volume as. Run ec2-attach-volume vol-c8791ca1 -i i-fd15e097 -d /dev/sdj to attach this newly created volume to instance i-fd15e097 as /dev/sdj. Remember that you can find your instance identifier through the ec2-describe-instances command, and see the list of volumes with ec2-describe-volumes.
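To verify that the attachment succeeded, you can pass the volume ID to ec2-describe-volumes. The output below is only a sketch with fields abbreviated, but you should see the volume reported as in-use along with an ATTACHMENT line naming the instance and device:

$ ec2-describe-volumes vol-c8791ca1
VOLUME      vol-c8791ca1  20  us-east-1a  in-use  ...
ATTACHMENT  vol-c8791ca1  i-fd15e097  /dev/sdj  attached  ...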
Your virtual server now has a disk called /dev/sdj that looks to it like a normal hard drive. As with any disk, you need to create a filesystem on the raw disk. You have several choices depending on your needs:
- Create a standard ext3 filesystem. This is simple.
- Create an XFS filesystem. This will allow you to freeze the filesystem to take a snapshot for backup purposes.
- Layer the Logical Volume Manager (LVM) between the disk and the filesystem. This would allow you to extend the EBS volume at a later time.
- Use Linux software RAID to stripe multiple EBS volumes, and put either XFS or ext3 on top of the RAID set. This will give even higher disk performance (a sketch of this option follows the list).
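For illustration, here is a sketch of that software-RAID option. It assumes two additional EBS volumes have already been created and attached as /dev/sdk and /dev/sdl; the device names and the two-volume stripe are arbitrary choices, not something the article prescribes.

# mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdk /dev/sdl
# mkfs.xfs /dev/md0
# mkdir /ebsvol
# mount /dev/md0 /ebsvol

mdadm assembles the two volumes into a single striped device, /dev/md0, which is then formatted and mounted exactly as a single EBS volume would be.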
Even though RAID and LVM provide interesting features, XFS is the simplest option for a relatively small EBS volume. You will be able to use the freezing features of XFS along with the EBS snapshots to take consistent backups. Listing 2 shows how to create an XFS filesystem and mount it on the host.
Listing 2. Creating and mounting the XFS filesystem
# mkfs.xfs /dev/sdj
meta-data=/dev/sdj               isize=256    agcount=8, agsize=32768 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=2560, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mkdir /ebsvol
# mount /dev/sdj /ebsvol
Listing 2 runs the mkfs.xfs command to format /dev/sdj. (Run yum install -y xfsprogs if you do not have the mkfs.xfs command.) The output of this command describes the parameters of the filesystem; as long as there are no errors in the output, it can be ignored.
The last two commands in Listing 2 create a mount point called /ebsvol and then mount the filesystem on the mount point.
The filesystem is now usable. Any files under /ebsvol will persist even when the server is down.
You have an EBS volume mounted on /ebsvol and need to move the PostgreSQL data over. The most direct way is to copy over the existing datastore and fix things up with a symlink. While this would work, a cleaner option is to move the data to the EBS volume and then use a bind mount to clone it back onto /var/lib/pgsql. Listing 3 shows this procedure.
Listing 3. Moving the PostgreSQL data to the EBS volume
# service postgresql stop
# mv /var/lib/pgsql /ebsvol
# mkdir /var/lib/pgsql
# chown postgres:postgres /var/lib/pgsql
# mount /ebsvol/pgsql /var/lib/pgsql -o bind
# service postgresql start
The sequence of commands in Listing 3 is as follows:
- Stop the PostgreSQL daemon to ensure data consistency.
- Move the whole directory tree to the EBS store.
- Recreate the PostgreSQL directory.
- Reset the ownership of the PostgreSQL directory.
- Mount /ebsvol/pgsql on top of /var/lib/pgsql using mount's bind option.
- Restart the database.
The bind option to mount clones the first directory onto the second. Changes to one appear on the other; after all, it's the same blocks on the same disk. Using bind differs from mounting the same device twice in that you can mount subdirectories instead of the whole filesystem.
If your server crashes, all you have to do is:
- Start a new instance using your AMI.
- Attach the EBS volume to the instance with ec2-attach-volume.
- Mount the EBS device on /ebsvol.
- Perform the last four commands from Listing 3.
As long as you've bundled your AMI recently, your database system will be up to date.
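Put together, the recovery might look something like the sketch below. The AMI and volume IDs are the ones used earlier in this article, the replacement instance ID is a placeholder, and mkdir -p is used so the commands are safe to re-run if the directories already exist on the new instance; substitute your own values.

$ ec2-run-instances ami-147f977d -k main -z us-east-1a
$ ec2-attach-volume vol-c8791ca1 -i i-REPLACEMENT -d /dev/sdj

Then, as root on the new instance:

# mkdir -p /ebsvol
# mount /dev/sdj /ebsvol
# mkdir -p /var/lib/pgsql
# chown postgres:postgres /var/lib/pgsql
# mount /ebsvol/pgsql /var/lib/pgsql -o bind
# service postgresql start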
Addressing the application server
One of the benefits that cloud computing promotes is that you have easy access to server capacity. Right now the SmallPayroll.ca environment has both the database and application server on the same virtual instance, which is the same as prior to the migration to Amazon EC2. The next step will be to separate the application server from the database server.
The term scaling is generally associated with capacity. If an application is said to scale, then the application can be grown to handle the load of more users. If this scaling is performed by adding more servers, it's said to scale horizontally. If you're replacing a server with a larger server to handle the load, then the application scales vertically.
Horizontal and vertical scaling can be used in combination with each other. It may be easier to outrun database capacity problems with bigger servers and faster disks, and to spread computations across more servers. Being able to scale horizontally or vertically is mostly a function of application design. Some applications cannot be spread across multiple computers, and some operations take a certain amount of time no matter how fast the computer is. Furthermore, some applications may scale horizontally only to a certain point, after which a bottleneck drives the marginal gain of adding another server to nothing.
When one spreads the application across multiple servers, the problem arises of how to distribute incoming requests. The device most often used to do this is a load balancer, which is an appliance that accepts requests from the outside world and hands them off to the next available application server. Because this is not an intensive task, a single device can handle a large number of connections, or the function can be handled in software.
The Amazon EC2 service provides a cloud load balancer, called the Elastic Load Balancer (ELB), that is adequate for most purposes. It distributes requests, can be reconfigured to add or remove servers through an API, and will perform health checks on the backend servers.
An alternative to using ELB would be to run your own load balancing software on an EC2 instance, such as HAProxy or Varnish. This would be more complex than using ELB, but would provide a higher level of control over your own traffic. ELB is more than adequate for an application like SmallPayroll.ca.
Figure 2 shows the new design of the SmallPayroll.ca application.
Figure 2. The SmallPayroll.ca application with separate application servers
Incoming requests land on the ELB and are sent to one of two servers. The servers themselves run nginx which handles static requests and proxies any dynamic requests to mongrel instances. The mongrels attach to a single database server.
In the event that one of the application servers becomes incapacitated, the ELB will redirect all traffic to the other server.
To build the separate application servers you will need to launch two more instances. You can use the same AMI as before because it should have all the necessary software. It is possible to launch more than one instance at a time. Listing 4 shows two instances being launched with one command.
Listing 4. Launching two instances at once
$ ec2-run-instances ami-147f977d -k main -z us-east-1a \
  -d 'role=web,db=10.201.207.180' -n 2
RESERVATION  r-9cc240f7  223410055806  default
INSTANCE  i-81ee2eeb  ami-147f977d  pending  main ...
INSTANCE  i-87ee2eed  ami-147f977d  pending  main ...
The ec2-run-instances command is similar to the commands used in the past. The availability zone is chosen with -z us-east-1a because the database server is in that zone; at the moment, it is desirable to keep the database server and application servers in the same availability zone to reduce latency and bandwidth charges.
The -d and -n parameters, however, are new. -n 2 simply tells Amazon to launch two instances, which is confirmed in the output. -d allows you to pass information to the instance. Listing 5, taken from one of the new instances, shows how to retrieve this information.
Listing 5. Retrieving instance metadata
[root@domU-12-31-39-0C-C5-B2 ~]# DATA=`curl -s http://169.254.169.254/latest/user-data`
[root@domU-12-31-39-0C-C5-B2 ~]# echo $DATA
role=web,db=10.201.207.180
The curl command retrieves a web page from the EC2 metadata service containing the user data. This is similar to how the server retrieved its SSH key in the previous article in this series.
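The article doesn't dictate how an instance should consume this string. One possible approach, sketched below, is a small boot script that splits the comma-separated key=value pairs so that later configuration steps know the server's role and the database address; the variable names are arbitrary.

#!/bin/sh
# Fetch the user data that was passed with -d at launch time
DATA=`curl -s http://169.254.169.254/latest/user-data`
# Split the comma-separated key=value pairs into shell variables
for pair in `echo $DATA | tr ',' ' '`; do
  key=${pair%%=*}
  value=${pair#*=}
  case $key in
    role) ROLE=$value ;;
    db)   DB_HOST=$value ;;
  esac
done
echo "This instance is a $ROLE server using database $DB_HOST"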
Configuring the application servers
There is not much to do on the application servers, because the AMI they were cloned from is already capable of running the application against a local database. A Rails application reads its database configuration from config/database.yml, which tells the application which database server to use. By default the application will connect to localhost.
First, create a hostname alias by adding an entry to /etc/hosts. For example, the entry 10.201.207.180 dbserver aliases the name dbserver to the address 10.201.207.180. It is important to use the private address of the database server, which is the address assigned to eth0, instead of the public address you connect to. Traffic between the private addresses of EC2 instances in the same availability zone is free, while traffic from one EC2 instance to the public address of another instance is billable.
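For example, as root on each application server (the address is the database server's private IP from this example):

# echo "10.201.207.180 dbserver" >> /etc/hosts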
Next, edit config/database.yml to point your application at the alias you just created. Listing 6 shows such a configuration.
Listing 6. Pointing the application to the database server
production:
  adapter: postgresql
  encoding: utf8
  database: payroll_prod
  pool: 5
  username: payroll
  password: secret
  host: dbserver
You should be able to launch your Rails application and connect to it over the public IP address of the application server. If you get an error, check the following:
- Is PostgreSQL listening on all interfaces? postgresql.conf must have a line like listen_addresses = '*' (see the sketch after this list).
- Does pg_hba.conf allow 10/8 addresses to connect using MD5 authentication?
- Does your Amazon security group allow the connection to the database server?
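For reference, here is a hedged sketch of the PostgreSQL settings from that checklist. The file locations shown are the distribution defaults and may differ on your AMI:

# /var/lib/pgsql/data/postgresql.conf: listen on all interfaces
listen_addresses = '*'

# /var/lib/pgsql/data/pg_hba.conf: allow the private 10.0.0.0/8 network with MD5 authentication
host    all    all    10.0.0.0/8    md5

Restart PostgreSQL after changing these files, and remember that the database server's security group also needs to allow inbound connections on port 5432 from the application servers.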
The Elastic Load Balancer is a fairly simple load balancer. Incoming requests are directed to an available server in the pool. The ELB can do some basic health checking of the web servers to avoid sending requests to servers that are down. ELB also has some basic affinity mechanisms that let you keep users on the same backend server. More advanced features, such as redirecting based on the URL, are not currently supported.
Configuring an ELB is a three-step process.
- Create the LB instance.
- Define your health checks.
- Configure DNS to point to the ELB name.
Listing 7 shows the first two steps in action.
Listing 7. Configuring an elastic load balancer
$ elb-create-lb smallpayroll-http \
  --listener "lb-port=80,instance-port=80,protocol=HTTP" \
  --availability-zones us-east-1a
DNS_NAME  DNS_NAME
DNS_NAME  smallpayroll-http-706414765.us-east-1.elb.amazonaws.com
$ elb-configure-healthcheck smallpayroll-http --target "HTTP:80/" \
  --interval 30 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2
HEALTH_CHECK  TARGET    INTERVAL  TIMEOUT  HEALTHY_THRESHOLD  UNHEALTHY_THRESHOLD
HEALTH_CHECK  HTTP:80/  30        3        2                  2
Listing 7 shows two commands. The first command, elb-create-lb, creates the load balancer. The first parameter is the name of the load balancer, which needs to be unique only to you. The --listener parameter dictates that the public-facing port is 80, that it is to be connected to port 80 on the instance, and that the protocol in use is HTTP. The output of this command is a DNS name, in this case smallpayroll-http-706414765.us-east-1.elb.amazonaws.com. Unlike most load balancers, you are not given a public IP address to connect to; Amazon assigns its own IP addresses and you connect through a DNS alias.
The second command, elb-configure-healthcheck, first references the name of the load balancer, and then specifies that the health check will be performed with the HTTP protocol on port 80, using the root URL. It is also possible to write a separate controller and action to handle the checks, such as /status, but in this case the root URL provides enough assurance that the application is running properly.
The second line of parameters specifies, in order, the following:
- --interval 30: Test every 30 seconds.
- --timeout 3: Wait 3 seconds for a response before failing the test.
- --unhealthy-threshold 2: Two consecutive failed tests will mark the server as out of service.
- --healthy-threshold 2: A failed service will require two consecutive successful checks before the server is brought back into the pool.
The next step is to attach instances to the load balancer. You can add and remove instances at will.
Listing 8. Adding two instances to the load balancer
$ elb-register-instances-with-lb smallpayroll-http --instances i-87f232ed,i-85f232ef
INSTANCE_ID  INSTANCE_ID
INSTANCE_ID  i-85f232ef
INSTANCE_ID  i-87f232ed
$ elb-describe-instance-health smallpayroll-http --headers
INSTANCE_ID  INSTANCE_ID  STATE      DESCRIPTION  REASON-CODE
INSTANCE_ID  i-85f232ef   InService  N/A          N/A
INSTANCE_ID  i-87f232ed   InService  N/A          N/A
Listing 8 first shows two instances being added to the smallpayroll-http load balancer. Run the elb-describe-instance-health command to see the status of each server in the pool; InService means the instance is able to handle requests through the load balancer.
Finally, browse to the DNS name of your load balancer, and you should see the application working across two servers. To make the load balancer work for the real DNS name of your application, you will have to change your application's DNS record from an A record to a CNAME pointing at the DNS name of the load balancer. See the resources section for more details about the DNS requirements, including some caveats. While the DNS method is cumbersome, it allows you to handle orders of magnitude more requests than you could by building a load balancer on an EC2 instance.
The DNS change can happen at any time, because it causes no disruption of service.
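As a hypothetical illustration, the change in a BIND-style zone file might look like the lines below. The record name, TTL, and the original IP address are made up; the load balancer name is the one returned by elb-create-lb in Listing 7. Note that a zone apex (smallpayroll.ca itself) cannot be a CNAME, which is one of the caveats covered in the resources.

; before: an A record pointing at a single EC2 address
www.smallpayroll.ca.   300   IN   A      203.0.113.10
; after: a CNAME pointing at the load balancer
www.smallpayroll.ca.   300   IN   CNAME  smallpayroll-http-706414765.us-east-1.elb.amazonaws.com.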
The application is now spread across two nodes, and the database server can be started from scratch in less than half an hour. This is good for availability, but it doesn't help if an administrator accidentally destroys critical data or if the EBS volume fails. Fortunately, there are solutions to address these problems.
EBS provides a snapshot feature that stores a copy of the volume in S3. More precisely, an EBS snapshot stores the differences since the last snapshot. A database complicates the matter because it caches some disk writes, which may result in an inconsistent snapshot. Therefore you must make sure that everything is on disk and in a consistent state. The order of operations for the backup is:
- Tell PostgreSQL to enter backup mode.
- Freeze the filesystem.
- Request a snapshot from Amazon.
- Unfreeze the filesystem.
- Tell PostgreSQL that the backup is over.
Even though this procedure takes only a minute or two on the server, Amazon continues spooling the snapshot to S3 in the background. Changes made after step 3 will not be reflected in the snapshot, however. Listing 9 shows a script that implements this procedure.
Listing 9. Backing up the database
#!/bin/sh
export EC2_HOME=/usr/local/
export JAVA_HOME=/usr
export EC2_CERT="/root/.ec2/cert.pem"
export EC2_PRIVATE_KEY="/root/.ec2/pk.pem"
echo "select pg_start_backup('snapshot')" | su - postgres -c psql
/usr/sbin/xfs_freeze -f /ebsvol/
/usr/local/bin/ec2-create-snapshot vol-93f77ffa --description "`date`"
/usr/sbin/xfs_freeze -u /ebsvol/
echo "select pg_stop_backup()" | su - postgres -c psql
You can verify the status of the snapshot with the ec2-describe-snapshots command, as shown in Listing 10.
Listing 10. Showing the EBS snapshots (condensed)
$ ec2-describe-snapshots --headers
          SnapshotId     VolumeId      Status     StartTime
SNAPSHOT  snap-298cb741  vol-93f77ffa  completed  2010-06-29T02:50:55
SNAPSHOT  snap-a2b959c9  vol-93f77ffa  completed  2010-07-13T15:14:54
Listing 10 shows two completed snapshots, along with their times.
You should automate the creation of snapshots by running Listing 9 from cron. You should also periodically prune your list of snapshots with the ec2-delete-snapshot command.
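A minimal sketch of the automation, assuming the script from Listing 9 has been saved as /usr/local/bin/ebs-backup.sh (the name and path are arbitrary):

# /etc/cron.d/ebs-backup: snapshot the database volume every night at 2:00 AM
0 2 * * * root /usr/local/bin/ebs-backup.sh

Pruning can stay manual for now: note the snapshot IDs reported by ec2-describe-snapshots and remove the ones you no longer need with ec2-delete-snapshot <snapshot id>.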
If your EBS volume fails, or you need to restore old data from EBS, you will need to restore from your last snapshot. The procedure to restore an EBS volume is almost identical to creating a new one. Listing 11 shows how to restore the last snapshot from Listing 10.
Listing 11. Restoring an EBS snapshot
$ ec2-create-volume --snapshot snap-a2b959c9 -z us-east-1a -s 20
VOLUME  vol-d06b06b9  20  snap-a2b959c9  us-east-1a  creating
You may then mount this volume on any instance to restore your data.
Backing up and restoring files
A simple way to back up files from your servers is to copy them into S3 or to make them part of your stock AMI. The latter is more effective for binaries and software packages, while copying to S3 is better suited to user data. The S3Sync project provides some command-line S3 tools, along with a handy rsync-like utility.
Download the S3Sync utilities (see the resources section for the link). Listing 12 shows how to create a bucket for backups, and how to upload files to S3.
Listing 12. Backing up your data to S3
$ s3cmd.rb createbucket smallpayroll-backup
$ s3cmd.rb listbuckets
ertw.com
smallpayroll-backup
$ s3sync.rb -r /var/uploads smallpayroll-backup:`date +%Y-%m-%d`
$ s3cmd.rb list smallpayroll-backup:2010-07-12
--------------------
2010-07-12/uploads
2010-07-12/uploads/file1.txt
Listing 12 starts by creating a bucket called smallpayroll-backup. You may safely store backups from different times in the same bucket, so this step need only be done once. The second command verifies that the bucket was created; you can see the newly created bucket along with the ertw.com bucket where the AMIs are stored.
The s3sync.rb command recursively copies the /var/uploads directory into the backup bucket, prefixing all the files with the current date. The final command shows all the files inside that bucket.
Restoring files is just as simple. You can either run s3sync with the source and destination parameters reversed, or retrieve an individual file through another tool like the S3 File Manager.
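A sketch of the first approach, pulling the July 12 backup back down into a scratch directory (the destination path is arbitrary):

$ s3sync.rb -r smallpayroll-backup:2010-07-12/uploads /var/restore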
The SmallPayroll application is running in the cloud, and is better designed for future growth. Even though the mean time between failures of the hardware hasn't changed, the backups and scripts put in place mean that the data is safe and that the environment can be quickly rebuilt if needed.
Most of the original shortcomings of a straight migration to the cloud have been addressed. There is little visibility into the health of the environment, however, and it would be helpful to be able to scale server resources in response to demand. These issues will be addressed in the next two articles.
Learn
- The RightScale guys have given an in-depth look at EBS, which gives more information about snapshots and availability zones.
- You can attach multiple EBS volumes to one server. The MySQL Performance Blog has an excellent comparison of using RAID to increase EBS performance that is relevant to other databases, too.
- PostgreSQL online backups are invaluable, but they take some understanding to ensure consistency.
- An EC2 instance has several pieces of instance metadata that can help the instance learn about its environment. Browse through this chapter of the EC2 documentation to get some ideas of what can be done.
- If you're trying to use the ELB to balance the root of your domain, such as example.com (also called the root apex), this discussion of ELB and CNAMEs to the root apex will highlight the problems and some workarounds.
Get products and technologies
- Now that you've got multiple AMIs inside S3, you might want to prune some old ones. S3 File Manager is a web-based file manager that rivals the features of many standalone applications or browser plugins. If you delete an AMI, don't forget to ec2-deregister it.
- S3Sync is a helpful tool for copying files to and from S3 and manipulating your buckets.
- The S3 File Manager is better than anything else out there for navigating your S3 buckets, and it doesn't even involve installing software. With a good browser, you can even drag and drop files from your desktop to S3.
Discuss
- Participate in the discussion forum.
- Get involved in the My developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.