Saturday, March 13, 2010

Connecting Clouds - Summary of Our Intercloud Network Testing

Over the past few months we've been conducting intercloud network testing. To get started, we set up testing nodes in 28 different infrastructure/server clouds (a.k.a. IaaS). Twice daily at random times (hourly for latency), these nodes initiate throughput and latency tests with each of the other 27 nodes in the test group. Each test consists of uploading and downloading a 5 MB test file and recording the throughput, and pinging to determine latency. The purpose is to determine which clouds are best connected to each other. In the results tables below, we've compiled averages from all of these tests, ordered them by downlink throughput, and displayed the 10 best connected clouds for each of the 28 we tested.
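
For readers curious what one of these test runs looks like, here is a minimal sketch in Python of the kind of logic each node performs. The peer hostname, URLs, and upload endpoint are placeholders, and our actual test harness differs in its details:

import subprocess
import time
import urllib.request

PEER_HOST = "peer.example-cloud.com"            # placeholder for a node in another cloud
TEST_FILE_URL = "http://" + PEER_HOST + "/test5mb.bin"   # 5 MB test file
UPLOAD_URL = "http://" + PEER_HOST + "/upload"  # placeholder upload endpoint

def download_mbps():
    # Time the 5 MB download and convert to megabits per second
    start = time.time()
    data = urllib.request.urlopen(TEST_FILE_URL).read()
    return len(data) * 8 / (time.time() - start) / 1e6

def upload_mbps(data):
    # Time the 5 MB upload (HTTP PUT) and convert to megabits per second
    req = urllib.request.Request(UPLOAD_URL, data=data, method="PUT")
    start = time.time()
    urllib.request.urlopen(req)
    return len(data) * 8 / (time.time() - start) / 1e6

def latency_ms():
    # Average round-trip time from 10 ICMP pings (parses Linux ping summary line)
    out = subprocess.run(["ping", "-c", "10", PEER_HOST],
                         capture_output=True, text=True).stdout
    return float(out.rsplit("=", 1)[-1].split("/")[1])

if __name__ == "__main__":
    payload = urllib.request.urlopen(TEST_FILE_URL).read()
    print("downlink Mb/s:", round(download_mbps(), 1))
    print("uplink Mb/s:", round(upload_mbps(payload), 1))
    print("latency ms:", latency_ms())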

As reliable as cloud services generally are, they do occasionally fail. We believe a good cloud adoption strategy involves using more than one cloud service to avoid the risk of extended downtime (e.g. natural disasters, power failures). Such a strategy could range from simply maintaining backups in a separate cloud for cold restores, to running complex load-balanced, cross-cloud clusters of application servers and data repositories. In either case, good throughput and low latency between clouds allow data to be transmitted quickly and efficiently. If latency is high or throughput is low, you may run into data consistency issues and performance bottlenecks.

A few admin notes regarding our testing:
  1. With a few exceptions, most cloud services limit Internet uplinks to 100 Mb/s or less
  2. A 5 MB test file is not large enough to determine accurate throughput capacity between clouds. The downlink and uplink values provided here are for comparison purposes only and are not meant to represent actual throughput capacity, which could be even higher for larger data transfers (see the worked example after this list)
  3. We've found that routing is not always symmetrical between uplink and downlink tests. This is usually the reason for large discrepancies between uplink and downlink throughput
  4. The results are broken down by cloud. Each table shows the 10 best connected clouds for that cloud. The downlink, uplink and latency values are measured by the cloud shown in the left column of the table
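
A quick worked example for note 2: a 5 MB file is roughly 40 Mb (5 x 8), so a download that completes in 0.8 seconds works out to 40 / 0.8 = 50 Mb/s. Because TCP needs some time to ramp up to full speed, a transfer this short spends a noticeable fraction of its duration below peak rate, which is why a larger transfer between the same two clouds could report higher throughput. (The numbers here are illustrative only, not measurements from our tests.)
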
These tests are ongoing. Over the next few months, we plan to make this data available in real time via web services and on our website, with filtering criteria. If you have any suggestions on how we might improve the accuracy of these tests, please feel free to comment.

The Results:

(The per-cloud results tables are not reproduced here; the clouds and locations covered were the following.)

EC2 - US East
EC2 - US West
EC2 - EU West
ElasticHosts - London, UK
Flexiscale - London, UK
GoGrid - San Francisco, CA
New York, NY
Linode - Atlanta, GA
Linode - Dallas, TX
Linode - Fremont, CA
Linode - London, UK
Linode - Newark, NJ
NewServers - Dallas, TX
ReliaCloud - Minnesota, US
Rimu - Auckland, NZ
Rimu - Dallas, TX
SoftLayer - Dallas, TX
SoftLayer - Seattle, WA
SoftLayer - Washington DC
Quebec, Canada
Storm Cloud Servers
Voxel - New York, NY
Voxel - Amsterdam, NL
Voxel - Singapore
Zerigo - Denver, CO



Tuesday, March 2, 2010

Cloud Server Performance Benchmarking

We are in the process of developing a benchmark suite to run in the cloud and use as a basis for comparing different IaaS (cloud server) vendors. Cloud servers are put to many uses, be it web, application or database serving; scientific computing; video encoding; and more. In establishing our benchmark suite we'd like to be as comprehensive as possible in order to provide decent coverage of most computational needs. Our current list of benchmarks includes the following:

All benchmarks will be run on similar CentOS 64-bit server instances. Are there any benchmarks you'd like to see that are not in the list? We'd appreciate any comments, suggestions or feedback.
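
For a rough idea of how such a suite might be driven, here is a minimal sketch; the benchmark commands below are placeholders for illustration, not our actual benchmark list. Each cloud server instance would run the same commands and report its timings for side-by-side comparison:

import json
import subprocess
import time

# Placeholder commands for illustration -- not the actual benchmark list
BENCHMARKS = {
    "cpu":  ["sysbench", "--test=cpu", "--cpu-max-prime=20000", "run"],
    "disk": ["dd", "if=/dev/zero", "of=/tmp/ddtest", "bs=1M", "count=1024", "oflag=direct"],
}

def run_benchmark(name, cmd):
    # Run one benchmark and record its wall-clock time and exit status
    start = time.time()
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {"benchmark": name,
            "seconds": round(time.time() - start, 2),
            "exit_code": result.returncode}

if __name__ == "__main__":
    # Results from each instance would be collected centrally for comparison
    results = [run_benchmark(name, cmd) for name, cmd in BENCHMARKS.items()]
    print(json.dumps(results, indent=2))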