Saturday, May 29, 2010

What is an ECU? CPU Benchmarking in the Cloud

NOTE: This post has been updated since its original writing. The original CPU performance metrics did not accurately depict performance on multi-core servers. The updated post uses an improved method of calculating CPU performance that applies more weight to multi-core aware benchmarks (see the benchmark descriptions below for more info).

Over the past couple of months we've spent some time benchmarking about 150 different cloud server configurations with 20 different vendors. This included all 8 AWS EC2 instances types (m1.small - m2.4xlarge) in all 4 regions (32 servers total for EC2). The benchmark suite we ran includes about 100 different benchmarks from synthetic benchmarks measuring raw CPU performance such as Unixbench and Geekbench to higher level application benchmarks such as mysql-bench, pgbench, tpcc-mysql and blog bench. This post will be the first in a series highlighting the results of these benchmarks. In it, we'll focus purely on raw CPU performance. Future posts will focus on other aspects of performance such as disk IO and application specific performance metrics.

We believe choosing a cloud provider should be based on a variety of factors including performance, price, support, reliability/uptime, scalability, network performance and features. We've previously written a few posts regarding network performance and continue to compile network performance and uptime statistics for most of the major cloud providers. With all of the hype surrounding the cloud, our goal is to provide objective information and analysis to enable educated decisions pertaining to the adoption of, and migration to, cloud services.

Benchmark Setup
All benchmarked cloud servers were configured almost identically in terms of OS and software: CentOS 5.4 64-bit (or 32-bit in the case of EC2 m1.small and c1.medium and IBM's Development Cloud, where 64-bit is not supported).

Benchmark Methodology
Most IaaS/server clouds are based on hypervisor/virtualization technology and running in multi-tenant environments (multiple virtual servers running on a single physical host). Different hypervisors support different methods of CPU allocation/sharing including fixed/weighted, burstable, and others. Because of this, it is difficult to compare CPU performance in different clouds. Vendors often use different terminology to define cloud server CPUs including ECU (EC2), VPU (vCloud), GHz (KVM), CPUs, Cores, and more. Many provide an approximation of how that terminology relates to physical resources (e.g. 1 ECU = 1.0-1.2 GHz 2007 Xeon), but this is generally not sufficient for an objective comparison of providers.

Amazon's EC2, in addition to being one of the oldest and most mature cloud server platforms, also provides clearly defined CPU tiers across its 8 different instance sizes. These are defined in terms of ECUs (EC2 Compute Units), where 1 ECU is the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. Their instance sizes include the following:

Small/m1.small (32-bit) = 1 ECU
Large/m1.large = 4 ECUs
High-CPU Medium/c1.medium (32-bit) = 5 ECUs
High-Memory Extra Large/m2.xlarge = 6.5 ECUs
Extra Large/m1.xlarge = 8 ECUs
High-Memory Double Extra Large/m2.2xlarge = 13 ECUs
High-CPU Extra Large/c1.xlarge = 20 ECUs
High-Memory Quadruple Extra Large/m2.4xlarge = 26 ECUs

With a few exceptions, most of our CPU benchmarks showed clear upward scaling (although not always proportional) from m1.small (1 ECU) up to m2.4xlarge (26 ECUs). Because of these factors, we feel that the ECU provides a good, standard, understandable metric for comparing cloud servers not only within EC2, but within other IaaS clouds as well. However, although our metric is based on the ECU, there will be some subtle differences (as described below), so we will refer to this new metric as a CCU (CloudHarmony Compute Unit).

To calculate the CCU metric we selected 19 CPU benchmarks that showed clear upward scaling from smaller to larger EC2 instances. We use the average of the highest scores in all 4 EC2 regions (generally from the m2.4xlarge 26 ECU instance) to produce a 100% baseline for each of these 19 benchmarks. Each instance is then assigned a relative score as a ratio of that instance's score to the highest average score (<= 100). The relative scores for all 19 benchmarks are then aggregated to produce a CPU comparison score (CCS) for each instance. In calculating the CCS, some benchmarks are weighted higher than others. For example, the Geekbench and Unixbench results are weighted 200 points for the baseline, while the opstone and john-the-ripper benchmarks are weighted 33 points each (the remaining benchmarks are all weighted 100). We then use these results to create a CCU evaluation table where the left column is the number of CCUs and the right column is the average CCS corresponding to that CCU value. This table is populated with 5 rows, one each for EC2 instance sizes m1.small, m1.large, m2.xlarge, m2.2xlarge and m2.4xlarge, using an average of the CCS for those instances in all 4 regions. Once the comparison table is populated, we use the same algorithm to compute CCS values for every cloud server benchmarked. To translate from CCS to CCU, we find the row(s) whose CCS (right column) most closely matches the CCS for a given cloud server, and then compute a CCU value from the left column (if the CCS falls between 2 rows, a proportionally interpolated CCU value is calculated).
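The weighted aggregation described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the function name `comparison_score`, the four-benchmark subset, and every baseline and server score in it are made-up placeholder values, and it assumes all scores are "higher is better" (time-based benchmarks would need to be inverted first).

```python
def comparison_score(scores, baselines, weights):
    """Weighted sum of each benchmark score relative to its 100% baseline."""
    total = 0.0
    for name, weight in weights.items():
        relative = scores[name] / baselines[name]  # 1.0 means "equal to baseline"
        total += weight * relative
    return total


# Weights follow the scheme in the post; only 4 of the 19 benchmarks are shown.
weights = {"geekbench": 200, "unixbench": 200, "openssl": 100, "opstone-svd": 33}

# Placeholder numbers for illustration only (not measured results).
baselines = {"geekbench": 6000.0, "unixbench": 1500.0, "openssl": 400.0, "opstone-svd": 50.0}
server = {"geekbench": 3000.0, "unixbench": 750.0, "openssl": 200.0, "opstone-svd": 25.0}

ccs = comparison_score(server, baselines, weights)
baseline_ccs = comparison_score(baselines, baselines, weights)
print(ccs, baseline_ccs)
```

A server that scores exactly half the baseline on every benchmark ends up with half the maximum points, which is the behavior the weighting is meant to produce.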

Example CCU Evaluation Table
ECUs CCS EC2 Instance
26 1574 m2.4xlarge
13 1405 m2.2xlarge
6.5 1225 m2.xlarge
4 873 m1.large
1 312 m1.small

Example CCU Calculation
Eval_CCS = 1524
Enter table from top and find first row where Eval_CCS >= Column_2 (Row 2)
Since Eval_CCS resides between rows 1 and 2, find the proportional midpoint between ECUs:
CCU = 13 + ((1524 - 1405)/(1574 - 1405))*(26 - 13)
CCU = 13 + 9.15
CCU = 22.15
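The lookup above can be expressed as a short linear interpolation. The sketch below uses the example evaluation table from this post; the function name `ccs_to_ccu` is hypothetical, and the handling of CCS values outside the table (extending the nearest segment linearly) is an assumption on our part for the example, since the steps above only cover values that fall between two rows.

```python
# (CCUs, CCS) rows from the example CCU evaluation table.
CCU_TABLE = [
    (26.0, 1574.0),  # m2.4xlarge
    (13.0, 1405.0),  # m2.2xlarge
    (6.5, 1225.0),   # m2.xlarge
    (4.0, 873.0),    # m1.large
    (1.0, 312.0),    # m1.small
]


def ccs_to_ccu(ccs, table=CCU_TABLE):
    """Translate a CCS into CCUs by interpolating between table rows."""
    rows = sorted(table, key=lambda row: row[1])  # ascending by CCS
    # Walk adjacent row pairs and stop at the first pair that brackets the CCS.
    # If none does, the last pair is kept and used for extrapolation (assumption).
    for (lo_ccu, lo_ccs), (hi_ccu, hi_ccs) in zip(rows, rows[1:]):
        if ccs <= hi_ccs:
            break
    fraction = (ccs - lo_ccs) / (hi_ccs - lo_ccs)
    return lo_ccu + fraction * (hi_ccu - lo_ccu)


print(round(ccs_to_ccu(1524), 2))
```

For Eval_CCS = 1524 this follows the same arithmetic as the worked example: 13 + (119 / 169) * 13.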

EC2 Discrepancies
During EC2 benchmarking we observed the CPU architecture reported for each instance type:

Small (m1.small) - US East Region Only: AMD Opteron 2218 2.6 GHz
Small, Large, Extra Large (m1.small, m1.large, m1.xlarge): Xeon E5430 "Harpertown" 2.66 GHz
High-CPU Medium and Extra Large (c1.medium, c1.xlarge): Xeon E5410 "Harpertown" 2.33 GHz
High-Memory Extra Large, 2 Extra Large, 4 Extra Large (m2.xlarge, m2.2xlarge, m2.4xlarge): Xeon X5550 "Nehalem" 2.66 GHz

We also noted the following EC2 ECU discrepancies with most of the 19 CPU performance benchmarks performed:
1. The m1.large (4 ECU) always outperformed the High-CPU c1.medium (5 ECU) instance. This might be attributed to the m1.large being 64-bit vs 32-bit for the c1.medium.
2. Even the smallest High-Memory instance (m2.xlarge - 6.5 ECU) outperforms the larger m1.xlarge (8 ECU) and c1.xlarge (20 ECU) instances in many cases. This is most likely due to the newer and faster "Nehalem" CPUs used by the High-Memory instances.
3. The performance increase between m2.2xlarge (13 ECU) and m2.4xlarge (26 ECU) was minimal (15-20% higher based on CCU).

Because of these discrepancies, we used only the m1.small, m1.large, m2.xlarge, m2.2xlarge and m2.4xlarge instance sizes to create the CCU comparison table used to calculate CCUs for other cloud servers.

Benchmarks
The following are the 19 benchmarks we use to compute the CCU comparison metrics (benchmarks prefixed with ** are multi-core aware):


**c-ray [weight=100]: This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image.

**crafty [weight=100]: Crafty is a popular open-source chess engine that can be used to benchmark CPU speed and is part of the SPEC2000 benchmark. The benchmark itself is very basic. It analyzes pre-determined chess game positions, calculates the number of "nodes" (moves) per second until a certain "depth" is reached, and displays the total NPS as well as the average NPS.

dcraw [weight=100]: This test times how long it takes to convert several high-resolution RAW NEF image files to PPM image format using dcraw.

espeak [weight=100]: This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg's The Outline of Science and output to a WAV file.

**geekbench [weight=200]: Geekbench provides a comprehensive set of benchmarks engineered to quickly and accurately measure processor and memory performance. Designed to make benchmarks easy to run and easy to understand, Geekbench takes the guesswork out of producing robust and reliable benchmark results.

**graphics-magick [weight=100]: This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests to stress the system's CPU.

**hmmer [weight=100]: This test searches through the Pfam database of profile hidden Markov models. The search finds the domain structure of the Drosophila Sevenless protein.

john-the-ripper-blowfish [weight=33]: This is a benchmark of John The Ripper, which is a password cracker.

john-the-ripper-des [weight=33]: This is a benchmark of John The Ripper, which is a password cracker.

john-the-ripper-md5 [weight=33]: This is a benchmark of John The Ripper, which is a password cracker.

mafft [weight=100]: This test performs an alignment of 100 pyruvate decarboxylase sequences.

nero2d [weight=100]: This is a test of Nero2D, which is a two-dimensional TM/TE solver for Open FMM. Open FMM is a free collection of electromagnetic software for scattering at very large objects. This test profile times how long it takes to solve one of the included 2D examples.

**openssl [weight=100]: This test measures the RSA 4096-bit performance of OpenSSL.

opstone-svd [weight=33]: CPU Singular Value Decomposition test.

opstone-svsp [weight=33]: CPU Sparse-Vector Scalar Product test.

opstone-vsp [weight=33]: CPU Vector Scalar Product test.

sudokut [weight=100]: This is a test of Sudokut, which is a Sudoku puzzle solver written in Tcl. This test measures how long it takes to solve 100 Sudoku puzzles.

tscp [weight=100]: CPU performance benchmark based on TSCP (Tom Kerrigan's Simple Chess Program).

**unixbench [weight=200]: UnixBench provides a basic indicator of the performance of a Unix-like system. Multiple tests are used to test various aspects of the system's performance. These test results are then compared to the scores from a baseline system to produce an index value, which is generally easier to handle than the raw scores. The entire set of index values is then combined to make an overall index for the system. The parallel results are used when multiple CPUs exist for a cloud server.
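As a quick sanity check on the weighting, the snippet below tallies the weights listed above and the share carried by the multi-core aware benchmarks. The dictionary simply transcribes the list; the variable names are only illustrative. Note that the six 33-point benchmarks add up to 198, so the exact total is 1698, or roughly 1700 points.

```python
# name: (weight, multi_core_aware), transcribed from the benchmark list above.
BENCHMARKS = {
    "c-ray": (100, True),
    "crafty": (100, True),
    "dcraw": (100, False),
    "espeak": (100, False),
    "geekbench": (200, True),
    "graphics-magick": (100, True),
    "hmmer": (100, True),
    "john-the-ripper-blowfish": (33, False),
    "john-the-ripper-des": (33, False),
    "john-the-ripper-md5": (33, False),
    "mafft": (100, False),
    "nero2d": (100, False),
    "openssl": (100, True),
    "opstone-svd": (33, False),
    "opstone-svsp": (33, False),
    "opstone-vsp": (33, False),
    "sudokut": (100, False),
    "tscp": (100, False),
    "unixbench": (200, True),
}

total_points = sum(weight for weight, _ in BENCHMARKS.values())
multi_core_points = sum(weight for weight, multi in BENCHMARKS.values() if multi)
multi_core_share = multi_core_points / total_points

print(len(BENCHMARKS), total_points, multi_core_points, round(multi_core_share * 100))
```

Just over half of the available points come from benchmarks that can take advantage of multiple cores.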

We credit the Phoronix Test Suite for making it easier to run many of these benchmarks.

Results
The following results are broken down by cloud server vendor. If the vendor utilizes multiple data centers, multiple tables are displayed, one for each data center. A total of 140 different cloud server configurations are included in this post. Each table shows our server identifier, the CPU architecture our benchmark server was placed on, the amount of memory for the server (in GB), the price, raw Geekbench results linked to the full results page, raw Unixbench results linked to the full results page, and finally, the CCU score for that server instance.

EC2 is one of the oldest, most widely used, and mature cloud server platforms. EC2 currently supports 4 regions (US East, US West, EU West and APAC) and 10 availability zones. Each region consists of 2 or more availability zones each of which is basically a different physical data center in close proximity to the other availability zone in that region. EC2 uses the EC2 Compute Unit (ECU) term to describe CPU resources for each instance size where one ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

Although the CCU metric is based on EC2's ECU, the comparison table used to compute CCUs is based on only 5 instance sizes (m1.small, m1.large, m2.xlarge, m2.2xlarge and m2.4xlarge) and average scores from all 4 regions. Because of this, an EC2 instance's CCU will not be precisely equal to its ECU allocation.

EC2 offers multiple pricing options including straight hourly, reserve (upfront reserve fee in exchange for a lower hourly rate), and spot (bid pricing). The pricing shown in these tables is for straight hourly pricing.


Amazon Web Services (AWS) [US East]
ID CPU Memory Price Geekbench Unixbench CCUs
m2.4xlarge [26 ECUs] Xeon X5550 68.4 2.4/hr 5877 1511 27.25
m2.2xlarge [13 ECUs] Xeon X5550 34.2 2.4/hr 5163 1332 14.89
linux.c1.xlarge [20 ECUs] Xeon E5410 7 0.68/hr 5118 780 8.78
m2.xlarge [6.5 ECUs] Xeon X5550 17.1 0.5/hr 4049 932.1 7.05
m1.xlarge [8 ECUs] Xeon E5430 15 0.68/hr 4256 938.6 5.15
m1.large [4 ECUs] Xeon E5430 7.5 0.34/hr 3092 663.4 4.08
c1.medium [5 ECUs] Xeon E5410 1.7 0.17/hr 2635 776.2 3.43
m1.small [1 ECU] Opteron 2218 1.7 0.085/hr 1726 179.7 0.92




Amazon Web Services (AWS) [US West]
ID CPU Memory Price Geekbench Unixbench CCUs
m2.4xlarge [26 ECUs] Xeon X5550 68.4 2.68/hr 6109 1520 27.45
m2.2xlarge [13 ECUs] Xeon X5550 34.2 1.34/hr 5475 1329.2 15.9
c1.xlarge [20 ECUs] Xeon E5410 7 0.76/hr 4693 785 8.21
m2.xlarge [6.5 ECUs] Xeon X5550 17.1 0.57/hr 3883 945.4 6.85
m1.xlarge [8 ECUs] Xeon E5430 15 0.76/hr 4185 916.8 5.14
m1.large [4 ECUs] Xeon E5430 7.5 0.38/hr 3026 643.4 3.95
c1.medium [5 ECUs] Xeon E5410 1.7 0.19/hr 2962 715.5 3.45
m1.small [1 ECU] Xeon E5430 1.7 0.095/hr 1312 277.2 1.04


Amazon Web Services (AWS) [EU West]
ID CPU Memory Price Geekbench Unixbench CCUs
m2.4xlarge [26 ECUs] Xeon X5550 68.4 2.68/hr 6188 1489.7 27.45
m2.2xlarge [13 ECUs] Xeon X5550 34.2 1.34/hr 5368 1337.6 15.19
c1.xlarge [20 ECUs] Xeon E5410 7 0.76/hr 4926 787 8.55
m2.xlarge [6.5 ECUs] Xeon X5550 17.1 0.57/hr 3836 945.9 6.79
m1.xlarge [8 ECUs] Xeon E5430 15 0.76/hr 4147 934.7 5.14
m1.large [4 ECUs] Xeon E5430 7.5 0.38/hr 3160 644 4.12
c1.medium [5 ECUs] Xeon E5410 1.7 0.19/hr 2710 730.5 3.43
m1.small [1 ECU] Xeon E5430 1.7 0.095/hr 1288 277.3 1.02



Amazon Web Services (AWS) [APAC]
ID CPU Memory Price Geekbench Unixbench CCUs
m2.4xlarge [26 ECUs] Xeon X5550 68.4 2.68/hr 3472 1465.8 12.46
m2.2xlarge [13 ECUs] Xeon X5550 34.2 1.34/hr 3357 1228.4 9.62
m2.xlarge [6.5 ECUs] Xeon X5550 17.1 0.57/hr 3469 821.2 6.27
c1.xlarge [20 ECUs] Xeon E5410 7 0.76/hr 3042 906.1 5.66
m1.xlarge [8 ECUs] Xeon E5430 15 0.76/hr 3271 867.4 4.65
m1.large [4 ECUs] Xeon E5430 7.5 0.38/hr 2594 598.9 3.92
c1.medium [5 ECUs] Xeon E5410 1.7 0.19/hr 2796 415.8 3.23
m1.small [1 ECU] Xeon E5430 1.7 0.095/hr 1522 183.5 1.02


Rackspace Cloud
Rackspace Cloud servers showed a very flat CPU performance variation between server instance sizes. All instances we benchmarked in both data centers utilized homogenous Opteron 2374 "Shanghai" 2.2 GHz hardware. Rackspace states that their CPU provides a minimum allocation based on instance size with bursting allowed for all instance sizes.

We tested servers in both their Dallas and newer Chicago data centers. However, Rackspace does not allow users to choose which data center to deploy servers to. Their use of multiple data centers appears to deal more with capacity issues than to offer user choice. When you create an account, that account is assigned to a specific data center, and from that point forward you will only have the option to deploy to that assigned data center.

Rackspace Cloud [Dallas]
ID CPU Memory Price Geekbench Unixbench CCUs
rs-16gb Opteron 2374 16 0.96/hr 3222 933.2 4.95
rs-2gb Opteron 2374 2 0.12/hr 3330 935.4 4.94
rs-1gb Opteron 2374 1 0.06/hr 3462 934.2 4.93
rs-4gb Opteron 2374 4 0.24/hr 3211 935.1 4.9


Rackspace Cloud [Chicago]
ID CPU Memory Price Geekbench Unixbench CCUs
rs-16gb-il Opteron 2374 16 0.96/hr 4473 1019.2 5.41
rs-8gb-il Opteron 2374 8 0.48/hr 4291 1035.5 5.39
rs-1gb-il Opteron 2374 1 0.06/hr 4339 1042.9 5.38
rs-4gb-il Opteron 2374 4 0.24/hr 4368 1013.8 5.29
rs-2gb-il Opteron 2374 2 0.12/hr 4360 963.4 5.17


Storm on Demand was launched a few months ago by Liquid Web. They provide 2 types of cloud servers. The first is a traditional 2-48GB cloud server running on multi-tenant hosts. The second, called "Bare Metal", allows you to select specific dedicated hardware (CPU, SATA or SAS disks, memory) to deploy your server on. Bare Metal servers are still virtualized, but do not share the underlying hardware with any other server instances.

Of all the IaaS vendors we reviewed, Storm offers by far the most diverse heterogenous infrastructure. This approach pays off big in terms of performance with 10 servers that scored 20+ CCUs. The only mismatched hardware we discovered was the cloud 16GB server running on Opteron 2350 hardware which performed poorly compared with the small 2, 4 and 8 GB servers. However, Storm has informed us that their 16GB Opteron 2350 hardware is being upgraded to Opteron 2378 which should improve performance on future benchmarks.

Storm's 48GB cloud server was the top performer out of all of our benchmarked servers with 42.87 CCUs and a Geekbench score of 13020! This is most likely due to the very new and extremely fast Xeon X5650 "Westmere" hardware it runs on. The Intel i5 CPUs also performed very well with our CPU benchmarks and provide an excellent performance to price ratio (26.6 CCUs for $0.171/hr)!

Storm Cloud [MI, US]
ID CPU Memory Price Geekbench Unixbench CCUs
Cloud: 48gb Xeon X5650 45.9 1.37/hr 13020 4448.5 42.87
Bare Metal: x3440-8gb Xeon X3440 8 0.274/hr 7597 2760.1 27.39
Bare Metal: e5506x2-4gb Xeon E5506 8 0.274/hr 7742 2906.7 27.24
Bare Metal: e5506x2-4gb Xeon E5506 8 0.322/hr 7704 2880.1 27.13
Bare Metal: e5506x2-8gb Xeon E5506 8 0.391/hr 7753 2837.5 26.72
Bare Metal: i5-750-2gb Core i5 750 2 0.171/hr 6417 2464.8 26.6
Bare Metal: i5-750-4gb Core i5 750 4 0.206/hr 6413 2450.1 26.47
Bare Metal: e5506x2-8gb Xeon E5506 8 0.48/hr 7686 2811.5 26.51
Cloud: 32gb Opteron 2378 30.4 0.69/hr 8326 2373.3 26.4
Cloud: 8gb Xeon X3440 7 0.27/hr 6162 2284.7 21.41
Cloud: 4gb Core i5 750 3.5 0.14/hr 4555 1465.1 9.33
Bare Metal: amd2350x2-32gb Opteron 2350 32 0.713/hr 6531 1803.6 7.31
Cloud: 2gb Core 2 Q9400 1.7 0.07/hr 3062 629.7 5.01
Cloud: 16gb Opteron 2350 15.2 0.34/hr 4034 1214 4.73


GoGrid's servers showed a decent linear performance increase with larger sized instances. The largest 8GB instance was deployed on an E5450 2.99 GHz host (compared with the E5520 2.27 GHz hosts for the other instances) which performed significantly better than the smaller sized instances.


GoGrid [CA, US]
ID CPU Memory Price Geekbench Unixbench CCUs
gg-8gb Xeon E5450 8 1.52/hr 8105 507.5 23.2
gg-4gb Xeon E5520 4 0.76/hr 5373 1866.9 9.28
gg-2gb Xeon E5520 2 0.38/hr 3549 993.3 4.87
gg-1gb Xeon E5520 1 0.19/hr 2860 941.2 4.42



Voxel maintains 3 cloud data centers in the US (New York), EU (Amsterdam), and Asia (Singapore). All of our cloud servers were deployed on homogenous hardware: Xeon L5520 "Nehalem" 2.26 GHz. They appear to use more of a fixed CPU allocation because there was a notable increase in performance on larger instance sizes.


Voxel [NY, US]
ID CPU Memory Price Geekbench Unixbench CCUs
vx-14gb-ny Xeon L5520 14 0.727/hr 5176 1023.5 10.15
vx-8gb-ny Xeon L5520 8 0.421/hr 4022 846.4 6.12
vx-4gb-ny Xeon L5520 4 0.211/hr 3487 680.7 5.33
vx-2gb-ny Xeon L5520 2 0.106/hr 2876 483.4 4.67



Voxel [NL]
ID CPU Memory Price Geekbench Unixbench CCUs
vx-14gb-nl Xeon L5520 14 0.727/hr 6169 1178 16.2
vx-8gb-nl Xeon L5520 8 0.421/hr 4699 954.7 7.53
vx-4gb-nl Xeon L5520 4 0.211/hr 3416 645.8 5.48
vx-2gb-nl Xeon L5520 2 0.106/hr 3093 494 4.85



Voxel [SG]
ID CPU Memory Price Geekbench Unixbench CCUs
vx-14gb-sg Xeon L5520 14 0.727/hr 5419 1107.8 11.17
vx-8gb-sg Xeon L5520 8 0.421/hr 4428 909.8 6.24
vx-4gb-sg Xeon L5520 4 0.211/hr 3168 611.3 5.04
vx-2gb-sg Xeon L5520 2 0.106/hr 2779 463.7 4.4

NewServers is fairly unique in that their "bare metal" cloud servers actually run on physical hosts. There is no hypervisor layer between the server and the underlying hardware. When you deploy a server, the OS image is written directly to the physical disk(s). Their "Fast" server performed very well and is one of the better values at $0.53/hr for 27.21 CCUs.


NewServers [FL, US]
ID CPU Memory Price Geekbench Unixbench CCUs
ns-fast Xeon E5450 4 0.53/hr 6271 2193.5 27.21
ns-jumbo Xeon E5504 48 0.6/hr 5713 2809.7 19.15
ns-large Xeon E5405 4 0.25/hr 4505 1849.7 6.12
ns-med Xeon 3.20 2 0.17/hr 2672 708.5 3.43
ns-small Xeon 2.80 1 0.11/hr 1669 491.3 2.47



While Linode doesn't market itself as a "cloud", we included it in the benchmarks because it is a good and very popular service that provides many of the features common to the cloud, such as auto-provisioning and disk imaging. All instance sizes we benchmarked were deployed on homogenous Xeon L5520 2.26 GHz hardware. The servers show a very flat CPU performance variation between instance sizes. We only benchmarked servers in their Atlanta data center. Linode also maintains data centers in Fremont CA, Dallas TX, Newark NJ, and London UK.


Linode VPS Hosting [Atlanta]
ID CPU Memory Price Geekbench Unixbench CCUs
ln-5760-atlanta Xeon L5520 5.54 10.67/day 4217 1160.5 6.4
ln-14400-atlanta Xeon L5520 14.06 26.67/day 3865 1144.1 6.3
ln-1080-atlanta Xeon L5520 1.05 2/day 3623 957.2 5.48
ln-2880-atlanta Xeon L5520 2.81 5.33/day 3625 979.4 5.38
ln-8640-atlanta Xeon L5520 8.45 16/day 3780 971 5.33



All of our SoftLayer benchmark servers were deployed on either Xeon X3460 2.8 GHz or E5520 2.27 GHz hardware. SoftLayer is another provider where CPU performance was very flat. Disk I/O was also painfully slow, but that is a topic for another post.


SoftLayer [Dallas]
ID CPU Memory Price Geekbench Unixbench CCUs
sl-8gb-dallas Xeon E5520 8 0.5/hr 5862 1514.1 14.35
sl-1gb-dallas Xeon X3460 1 0.15/hr 3745 502.7 6.1
sl-4gb-dallas Xeon E5520 4 0.35/hr 4482 941.8 5.52
sl-2gb-dallas Xeon X3460 2 0.25/hr 4399 573.9 5.11



SoftLayer [WDC]
ID CPU Memory Price Geekbench Unixbench CCUs
sl-4gb-wdc Xeon X3460 4 0.35/hr 5599 1250.9 16.88
sl-8gb-wdc Xeon E5520 8 0.5/hr 5885 1527.7 16.84
sl-2gb-wdc Xeon X3460 2 0.25/hr 4265 871.5 7.62
sl-1gb-wdc Xeon X3460 1 0.15/hr 3710 507.7 5.81


SoftLayer [Seattle]
ID CPU Memory Price Geekbench Unixbench CCUs
sl-8gb-seattle Xeon E5520 8 0.5/hr 6040 1520.8 14.07
sl-4gb-seattle Xeon X3460 4 0.35/hr 5455 1324.4 12.3
sl-2gb-seattle Xeon X3460 2 0.25/hr 4429 822.8 6.49
sl-1gb-seattle Xeon X3460 1 0.15/hr 3716 505.8 5.52



Terremark vCloud Express
Terremark is one of the first VMware vCloud providers. Deployment of servers in vCloud allows the user to select both desired memory and VPUs (the vCloud term for CPUs). All of our benchmark servers were deployed on homogenous Opteron 8389 2.91 GHz hardware. CPU performance varied from benchmark to benchmark. Unixbench's parallel benchmark did show a notable increase in performance from 1 to 8 VPUs. However, other benchmarks did not show much of an increase, leading to a flat CCU metric across all instance sizes and VPU combinations we benchmarked.


Terremark vCloud Express [FL, US]
ID CPU Memory Price Geekbench Unixbench CCUs
tm-16gb-8vpu Opteron 8389 16 1.672/hr 4304 2371.5 7.69
tm-8gb-4vpu Opteron 8389 8 0.64/hr 5999 1667 6.41
tm-1gb Opteron 8389 1 0.074/hr 2863 1279.3 5.44
tm-2gb Opteron 8389 2 0.137/hr 2779 1272.7 5.38
tm-4gb-2vpu Opteron 8389 4 0.305/hr 3793 1336.6 5.34



OpSource is another VMware-based cloud. OpSource allows you to configure cloud servers with 1-4 "CPUs". CPU performance was very flat between instances of varying sizes, even showing a decrease in performance from smaller to larger sized instances. All servers were deployed to identical Xeon X7460 2.66 GHz hardware.


OpSource Cloud [VA, US]
ID CPU Memory Price Geekbench Unixbench CCUs
os-1gb Xeon X7460 1 0.268/hr 2100 361.5 2.56
os-2gb Xeon X7460 2 0.296/hr 1894 338.1 2.43
os-4gb-2cpu Xeon X7460 4 0.392/hr 1923 241 2.24
os-8gb-4cpu Xeon X7460 8 0.584/hr 1830 149.3 1.87
os-16gb-4cpu Xeon X7460 16 0.808/hr 2004 129 1.87
os-32gb-4cpu Xeon X7460 32 1.256/hr 1908 130.8 1.82



Speedyrails is a VPS provider based out of Quebec, Canada. All benchmark servers were deployed on homogenous hardware: Xeon E5520 2.27 GHz.


Speedyrails [QC, CA]
ID CPU Memory Price Geekbench Unixbench CCUs
sr-1gb Xeon E5520 1 2.27/day 4413 1260.8 9.06
sr-4gb Xeon E5520 4 8/day 4359 1261.1 8.49
sr-2gb Xeon E5520 2 4.27/day 4347 1066.7 6.74



Zerigo is a VPS and Cloud Server vendor based out of Denver, CO. All benchmark servers deployed on identical hardware, Opteron 2374 2.20 GHz.


Zerigo [CO, US]
ID CPU Memory Price Geekbench Unixbench CCUs
zr-4gb Opteron 2374 4 0.24/hr 3829 805.5 4.83
zr-2gb Opteron 2374 2 0.12/hr 3274 813.9 4.73



While ReliaCloud showed a notable increase in performance with larger sized instances, the overall CPU performance was not great. Hardware was also homogenous between different instance sizes.


ReliaCloud Cloud Services [MN, US]
ID CPU Memory Price Geekbench Unixbench CCUs
rc-4gb Xeon E5504 4 0.32/hr 1637 226.8 1.11
rc-2gb Xeon E5504 2 0.16/hr 1653 267.9 1.07
rc-1gb Xeon E5504 1 0.08/hr 1392 243.2 0.65



IBM's Development & Test Cloud is a free cloud service intended for only development and testing. Only 3 32-bit instance sizes are supported: small, medium and large. Instances are time limited to about 1 week with the option to extend. The large instance performed very well overall.


IBM Development & Test Cloud [NY, US]
ID CPU Memory Price (USD) Geekbench Unixbench CCUs
ibm-dev-large Xeon X5570 3.5 10256 3059.5 27.9
ibm-dev-med Xeon X5570 1.75 5536 1526.6 5.44



BlueLock is another VMware vCloud provider. As with the other VMware providers, they appear to use a homogenous hardware environment for all instance sizes. However, unlike most other homogenous platforms, BlueLock's instances showed a notable increase in performance on larger sized (more CPUs) instances. Strangely, the 4CPU/8GB instance outperformed the 8CPU/16GB instance.


BlueLock [IN, US]
ID CPU Memory Price Geekbench Unixbench CCUs
bl-8gb-4cpu Xeon X5550 8 0.661/hr 6311 2159.1 26.26
bl-16gb-8cpu Xeon X5550 16 1.729/hr 6665 2525.2 25.55
bl-4gb-2cpu Xeon X5550 4 0.308/hr 4575 1351 9.06
bl-1gb Xeon X5550 1 0.068/hr 3690 1153.3 6.3
bl-2gb Xeon X5550 2 0.134/hr 3727 1223.4 6.14



Cloud Central is a new cloud server provider based out of Australia. All benchmark instances were deployed to homogenous AMD Opteron 2.20 GHz hosts. Prices shown are in Australian dollars.

Cloud Central [AU]
ID CPU Memory Price Geekbench Unixbench CCUs
cc-large Opteron 2374 8 0.96/hr 4224 767.3 4.97
cc-reg Opteron 2374 4 0.48/hr 3894 787.2 4.96
cc-huge Opteron 2374 16 1.92/hr 3674 785.2 4.87
cc-med Opteron 2374 2 0.24/hr 3913 755.1 4.69
cc-small Opteron 2374 1 0.12/hr 3727 756.6 4.63

RimuHosting is a VPS provider based out of New Zealand. They maintain data centers in Australia, New Zealand, London, and Texas. We benchmarked 2GB instances in their Auckland NZ and Dallas TX data centers.


RimuHosting [TX, US]
ID CPU Memory Price Geekbench Unixbench CCUs
rh-tx-2gb Xeon E5506 2 104.57/mo 3080 735.6 4.56



RimuHosting [NZ]
ID CPU Memory Price (NZD) Geekbench Unixbench CCUs
rh-nz-2gb Xeon E5506 2 210.88/mo 3122 662.3 4.49

ElasticHosts is a UK-based cloud provider. They currently maintain 2 data centers in the UK and a new data center in Dallas, TX. We only benchmarked their London Peer1 data center. ElasticHosts runs on the Linux KVM hypervisor. This hypervisor is unique in that it allows you to select a specific MHz metric when deploying cloud servers. We benchmarked 2 GHz - 20 GHz cloud servers. Although the hardware environment appears to be homogenous, the benchmarks showed a clear increase in performance on larger sized instances.


ElasticHosts [UK]
ID CPU Memory Price (USD) Geekbench Unixbench CCUs
eh-8gb-20gh Xeon E5420 8 0.654/hr 6544 881.7 9.98
eh-4gb-8gh Xeon E5420 4 0.326/hr 3919 814.3 5.54
eh-2gb-4gh Xeon E5420 2 0.164/hr 3409 631 4.75
eh-1gb-2gh Xeon E5420 1 0.082/hr 2545 539.8 4.3



Flexiscale is a UK based cloud server provider that has been around for a few years. They were recently acquired and renamed to Flexiant. They are currently in beta release of their Flexiscale 2.0 cloud server platform. These results were from testing 2.0 platform servers. Flexiant adopted a point-based subscription model for purchasing cloud servers. See their website for more details.


Flexiscale [UK]
ID CPU Memory Price (GBP) Geekbench Unixbench CCUs
fx-8gb Opteron 8218 8 4629 636.5 3.81
fx-4gb Opteron 8220 4 3407 541.6 3.66
fx-2gb Opteron 8218 2 2394 510 3.36
fx-1gb Opteron 8218 1 2354 481.9 2.95



Summary
This is our first attempt at defining a standard CPU performance metric for comparing servers in multiple clouds. We acknowledge that it is not perfect and hope to make improvements over time. Please comment on this post if you have any suggestions on how we might improve our methods. We intend to continually run these benchmarks (every couple of months) to improve the quality and quantity of our data available as well as to check for upgrades and improvements made by the providers.

One take-away point we observed is that heterogenous hardware environments (where host hardware is configured with faster CPUs for larger sized instances) appear to be more conducive to true cloud CPU scaling. Of the 20 server clouds we benchmarked, only 4 appear to be providing such an environment: EC2, Storm on Demand, GoGrid and NewServers.

17 comments:

  1. Thank you for your thoughtful efforts to benchmark cloud server CPU performance. Your efforts make us here at GoGrid feel like the industry is finally maturing.

    The "CCU" CloudHarmony Compute Unit seems like a reasonable approach to a benchmark. We hope that your benchmarking will progress to establish relative costs per CCU per hour.

    As an example, our standard GoGrid 8 GB VM produces 23.2 CCUs. The cost of this server is actually only $0.40 / hour with GoGrid's volume pricing plan, which makes it only 1.7 cents per CCU / Hour. ($0.40 / 23.2 = $0.017) And it runs both Windows and Linux at that price.

    By comparison, it appears the comparable AWS server from your testing in terms of CCUs is the "High-Memory Quadruple Extra Large/m2.4xlarge = 26 ECUs" which produced 27.45 CCUs. This server costs $2.40 / hour so $2.40 / 27.45 CCUs = 8.7 cents per CCU hour.

    Naturally we're pleased that the GoGrid server delivers approximately 5 times the CCUs per dollar in this test. However, a more interesting point to your readership is that there are huge variations in pricing, pricing models and feature-sets. It is quadruple-extra-large important that folks like CloudHarmony start to make sense of it all.

    Thank you for helping our industry,

    John Keagy
    CEO
    www.GoGrid.com

  2. Thanks for this benchmarking effort, this is really good.

    One question though, what specific benchmark test would really measure the power of a heavily multi-threaded server application (such as a database or an application server)? I get the feeling that quite a few of the benchmarks you've used are not heavily multi-threaded and might not be able to use the full power of those high-end machines.

    Consequently, wouldn't it make sense to have a mtCCU index as well in order to provide a bench for those heavily multi-threaded environments - which will be the most probable to run on those high-end machines and be able to fully leverage their multi-CPU/Core architecture?

    That would also probably give you a much more linear scale for your mapping table. Currently, the pretty wide variations in your mapping table make me feel pretty un-easy and, IMO, cannot be simply explained by their CPU-level architecture.

    Cheers,


    Sacha Labourey
    CloudBees, Inc.

  3. This is a great initiative!

    I've always thought that AWS availability zones didn't provide the same horsepower for the same instances and now I have proof ...

    Establishing $/CCU/h as the normalized pricing comparison metric (abstracting SLAs, platform capabilities and 3rd party support) would be a great achievement. I would love to see that added to the tables above.

    I would think that displaying that metric ($/CCU/h) as a function of the CCU offered would show a downward trend up until a point where one starts paying a premium for every extra CCU that an instance offers. This matches the costs of CPUs and infrastructure in general. Would love to see that chart to confirm this hunch ...

  4. This is impressive work, if for nothing else than noting the breadth of the cloud providers surveyed. Would you be willing to share the data used to build the charts above to facilitate further analysis?

  5. Point of national pride: RimuHosting are a New Zealand company, not an Australian company. (The rimu is a tree native to New Zealand.)

  6. Thanks for the comments.

    > Cost to performance ratios: this is a great suggestion. We have recorded all of the different pricing options (i.e. hourly, reserve, spot, pre-pay) for every server we've benchmarked. Once we finish the series of posts on performance, we'll write a post dedicated to pricing and performance to cost ratios based on each payment method

    > Multi-threaded focused metric: I updated the post to highlight the benchmarks that are multi-thread aware versus those that are not. 900 out of 1700 points (53%) for the CCU calculation are multi-thread aware (c-ray, crafty, geekbench, graphics-magick, hmmer, openssl and unixbench). I created a sample mtCCU metric using only those benchmarks with the resulting mtCCU calculation table (900 pts max):

    26 ECUs (m2.4xlarge) = 851
    13 ECUs (m2.2xlarge) = 707
    6.5 ECUs (m2.xlarge) = 529
    4 ECUs (m1.large) = 385
    1 ECU (m1.small) = 110

    We'll make the mtCCU metric available soon.

    > Graphs: we'll be adding graphing functionality to the website soon

    > Sharing data: We'll be making all of the data (raw benchmark scores and aggregate metrics) available via web services and on our website soon. The web services will enable you to create your own CCU metric using whatever combination of benchmarks and baseline calculation you desire

    > Rimu Hosting: sorry about that, it has been corrected

  7. You should benchmark gigenetcloud too.
    It's one of the fastest clouds there is.

  8. This comment has been removed by the author.

  9. GigeNET has been added to our site and will be included in the next round of benchmarks.

  10. We would like to add Joyent, but have not been able to get a response.

  11. Thanks for this benchmark. It's very useful for us.
    Are you planning to get a benchmark of "CloudShare"?

  12. Where did you find the hardware being run behind the various instance types on Amazon EC2?

    The hardware for the High Memory Instances is given on Amazon's Instance Types page, but I don't see anywhere that the other instances are given a specific hardware correlation (like AMD Opteron 2218 2.6 GHz or Xeon E5430 "Harpertown" 2.66 GHz).

    Thanks for your help.

  14. If I understand the Rackspace graph correctly, your benchmark processes never saw the benefit of any "bursting" which Rackspace says will be provided "at times when there's extra CPU power available from the host hardware".

  15. "3. The performance increase between m2.4xlarge (13 ECU) and m2.2xlarge (26 ECU) was minimal (15-20% higher based on CCU)"

    I think you have the ECU 13 / 26 mixed up here.

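
The cost-per-CCU arithmetic from comment 1 and the sample mtCCU table from comment 6 can be reproduced with a short sketch. The function names below are hypothetical and the figures are copied from those comments; nothing here reflects an actual CloudHarmony API or how the CCU metric itself is calculated.

```python
def cost_per_ccu_hour(price_per_hour, ccus):
    """Return dollars per CCU per hour for a single server."""
    return price_per_hour / ccus


def mtccu_per_ecu(table):
    """Return the mtCCU score divided by the ECU rating for each entry."""
    return {ecu: score / ecu for ecu, score in table.items()}


# Prices and CCU scores as quoted in comment 1.
gogrid_8gb = cost_per_ccu_hour(0.40, 23.2)    # GoGrid 8 GB VM, volume pricing
m2_4xlarge = cost_per_ccu_hour(2.40, 27.45)   # EC2 m2.4xlarge

print(f"GoGrid 8 GB: {gogrid_8gb * 100:.1f} cents per CCU-hour")
print(f"m2.4xlarge:  {m2_4xlarge * 100:.1f} cents per CCU-hour")
print(f"Ratio:       {m2_4xlarge / gogrid_8gb:.1f}x")

# Sample mtCCU scores from comment 6 (900 points max), keyed by ECU rating.
mtccu_by_ecu = {26: 851, 13: 707, 6.5: 529, 4: 385, 1: 110}

per_ecu = mtccu_per_ecu(mtccu_by_ecu)
for ecu, ratio in per_ecu.items():
    print(f"{ecu:>4} ECUs: {ratio:.1f} mtCCU points per ECU")
```

Dividing each mtCCU score by its ECU rating is only one possible way to look at the table. In these sample figures the points per ECU fall as the ECU rating rises, so even the multi-thread-aware subset does not scale linearly with ECUs.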