Wednesday, September 8, 2010

Benchmarking of EC2's new Cluster Compute Instance Type

Two months ago Amazon Web Services released a new "Cluster Compute" EC2 instance type, the cc.4xlarge. This new instance type is targeted at High-Performance Computing (HPC) workloads such as computationally intensive scientific applications. The major differences between this and other EC2 instance types are:
  • Dual quad-core "Nehalem" X5570 2.93 GHz processors: compared with X5550 2.67 GHz processors for the next-largest m2 instance types. Amazon states that this CPU configuration provides 33.5 ECUs (EC2 Compute Units), compared with 26 ECUs for the m2.4xlarge instance type (previously the largest instance type)
  • Hardware-Assisted Virtualization (HVM): compared with paravirtualization used by other instance types
  • Multi-node 10 Gbps clustering capabilities: instances can be deployed to separate "Placement Groups" wherein each such group has non-blocking, low latency 10 Gbps network connectivity
Previously, we published 5 blog posts on cloud server performance, none of which included this new EC2 instance type (cc.4xlarge).
The purpose of this post is to show how the new EC2 Cluster Compute instance type performs in these benchmarks relative to the other EC2 instance types and to servers in other IaaS clouds. For specifics on how the benchmarks are conducted and the scores calculated, review the previous blog posts. The benchmarks were run on an individual cc.4xlarge instance and measure the performance of a single instance only. The most beneficial feature of this new instance type, its clustering capability over a 10 Gbps non-blocking network, is not covered in this post.

The new cluster compute instance type is currently only available in Amazon's US-East region. The benchmark results tables below show only EC2 instances from that same region. NOTE: Although the EC2 documentation states that the cluster compute instance is assigned 2 quad core processors (8 cores total), the processors' hyper-threading capabilities resulted in benchmarks reporting 16 total cores.
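
The gap between 8 physical cores and 16 reported cores can be checked on a Linux guest by comparing logical processors with unique physical cores. The following is a minimal sketch, not the method used by the benchmarks in this post. It assumes the usual /proc/cpuinfo field names ("processor", "physical id", "core id"), and the function name count_cores and the sample text are made up for illustration.

```python
def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts from /proc/cpuinfo-style text.

    Assumption: each logical CPU is a blank-line-separated block with
    "processor", "physical id" and "core id" fields. Some virtualized guests
    omit the last two, in which case the physical count collapses to 1.
    """
    logical = 0
    physical = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # Two hyper-threads on the same core share (physical id, core id).
        physical.add((fields.get("physical id"), fields.get("core id")))
    return logical, len(physical)


# Hypothetical sample: one core exposing two hyper-threads.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 0
"""

logical, physical = count_cores(SAMPLE)
print(f"logical={logical} physical={physical}")
```

On a Linux system the same function could be fed the contents of /proc/cpuinfo. With hyper-threading enabled, the logical count would be expected to be twice the physical count.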

CPU Performance
As described in the original post, we calculated CPU performance using a metric we created called the CCU, which is based on Amazon's ECU. Amazon states that the new cluster compute instance type should provide 33.5 ECUs, which is fairly close to our calculated 36.85 CCUs. Overall, CPU performance was exceptionally good: of the 134 cloud servers in 28 IaaS clouds covered in the previous post, only the Storm on Demand 48GB X5650 Westmere cloud server scored higher, at 42.87.

ID CPU Mem Price Geekbench Unixbench CCUs
cc.4xlarge Xeon X5570 2.93 [16 cores] 23 $1.6/hr 12306 1044.3 36.85
m2.4xlarge Xeon X5550 2.67 [8 cores] 68.4 $2/hr 5877 1511 30.72
m2.2xlarge Xeon X5550 2.67 [4 cores] 34.2 $1/hr 5163 1332 25.81
c1.xlarge Xeon E5410 2.33 [8 cores] 7 $0.68/hr 5118 780 10.66
m2.xlarge Xeon X5550 2.67 [2 cores] 17.1 $0.5/hr 3952 935.8 6.5
m1.xlarge Xeon E5430 2.66 [4 cores] 15 $0.68/hr 4256 938.6 5.15
m1.large Xeon E5430 2.66 [2 cores] 7.5 $0.34/hr 3092 663.4 4.17
c1.medium Xeon E5410 2.33 [2 cores] 1.7 $0.17/hr 2680 758.4 3.49
m1.small Opteron 2218 HE 2.60 [1 core] 1.7 $0.085/hr 1726 179.7 0.9
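
Because the table above lists both an hourly price and a CCU score for each instance, a rough price/performance figure can be derived by dividing one by the other. This post does not compute that figure. The sketch below is only an illustration that uses the prices and CCUs exactly as listed, and the names INSTANCES, ccu_per_dollar and rank_by_value are made up for the example.

```python
# (instance ID, price in USD per hour, CCU score), copied from the table above.
INSTANCES = [
    ("cc.4xlarge", 1.60, 36.85),
    ("m2.4xlarge", 2.00, 30.72),
    ("m2.2xlarge", 1.00, 25.81),
    ("c1.xlarge", 0.68, 10.66),
    ("m2.xlarge", 0.50, 6.50),
    ("m1.xlarge", 0.68, 5.15),
    ("m1.large", 0.34, 4.17),
    ("c1.medium", 0.17, 3.49),
    ("m1.small", 0.085, 0.90),
]


def ccu_per_dollar(price_per_hour, ccus):
    """CCUs obtained per dollar of hourly spend (an illustrative metric)."""
    return ccus / price_per_hour


def rank_by_value(instances):
    """Return (ID, CCUs per dollar-hour) pairs, best value first."""
    rows = [(name, round(ccu_per_dollar(price, ccus), 2))
            for name, price, ccus in instances]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, value in rank_by_value(INSTANCES):
    print(f"{name:12s} {value:6.2f} CCUs per $/hr")
```

This only reflects raw CPU score per dollar. It ignores memory size, disk IO, and the 10 Gbps clustering that distinguishes the cc.4xlarge.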

Disk IO Performance
Disk IO performance was likewise very good. The score of 104.42 signifies that it performed better than the baseline system for this benchmark, a "bare-metal" server running 4 x 15K RPM SAS drives configured with hardware RAID 1+0. For more information, review the previous post. Cloud server storage comes in both local and external flavors. External storage generally provides higher reliability (High Availability is only possible with external storage), while local storage generally provides better performance. With one exception (a Terremark vCloud Express cloud server), disk IO performance for the new cluster compute instance type was better than that of any other "external storage" cloud server, including all of EC2's existing instance types.

ID CPU Mem Price (USD) IOP
cc.4xlarge Xeon X5570 2.93 [16 cores] 23 $1.6/hr 104.42
m2.2xlarge Xeon X5550 2.67 [4 cores] 34.2 $1/hr 96.22
m2.4xlarge Xeon X5550 2.67 [8 cores] 68.4 $2/hr 87.56
m2.xlarge Xeon X5550 2.67 [2 cores] 17.1 $0.5/hr 86.37
c1.xlarge Xeon E5410 2.33 [8 cores] 7 $0.68/hr 74.29
m1.xlarge Xeon E5430 2.66 [4 cores] 15 $0.68/hr 57.34
m1.large Xeon E5430 2.66 [2 cores] 7.5 $0.34/hr 54.29
c1.medium Xeon E5410 2.33 [2 cores] 1.7 $0.17/hr 35.76
m1.small Opteron 2218 HE 2.60 [1 core] 1.7 $0.085/hr 25.34



Interpreted Programming Language Performance (Java, Python, Ruby, PHP)
In this benchmark category the new cluster compute instance type really shone. It performed significantly better than any of the other 134 cloud servers benchmarked in the previous post. The previous top performers were the EC2 m2.4xlarge instance with a score of about 139, followed by the Storm on Demand 48GB Westmere cloud server with a score of 124.

ID CPU Memory PHP Python Ruby SPECjvm Score
cc.4xlarge Xeon X5570 2.93 [16 cores] 23 55383 3430 2.84 212.97 159.47
m2.4xlarge Xeon X5550 2.67 [8 cores] 68.4 50328 3725 4.12 197.42 138.58
m2.2xlarge Xeon X5550 2.67 [4 cores] 34.2 50253 3737 4.08 108.09 115.45
m2.xlarge Xeon X5550 2.67 [2 cores] 17.1 50774 3743 3.56 58.73 106.49
c1.xlarge Xeon E5410 2.33 [8 cores] 7 44460 4586 5.46 131.12 105.76
m1.xlarge Xeon E5430 2.66 [4 cores] 15 38737 5279 6.45 68.09 79.62
m1.large Xeon E5430 2.66 [2 cores] 7.5 38625 5324 6.58 38.21 71.29


Memory IO Performance
Memory IO performance was also exceptional for the new cluster compute instance. Its score of 130.6 was the second highest of the 134 cloud servers included in the previous post. Only the 48GB Storm on Demand Westmere server was higher with a score of 136.4.

ID CPU Memory Redis MIOP
cc.4xlarge Xeon X5570 2.93 [16 cores] 23 58909.51 130.6
m2.4xlarge Xeon X5550 2.67 [8 cores] 68.4 44330.61 117.88
m2.2xlarge Xeon X5550 2.67 [4 cores] 34.2 53150.28 113.04
m2.xlarge Xeon X5550 2.67 [2 cores] 17.1 42735.76 104.94
c1.medium Xeon E5410 2.33 [2 cores] 1.7 15766.09 68.98
m1.xlarge Xeon E5430 2.66 [4 cores] 15 27314.76 63.67
m1.large Xeon E5430 2.66 [2 cores] 7.5 28656.14 57.05




Encoding & Encryption Performance
In this benchmark category, the new cluster compute instance type scored second highest again out of the 134 cloud servers benchmarked in the previous post.

ID CPU Memory Encode
cc.4xlarge Xeon X5570 2.93 [16 cores] 23 148.38
m2.2xlarge Xeon X5550 2.67 [4 cores] 34.2 136.32
m2.4xlarge Xeon X5550 2.67 [8 cores] 68.4 136.12
m2.xlarge Xeon X5550 2.67 [2 cores] 17.1 136.09
c1.xlarge Xeon E5410 2.33 [8 cores] 7 119.77
m1.xlarge Xeon E5430 2.66 [4 cores] 15 103.33
m1.large Xeon E5430 2.66 [2 cores] 7.5 103.06
c1.medium Xeon E5410 2.33 [2 cores] 1.7 100.86
m1.small Opteron 2218 HE 2.60 [1 core] 1.7 43.25

Conclusion
The new EC2 cluster compute instance type is an excellent performer. Its performance exceeded that of most of the "bare metal" cloud servers we benchmarked previously. Combined with 10 Gbps non-blocking clustering capabilities, on-demand deployment, and hourly billing, this new instance type provides exceptional value and capabilities for HPC applications.