Sunday, October 24, 2010

Introducing Web Services for Cloud Performance Metrics

Over the past year we've amassed a large repository of cloud benchmarks and metrics. Today, we are making most of that data available via web services. This data includes the following:

  • Available Public Clouds: Which public clouds are available and which cloud services they offer, including:
    • Cloud Servers/IaaS: e.g. EC2, GoGrid
    • Cloud Storage: e.g. S3, Google Storage
    • Content Delivery Networks/CDNs: e.g. Akamai, MaxCDN, Edgecast
    • Cloud Platforms: e.g. Google AppEngine, Microsoft Azure, Heroku
    • Cloud Databases: e.g. SimpleDB, SQL Azure
    • Cloud Messaging: e.g. Amazon SQS, Azure Message Queue
  • Cloud Servers: What instance sizes, server configurations and pricing are offered by public clouds. For example, Amazon's EC2 comes in 10 different instance sizes ranging from micro to 4xlarge. Our cloud servers pricing data includes typical hourly, daily and monthly pricing as well as complex pricing models such as spot pricing (dynamically updated) and reserved pricing where applicable
  • Cloud Benchmark Catalog: This includes names, descriptions and links to the benchmarks we run. Our benchmarks cover both system and network performance metrics
  • Cloud Benchmark Results: Access to our repository of 6.5 million benchmarks including advanced filtering, aggregation and comparisons. We are continually conducting benchmarks so this data is constantly being updated

We are releasing this data in hopes of improving transparency and making the comparison of cloud services easier. There are many ways that this data might be used. In this post, we'll go through a few examples to get you started and let you take it from there.

Our web services API provides both RESTful (HTTP query request and JSON or XML response) and SOAP interfaces. The API documentation and SOAP WSDLs are published here: http://api.cloudharmony.com/api

The sections below are separated into individual examples. This is not intended to be comprehensive documentation for the web services, but rather a starting point and a reference for using them. More comprehensive technical documentation is provided for each web service on our website.
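
Since the examples that follow all use the RESTful interface, here is a minimal Python sketch of a helper for invoking the web services. It assumes the endpoint layout implied by the WSDL URL above (base URL plus web service name) and that request parameters are passed as a query string; the published API documentation remains the authoritative reference.

    import json
    import urllib.parse
    import urllib.request

    # Assumed endpoint layout: http://api.cloudharmony.com/<webServiceName>
    # (inferred from the WSDL URL above - confirm against the API documentation)
    BASE_URL = "http://api.cloudharmony.com"

    def invoke(service, params=None):
        """Invoke a web service and return the decoded JSON response.
        'params' is a dict of request parameters such as ws-limit or ws-format."""
        query = urllib.parse.urlencode(params or {})
        url = "%s/%s%s" % (BASE_URL, service, "?" + query if query else "")
        with urllib.request.urlopen(url) as response:
            return json.loads(response.read().decode("utf-8"))

    # Example 1 below uses this helper to look up all available public clouds:
    # clouds_response = invoke("getClouds")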

Example 1: Lookup Available Clouds

In this first example we'll use the getClouds web service to look up all available public clouds. The table at the top of the web service documentation describes the data structure used by this web service call.

Request URI: JSON Response

Request URI: XML Response

Note on pagination

Due to the large amount of data that can be returned, our web services utilize results pagination (similar to getting multiple pages of results for a web search). The maximum number of results a single request to this web service will return is 10. You may set a limit lower than 10 using the ws-limit request parameter, but not greater than 10. The example request URIs above return only the first 10 results (as determined by the limit response value). At the time of this writing there were 37 total records (as determined by the count response value). To return the remaining 27 results, utilize the following URIs (a short client-side pagination sketch follows them):

Request URI: Results 11-20

Request URI: Results 21-30

Request URI: Results 31+
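
As a sketch of how pagination might be handled in client code, the helper below reuses the invoke() function from the earlier snippet, the ws-limit parameter described above, and the ws-offset parameter shown later in Example 19. The response key names it reads are assumptions; confirm them against the API documentation.

    def fetch_all_pages(service, params=None, page_size=10):
        """Walk through every page of results by advancing ws-offset in steps of page_size."""
        params = dict(params or {})
        params["ws-limit"] = page_size
        offset = 0
        pages = []
        while True:
            params["ws-offset"] = offset
            page = invoke(service, params)
            pages.append(page)
            # 'count' is the total number of records per the pagination note above;
            # the exact response key names should be confirmed in the API documentation
            if offset + page_size >= page.get("count", 0):
                return pages
            offset += page_size

    # e.g. all_cloud_pages = fetch_all_pages("getClouds")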

Note on SOAP

In this post we'll only be showing use of the RESTful API interface. A SOAP interface is also provided. The base API documentation page includes links to WSDLs you may use to import and utilize the SOAP interface (some IDEs let you import WSDLs). The parameters and response structure for SOAP requests are very similar, but not identical to the REST interface (XML names may differ slightly from HTTP request and response names). The WSDL for the getClouds web service is available here: http://api.cloudharmony.com/getClouds/wsdl

Example 2: Search for cloud with servers, storage and CDN services

In this example we'll use the same getClouds web service again, but add a few constraints so that only clouds with server, storage and CDN services are returned. Unless otherwise stated, from this point forward the same rules with regard to pagination apply (see Example 1 for more details). Additionally, only JSON URIs will be shown (to use XML responses, simply add the parameter ws-format=xml to the URI).

Request URI

Note on constraints

In the example URI above, we used ws-constraint parameters to filter the results. Constraints can be applied to specific attributes defined by the data structure for a given web service. The data structure is documented in a table at the top of the web service's documentation page. In this example, we used 3 such attributes: hasServers, hasStorage and hasContentDelivery. Because these are boolean attributes, we assigned each a constraint value of 1, signifying that only clouds where those attributes are TRUE should be returned.

The API also supports more complex constraint parameters. This example uses the simplest form of constraint, testing for equality and joining the 3 constraints with an AND connective. Constraints can also check whether an attribute is less than or greater than a desired value, and multiple constraints can be joined with an OR connective instead. We'll go into these types of constraints in a later example.

Response

In this example, only 5 public clouds are returned instead of the 37 returned in the previous example, signifying that only 5 public clouds offer all three services: cloud servers, storage and a CDN.
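
The same filtering could also be done on the client side after retrieving the full list of clouds. A quick sketch using the hasServers, hasStorage and hasContentDelivery attributes described above (the key under which the cloud records are returned is not shown in this post, so the records are passed in directly here):

    def offers_full_stack(cloud):
        """True if a cloud record reports server, storage and CDN services."""
        return all(cloud.get(attr) for attr in ("hasServers", "hasStorage", "hasContentDelivery"))

    # 'cloud_records' is assumed to be the list of cloud data structures returned by getClouds
    # full_stack = [c for c in cloud_records if offers_full_stack(c)]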

Example 3: Retrieve a specific public cloud

Each data structure has an attribute that is the unique identifier. This attribute is called the primary key. The getClouds web service can be used to retrieve a specific public cloud if you know the primary key for the cloud. In this example, we'll use this feature to retrieve the AWS (Amazon Web Services) public cloud.

Request URI

Response

The response is almost identical to the previous 2 requests with the exception that the base response value is not an array. When a web service is invoked for a specific cloud using the primary key as we've done here, the response will always be a single data structure value. This is in contrast to the previous 2 requests that returned multiple clouds using an array as the base data structure.
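
A client that handles both shapes might normalize the base response value before processing it; a trivial sketch:

    def as_record_list(base_value):
        """Primary-key lookups return a single data structure; other lookups return an array.
        Normalize both shapes to a list of records."""
        return base_value if isinstance(base_value, list) else [base_value]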

Example 4: Retrieve Cloud Server Service for AWS

In this example we'll use the getCloudServerServices web service to return the cloud server/IaaS service for AWS (Amazon Web Services) - EC2. We know AWS has such a service because the boolean hasServers attribute was TRUE in the previous examples. The API documentation shows that the CH_CloudServer data structure contains an attribute named cloud that references the cloud the service belongs to. In order for this web service to return only the cloud server service belonging to AWS, we just need to add a single ws-constraint for this attribute.

Request URI

Response

Even though only a single result for EC2 was returned, the base response data structure is an array. This will always be the case when invoking the API for a data structure without specifying a primary key (as we did in this example), because there is then always the possibility that multiple results could be returned.

Example 5: Find all cloud services that support Windows

Suppose you are looking to deploy a Windows server in the cloud. Because the CH_CloudServer data structure has an attribute operatingSystemsSupported that defines which operating systems are supported by that service, we can use it in conjunction with a ws-constraint request parameter to filter the results accordingly. In our previous use of constraints, we used the default equality operator. In this example, we'll need to change the operator to a substring search. This is because the operatingSystemsSupported attribute is an array which may contain multiple values representing all of the operating systems supported by the service (e.g. Linux, Windows, etc.). By using the substring operator, the request will search for services where the operatingSystemsSupported attribute contains Windows. The substring operator is the numeric value 32 (operators are numeric to support combining multiple operators using bitmasks). The operators supported and their corresponding values are shown in the API documentation.

Request URI

Response

At the time of this writing, this request returned 14 services that support the Windows operating system.

Example 6: Find other cloud services

In addition to the getCloudServerServices web service discussed in Examples 4-5, the following additional web services are provided: getCloudDatabaseServices, getCloudMessagingServices, getCloudPlatformServices, getCloudStorageServices and getCDNs. The usage for each of these is identical. In the examples below we'll use them for various lookups.

We are still in the process of populating vendor profiles for different cloud services. Currently only basic information is provided by the web services. In the future, the data structures for these services will be expanded to include many additional details such as pricing, SLAs, features, technical details, etc.

Lookup all CDNs

Lookup storage services for AWS

Lookup database services for Azure

Lookup all cloud platforms (e.g. Google AppEngine, Microsoft Azure)

Lookup all cloud messaging services (e.g. AWS SQS)

Example 7: Get the full benchmark catalog

Up to this point we've been using web services to look up which clouds and cloud services are available. The remaining examples will involve retrieving benchmarking-related data. To get started, we'll first need to determine which benchmarks are available using the getBenchmarks web service. This web service provides access to information about the benchmarks we conduct. Unlike the previous web services, getBenchmarks does not support ws-constraint parameters to filter results. This is always the case when the top section of the API documentation page does not show a data structure table. Instead of ws-constraint filters, getBenchmarks supports 4 request parameters (these are shown in the right column of the API documentation table):

  • aggregateOnly: set to TRUE if only aggregate benchmarks should be returned (see note on aggregate benchmarks below)
  • nonAggregateOnly: set to TRUE if only non-aggregate benchmarks should be returned (see note on aggregate benchmarks below)
  • category: return only benchmarks in this category. Multiple categories may be specified separated by pipe characters (see Example 8 below)
  • serverOnly: set to TRUE if only cloud server benchmarks should be returned. These are benchmarks we run on cloud servers only

Benchmarks are assigned to 1 or more categories. The getBenchmarkCategories web service may be used to obtain all of the available benchmark categories (see Example 8 below).

In this example, we'll retrieve all benchmarks (or at least the first 10 due to pagination).

Request URI: Results 1-10

Request URI: Results 11-20

Response

The response from this web service is an array of benchmarks, each containing the following values:

  • benchmarkId: the identifier of the benchmark
  • title: the benchmark title
  • subtitle: the benchmark subtitle
  • categories: the categories for this benchmark (an array)
  • description: the benchmark description
  • url: URL to this benchmark's website (if available)
  • lowerIsBetter: TRUE if a lower score is better for this benchmark
  • aggregate: TRUE if this benchmark is an aggregate of multiple benchmarks (see note on aggregate benchmarks below)
  • benchmarks: if this is an aggregate benchmark, this value will provide details about which individual benchmarks are included in it and their corresponding weights. This return value is an array of values each containing the following keys:
    • benchmarkId: the id of the individual benchmark
    • weight: the weight assigned to this benchmark
    • alternates: if this benchmark has alternate benchmarks (benchmarks used if this benchmark is not available), this will be an array representing the IDs of those alternate benchmarks
  • baseline: if this is an aggregate benchmark, this return value defines the baseline used to calculate the aggregate metric. It is a hash of key/value pairs where each key is a serverId and the value is the score assigned if the benchmarked server performs exactly the same as that baseline server. The aggregate score is based on how the benchmarked server performs relative to the baseline servers: if it performs better, the metric will be higher than the baseline value; if it performs worse, the metric will be lower. More information on baselines and aggregate metric calculation is available in the What is an ECU? CPU Benchmarking in the Cloud post on our blog

Note on aggregate benchmarks

Aggregate benchmarks are a special type of benchmark that aren't benchmarks themselves, but rather a compilation of multiple benchmark result metrics. This compilation is used in conjunction with a baseline configuration to produce a more comprehensive benchmark metric related to some facet of performance. Aggregate benchmarks and baselines are discussed in more detail on our blog; CCU is one such aggregate benchmark discussed here.
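
To make the idea concrete, here is a purely illustrative calculation of a weighted, baseline-relative aggregate score. This is not CloudHarmony's actual formula (that is described in the What is an ECU? post); it only demonstrates how per-benchmark weights and a baseline value can combine into a single metric.

    def illustrative_aggregate(results, weights, baseline, baseline_value):
        """Illustration only: weighted average of scores relative to a baseline server.
        results and baseline map benchmarkId -> raw metric, weights map benchmarkId -> weight,
        and baseline_value is the score assigned when performance exactly matches the baseline."""
        total_weight = sum(weights.values())
        relative = sum(weights[b] * (results[b] / baseline[b]) for b in weights)
        return baseline_value * relative / total_weight

    # A server matching the baseline on every benchmark scores exactly baseline_value:
    print(illustrative_aggregate({"a": 100, "b": 50}, {"a": 2, "b": 1}, {"a": 100, "b": 50}, 10.0))  # 10.0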

Example 8: Get benchmark categories

Every benchmark is assigned to one or more categories. The getBenchmarkCategories web service returns a list of all possible benchmark category names. This web service is very simple: it does not use any parameters or pagination.

Request URI

Example 9: Get only server benchmarks in category System: CPU

In this example we'll use the same getBenchmarks web service to retrieve only server benchmarks in the category System: CPU (we discovered this category previously using the getBenchmarkCategories web service). To accomplish this, we'll use the serverOnly and category request parameters.
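
Assuming the serverOnly and category request parameters are passed directly as query string parameters (the linked request URI below is the authoritative form), the call might look like this using the invoke() helper from Example 1:

    # Hypothetical parameter encoding - defer to the linked request URI
    cpu_server_benchmarks = invoke("getBenchmarks", {"serverOnly": 1, "category": "System: CPU"})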

Request URI

Example 10: What server benchmarks have been run

Before attempting to analyze benchmark results, it may be helpful to first determine what benchmark results data is available, including which clouds and server configurations have been benchmarked. Generally, we conduct cloud server benchmarking 3-4 times each year. Every benchmark test run has a unique testId. The typical format of a testId is MMYY-[SEQ]. For example, the test 0410-1 was conducted in April 2010. To determine what tests have been run within clouds, the getServerCloudsBenchmarked web service may be used. This web service uses the following parameters:

  • serviceId: the ID of a service or cloud to return test information for. Multiple IDs may be specified each separated by a pipe character
  • start: if specified, only services that have been benchmarked on or after this date will be returned
  • stop: if specified, only services that have been benchmarked on or before this date will be returned

The return value is an array of cloud server services and the corresponding testing information for those services including testIds and testing dates.

Request URI

Response

The response from this web service is an array of services and information about the benchmark tests that have been conducted within those services.

  • id: the id of this service
  • name: the name of the service
  • testIds: the IDs of tests performed (array)
  • numTests: the number of tests that have been conducted for this service (same as number of elements in testIds)
  • lastTestId: the ID of the last test that was run for this service
  • lastTestDate: the date of the last test that was run for this service
  • url: the URL to the service's website

Example 11: What server benchmarks have been run in the GoGrid and Amazon clouds after June 2010

In this example, we'll use the same getServerCloudsBenchmarked web service to determine what testing has occurred in the AWS and GoGrid clouds on or after June 2010. To do so, we'll use the serviceId and start parameters to filter the results. The serviceId parameter can be either the ID of the specific server service or the ID of a cloud.

Note on dates and times

When specifying dates or dates and times, most standard formats are supported, such as 6/1/2010, 2010-06-01 or June 1 2010. Date values are returned by web services as text unless the ws-js-dates parameter is set to TRUE, in which case they are returned as JavaScript Date objects (only applicable to JSON responses).

Request URI

Example 12: Get all Geekbench benchmark results for Rackspace Cloud and GoGrid

To retrieve cloud server benchmark metrics we'll use the getServerBenchmarkResults web service. This web service requires 2 parameters:

  • benchmarkId: the identifier(s) of the benchmarks that should be returned (REQUIRED). Multiple IDs may be specified separated by pipe characters
  • serviceId: the identifier(s) of the cloud server service to return the benchmarks for (REQUIRED). Multiple IDs may be specified separated by pipe characters. Alternatively, this parameter may be left out if the serverId parameter below is specified

Additionally, the following parameters may optionally be provided:

  • serverId: the identifier(s) of the server to return benchmark metrics for. Multiple IDs may be specified each separated by a pipe character
  • dataCenter: the identifier(s) of a specific service data center to return benchmarks for (if the cloud server service operates out of multiple data centers). Multiple data centers may be specified separated by pipe characters. For example, AWS EC2 operates out of 4 regions - US West, US East, EU West and APAC currently. These regions are located in California, Virginia, Ireland and Singapore respectively. To return only results for the US West data center, this parameter should be set to CA, US. To return metrics for both US West and EU West data centers, this parameter would be CA, US|IE (IE is the ISO 3166 code for Ireland)
  • testId: the identifier of a specific test for the benchmarks that should be returned. Multiple IDs may be specified separated by pipe characters
  • lastBenchmarksOnly: set to TRUE if only the latest benchmark test should be included in the results. This guarantees that only a single set of results will be returned
  • combineMultiple: If multiple benchmark metrics are included in the results, this parameter defines how those values should be returned as a single value. Valid options are:
    • average: use an average of all values (default)
    • lowest: return the worst value (may be the lowest or highest value depending on whether higher or lower scores are better for the benchmark)
    • highest: return the best value
    • earliest: return the value from the earliest test
    • latest: return the value from the latest test

As you can see, requests to this web service can be quite complex if desired. In this example, we'll keep it simple by using only the benchmarkId and serviceId parameters. The Geekbench benchmark produces a metric that rates CPU and memory performance.
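
A sketch of that simple request, again using the invoke() helper from Example 1. The benchmark and service identifiers shown are illustrative guesses (the GoGrid ID follows the Service:Offering pattern that appears in Example 17); the real IDs can be retrieved with getBenchmarks and getCloudServerServices, and the linked request URI below is authoritative.

    geekbench_results = invoke("getServerBenchmarkResults", {
        "benchmarkId": "geekbench",                     # assumed ID for the Geekbench benchmark
        "serviceId": "Rackspace:Cloud|GoGrid:Servers",  # GoGrid ID pattern from Example 17; Rackspace ID is a guess
    })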

Request URI

Response

The response is an array of benchmark result metrics, each consisting of the following values (a short sketch for working with them follows this list):

  • serverId: the ID of the server this metric pertains to
  • serviceId: the ID of the service serverId pertains to
  • benchmarkId: the ID of the benchmark this metric pertains to
  • value: the benchmark metric. If this result consists of multiple benchmark values, value will be an average of all results unless the combineMultiple request parameter specifies otherwise
  • testDate: the date of the test (if result is from a single benchmark test)
  • testId: the ID of the test (if result is from a single benchmark test)
  • resultsUrl: some benchmark result artifacts are accessible online. When this is the case, this will be the URL to those artifacts (if result is from a single benchmark test)
  • values: the values of the tests (if result is from multiple benchmark tests)
  • testDates: the dates of the tests (if result is from multiple benchmark tests)
  • testIds: the IDs of the tests (if result is from multiple benchmark tests)
  • resultsUrls: the URLs to the test results (if result is from multiple benchmark tests)
  • numTests: the number of tests used to calculate the value
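
A short sketch for working with those response values, using only the documented serverId, value and numTests keys (the sample records and numbers here are made up purely for illustration):

    def best_result(results):
        """Return the record with the highest 'value' (for benchmarks where higher is better)."""
        return max(results, key=lambda r: r["value"])

    sample = [
        {"serverId": "gg-4gb", "value": 3550, "numTests": 5},  # made-up sample values
        {"serverId": "rs-4gb", "value": 3200, "numTests": 4},
    ]
    top = best_result(sample)
    print(top["serverId"], top["value"])  # gg-4gb 3550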

Example 13: Get the latest CCU benchmarks for the EC2 APAC region

In this example, we'll use the dataCenter and lastBenchmarksOnly parameters to return all of the CCU benchmark results for Amazon EC2's APAC region (this region is located in Singapore - hence the dataCenter parameter is set to the ISO 3166 country code SG). Unlike the previous example where multiple test results were returned, in this example, because lastBenchmarksOnly is TRUE, the web service will only return a single benchmark value (the values, testDates, testIds and resultsUrls values will not be included in the response). CCU is an aggregate benchmark consisting of many underlying CPU performance related benchmarks as discussed here.

Request URI

Example 14: Get available cloud server configurations for the Amazon EC2 APAC region

Before proceeding any further with getServerBenchmarkResults examples, we'll demonstrate how to find out what server configurations are available for a given cloud service. This is useful because the getServerBenchmarkResults web service supports a serverId parameter that can be used to filter benchmark results by specific server identifiers. For example, you may want to compare benchmark results between EC2 m2.4xlarge and GoGrid 16GB cloud servers only.

The getCloudServerConfigurations web service allows you to lookup cloud server configurations. This web service uses a data structure containing various details about cloud servers including CPU, memory, and storage specifications; pricing and more (review the API documentation for full details). Because this web service is based on a data structure, we'll be able to use ws-constraint parameters to filter the results. In this example, we'll use 2 constraints (cloud and dataCenter) to filter the results so that only Amazon EC2 APAC region servers are returned.

Request URI

Example 15: Compare IOP benchmark results between Rackspace Cloud and GoGrid 4GB cloud servers

Now that we've been able to obtain the identifiers of cloud servers using the getCloudServerConfigurations web service, we can go back to the getServerBenchmarkResults web service and compare cloud servers using those IDs and the serverId parameter. In this example, we'll compare storage IO performance between 4GB Rackspace Cloud and GoGrid cloud servers (rs-4gb and gg-4gb) using the aggregate IOP benchmark. IOP is an aggregate storage IO benchmark based on 7 IO related benchmarks as documented here. This benchmark is NOT the same as IOPS. To retrieve the IOP benchmark results for only the Rackspace and GoGrid 4GB cloud servers, we'll set the serverId parameter to gg-4gb|rs-4gb (multiple IDs can be specified each separated by a pipe character).

Request URI

Example 16: Lookup all cloud servers in the US with at least 2GB memory and costing $0.10/hr or less

In this example, we'll use the getCloudServerConfigurations web service to look up US-based cloud services offering cloud servers with at least 2GB memory and costing $0.10/hr or less. This will involve use of 4 filtering constraints: dataCenter, memory, priceHourly and priceCurrency. In order to apply these constraints, we'll first need to determine which operators should be used.

The dataCenter attribute value is either [state/province], [country] (US or Canada only) OR [country]. Thus, we'll want the dataCenter attribute to "end with" "US". According to the API documentation, the "ends with" operator is 16.

The memory attribute is a numeric value representing the # of gigabytes included with a cloud server. We'll want this attribute to be equal to or greater than 2. The operator for "equal to" is 1. The operator for "greater than" is 2. Thus, an "equal to or greater than" operator is 1+2=3 (bitmask addition).

The priceHourly attribute is also numeric representing the price of the server per hour. We'll want this attribute to be equal to or less than 0.10. The operator for "equal to" is 1, and the operator for "less than" is 4. Thus, an "equal to or less than" operator is 1+4=5.

The priceCurrency attribute is a string representing the currency code for pricing defined in the server configuration (USD = US dollar). Thus we want this attribute to be equal to "USD". Equality is the default operator, so we do not need to provide an operator value for this constraint.
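
Collecting the constraint operator values mentioned in this post (the complete list is in the API documentation), the bitmask combinations used in this example can be written out explicitly:

    # Operator values as described in Examples 5 and 16; see the API documentation for the full list
    OP_EQUAL     = 1
    OP_GREATER   = 2
    OP_LESS      = 4
    OP_ENDS_WITH = 16
    OP_SUBSTRING = 32

    OP_GREATER_OR_EQUAL = OP_EQUAL | OP_GREATER   # 3, used for the memory constraint
    OP_LESS_OR_EQUAL    = OP_EQUAL | OP_LESS      # 5, used for the priceHourly constraint
    print(OP_GREATER_OR_EQUAL, OP_LESS_OR_EQUAL)  # 3 5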

Request URI

Response

At the time of this writing, only gigenet cloud offers a cloud server with these specifications.

Example 17: Determine average uplink throughput from GoGrid US West to Amazon S3 US West, Zetta and Google Storage

In addition to system benchmarks, we also continually collect networking benchmark metrics. These include both throughput and latency metrics within clouds, between clouds, and from clouds to consumers (i.e. residential Internet connections such as DSL and cable connecting to various cloud services).

Suppose you are evaluating cloud services and decide to use GoGrid's cloud servers. Your business and customers are in California, so you opt to use GoGrid's US West data center. For added protection against a large scale failure, you decide to use an external storage service for backups (instead of GoGrid's own storage service). You've narrowed your storage choices down to either Amazon S3, Zetta or Google's Storage for Developers. You'd like to know which of these storage services will provide the fastest uplink throughput from your cloud servers at GoGrid in order to ensure that backups can be uploaded as quickly as possible. The getNetworkBenchmarkResults web service provides access to this sort of data. This web service uses the following parameters:

Request Parameters

  • serviceId: the ID of the service to get network benchmark metrics for (REQUIRED). Multiple IDs can be specified each separated by a pipe character. Each additional ID specified will essentially double execution time for this web service, so use this feature sparingly
  • dataCenter: if the serviceId specified is operated out of multiple data centers, this parameter must also be provided defining the location of the data center to return results for. Multiple data centers may be specified each separated by a pipe character. Each additional data center specified will essentially double execution time, so use this feature sparingly
  • testId: we conduct multiple network performance tests. This parameter should be the name of the network test that results should be returned for. The following network tests are currently used:
    • intercloud: results originate from our intracloud/intercloud network performance tests. These tests are run throughout the day at varying times to test throughput and latency between and within cloud services (DEFAULT)
    • speedtest: get results from our browser-based cloud speedtest. We allow Internet users to run this test for free. We also pay about 1000 users each month to run this test using Amazon's Mechanical Turk. When the speedtest is run, we capture the user's location (city, state, country), ISP and connection speed (netspeed) using MaxMind's GeoIP databases. Users select a test file between 1-5 MB to test download throughput, or a test file between 0.5-2.5MB to test upload throughput. Users may also test latency
    • speedtest-web: this is the same as speedtest, except that instead of downloading a single large file, many small files are downloaded to simulate an actual web page load. Users select a small (10 files), medium (19 files) or large (51 files) website to test
  • endpoint_*: the endpoint parameters are used to define a service, region, location (city, state or country), ISP or netspeed (or some combination of those) for which network benchmark metrics should be returned. At least 1 endpoint parameter must be specified. The endpoint parameters used must correspond with the testId specified. The following endpoint parameters are allowed:
    • endpoint_cloudId: return results for all services in this cloud. This parameter applies only to the intercloud testId and cannot be used in conjunction with any other endpoint parameters except for endpoint_dataCenter. Multiple IDs may be specified each separated by a pipe character. Results will be grouped by service and data center (multiple results possible for each service and data center)
    • endpoint_serviceId: return results for a specific cloud service. This parameter applies only to the intercloud testId and cannot be used in conjunction with any other endpoint parameters except for endpoint_dataCenter. Multiple IDs may be specified each separated by a pipe character. Results will be grouped by service and data center (multiple results possible for each data center)
    • endpoint_dataCenter: if the endpoint_cloudId or endpoint_serviceId parameters are specified and services are operated out of multiple data centers, this parameter may be used to limit the results to a specific data center location. Multiple data center locations may be specified each separated by a pipe character
    • endpoint_region: a specific region identifier to return results for. This parameter may be used for any test type (speedtest, speedtest-web or intercloud) but may not be used in conjunction with any other endpoint parameters except for endpoint_isp and endpoint_netspeed. Region identifiers and configurations are available using the getRegions web service. Only a single region may be specified, and results are grouped by region (single result for each invocation unless endpoint_isp or endpoint_netspeed are also specified)
    • endpoint_city: a specific city to return benchmark results for. If used, endpoint_country MUST also be specified. This parameter applies only to speedtest or speedtest-web tests. This parameter is not case sensitive. Multiple cities may be specified each separated by a pipe character. Set this parameter to the wildcard character * to return results for all cities for the endpoint_state (optional) and endpoint_country specified. Results will be grouped by city (multiple results possible for each city specified)
    • endpoint_state: a specific state or province to return benchmark results for. If used, endpoint_country will be automatically determined if not specified (US for US states, CA for Canadian provinces). This parameter applies only to speedtest or speedtest-web tests. This parameter can only be used for US and Canada test results because the GeoIP database only supports state/provinces in those countries. It should be the 2 character code for the state or province and is not case sensitive (i.e. NY, CA or QC). Multiple states/provinces may be specified each separated by a pipe character. Set this parameter to the wildcard character * to return results for all states/provinces for the endpoint_country specified. Results will be grouped by state (multiple results possible for each state). If used in conjunction with the endpoint_city parameter, results will be grouped by city
    • endpoint_country: a specific country to return benchmark results for. This parameter applies to speedtest or speedtest-web tests only. May be used in conjunction with the endpoint_city and endpoint_state parameters. It should be the 2 character ISO 3166 code for the country and is not case sensitive (e.g. US, CA or FR). Multiple countries may be specified each separated by a pipe character. Set this parameter to the wildcard character * to return results for all countries. Results will be grouped by country (multiple results possible for each country) unless endpoint_city or endpoint_state are also specified, in which case results will be grouped according to those parameters
    • endpoint_isp: the name of a specific ISP to return benchmark results for. This parameter may be used alone or in conjunction with the endpoint_region, endpoint_city, endpoint_state or endpoint_country parameters. This parameter applies to speedtest or speedtest-web tests only. This parameter is not case sensitive and can also be a substring match to the ISP name (e.g. Verizon will return multiple results for Verizon Business, Verizon Internet Services and Verizon Australia PTY Limited). Multiple ISPs may be specified each separated by a pipe character. Set this parameter to the wildcard character * to return results for all ISPs. The getSpeedtestIsps web service may be used to obtain the names of ISPs for which results are available. If this parameter is specified, results will be grouped by ISP in addition to existing grouping. For example, if the endpoint_city parameter was also specified, the results will be grouped by ISP and then city. This parameter may NOT be used in conjunction with the endpoint_netspeed parameter
    • endpoint_netspeed: a specific connection type to filter results on. This parameter should be one of the following:
      • cabledsl
      • corporate
      • dialup
      • unknown

      A majority of our speedtest results are of type cabledsl. More information on how netspeed is determined is available here. Multiple netspeeds may be specified each separated by a pipe character. Set this parameter to the wildcard character * to return results for all connection speeds. If this parameter is specified, results will be grouped by netspeed in addition to existing grouping. For example, if the endpoint_city parameter was also specified, the results will be grouped by netspeed and then city. This parameter may not be used in conjunction with endpoint_isp
  • metric: the network benchmark metric to return. One of the following values:
    • downlink: the average downlink throughput measured in megabits per second (Mb/s) (DEFAULT)
    • uplink: the average uplink throughput measured in megabits per second (Mb/s)
    • latency: the average latency measured in milliseconds (ms)
  • start: only consider results from tests that occurred on or after this date
  • stop: only consider results from tests that occurred on or before this date
  • minNumTests: the minimum # of tests for a result to be included. The larger the number of tests in a result, the more reliable and accurate that metric will be
  • order: the ordering method, one of the following: asc (order results in ascending order) or desc (order results in descending order). The default ordering is descending for throughput and ascending for latency benchmark results

Constructing the Request

As you can see, the getNetworkBenchmarkResults web service supports a complex array of parameters. For the purposes of this example, we'll only be using a few of them:

  • serviceId: The serviceId we'll use is GoGrid:Servers which is the ID for the GoGrid server service
  • dataCenter: GoGrid currently operates out of 2 data centers, us-west and us-east. The us-west data center is located in California, so the dataCenter parameter we'll use is CA, US
  • testId: We are looking for results from the intercloud test. This is the default value for this parameter, so we do not need to include it in the request
  • endpoint_serviceId: We want to get throughput metrics for AWS S3, Zetta and Google Storage. This parameter supports multiple service IDs each separated by a pipe character. Thus, this parameter will be AWS:S3|Zetta:Storage|Google:Storage
  • endpoint_dataCenter: AWS, Zetta and Google all run storage services out of California. AWS also offers storage in Virginia, Ireland and Singapore. Since we do not want to include those data centers in the results, we'll set this parameter to CA, US
  • metric: Since we'll be doing a lot of uploading to the storage service, our primary area of concern is uplink throughput, so we'll set this parameter value to uplink (the default is downlink). A sketch of the assembled request follows this list
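
Putting those parameter values together in a call to the invoke() helper from Example 1 (the query string encoding is an assumption; the linked request URI below remains authoritative):

    gogrid_uplink = invoke("getNetworkBenchmarkResults", {
        "serviceId": "GoGrid:Servers",
        "dataCenter": "CA, US",
        "endpoint_serviceId": "AWS:S3|Zetta:Storage|Google:Storage",
        "endpoint_dataCenter": "CA, US",
        "metric": "uplink",
    })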

Request URI

Response

The API documentation states that the results will be an array of hashes each with the following possible values:

  • value: the average downlink (Mb/s), uplink (Mb/s) or latency (ms) for this network benchmark result. This value is based on the metric parameter specified (default is downlink)
  • originServiceId: The ID of the service this result originates from. Returned only when multiple serviceId parameters were specified
  • serviceId: The ID of the endpoint service this result pertains to. Returned if the endpoint_cloudId or endpoint_serviceId parameters were used
  • originDataCenter: The location of the data center this result originates from. Returned only if multiple dataCenter parameters were specified
  • dataCenter: The location of the endpoint data center this result pertains to. Returned if the endpoint_cloudId or endpoint_serviceId parameters were used
  • region: the geographical region this result pertains to. Returned if the endpoint_region, endpoint_city, endpoint_state or endpoint_country parameters were specified. For more information, see the API documentation for the getRegions web service
  • city: The name of the city this result pertains to. Returned if the endpoint_city parameter was specified. More information on how this data is obtained is available here
  • state: The 2 character identifier of the state or province this result pertains to. Only available for US or Canada based results. Returned if the endpoint_state parameter was specified. More information on how this data is obtained is available here
  • country: The 2 character ISO 3166 identifier of the country this result pertains to. Returned if the endpoint_country parameter was specified. More information on how this data is obtained is available here
  • isp: The name of the ISP this result pertains to. Returned if the endpoint_isp parameter was specified. More information on how this data is obtained is available here
  • netspeed: The connection speed used by the tester. Returned if the endpoint_netspeed parameter was specified. More information on how this data is obtained is available here
  • numTests: The number of tests that were averaged to produce this result
  • earliestTest: The date/time of the earliest test included in this result
  • latestTest: The date/time of the latest test included in this result

Because we are testing throughput from one cloud service to another, the results only include value, serviceId, dataCenter, numTests, earliestTest and latestTest. In the examples that follow we'll see when the other response values are used. The results for this example at the time of this writing are:

  • AWS S3 US West: 161.22 Mb/s uplink (out of 212 tests)
  • Zetta: 63.1 Mb/s uplink (out of 213 tests)
  • Google Storage: 31.07 Mb/s uplink (out of 215 tests)

These results signify that AWS S3 US West region storage will generally provide the fastest uplink throughput from GoGrid US West cloud servers and may be the best service to use for backups (subject to other decision making criteria like price and support).

Example 18: Which CDN has the lowest latency in Europe

In the previous example we obtained network performance results based on our intercloud network testing. These tests are run periodically throughout the day to test throughput and latency between and within cloud services and Internet data centers. We also host a browser-based cloud speedtest to track throughput and latency between cloud services and primarily consumer-based high-speed Internet connections such as DSL and Cable. Users of the cloud speedtest select one or more cloud services to test, a test file size (1-5MB for download tests or 0.5-2.5MB for upload tests) and the test to perform (uplink, downlink or latency). The speedtest then uploads/downloads the test file to/from the selected cloud services and displays the latency and/or throughput results. We use MaxMind's GeoIP databases to track where the user is (city, state, country), the name of their ISP, and their connection speed using their IP address. This is a generally reliable method for obtaining this data, with accuracy of about 99.8%. In addition to allowing Internet users to run this test for free, we also pay about 1000 users per month to run the test using Amazon's Mechanical Turk. All of these results are stored in our database and accessible through the getNetworkBenchmarkResults web service.

In this example, we want to find the CDN (Content Delivery Network) with the lowest latency in Europe. We used the getRegions web service to discover that the region code for Europe is eu. CloudHarmony currently collects network benchmark metrics for about a dozen different CDNs. However, in this example, we've narrowed our CDN choices down to four: AWS CloudFront, MaxCDN, Edgecast or Akamai (resold by VPS.net). The request will be fairly simple, using the serviceId, testId, endpoint_region and metric parameters. The serviceId parameter supports multiple IDs each separated by a pipe character, so we will use that to specify the IDs of each of these 4 CDNs.

Request URI

Response

At the time of this writing, the results from these benchmarks were:

  • AWS CloudFront: 51.15ms (out of 191 tests)
  • MaxCDN: 56.65ms (out of 190 tests)
  • Edgecast: 51.38ms (out of 182 tests)
  • Akamai: 34.22ms (out of 195 tests)

So in this example, the clear winner was Akamai by a margin of about 35%. However, latency is not bad for any of these CDNs.

Example 19: What is the average downlink throughput for CDNs in California

In this example, we'll use the endpoint_isp and endpoint_state parameters to view performance of the Internap and AWS CloudFront CDNs in California, grouped by ISP. The endpoint_isp parameter can either be the name (or partial name) of an ISP such as Verizon, or a wildcard character * to indicate that all ISPs should be returned in the results. In this example, we'll use the wildcard option so the results are grouped by ISP. We will also use the minNumTests parameter so that only results with at least 5 tests completed are returned. The order=asc parameter is also used signifying that the slowest ISPs will show first in the results.

Request URI

Response

The response includes the average downlink throughput value, name of the isp, and the region identifier (us_west_pacific for all results in this example). Because more than 10 results are returned, we'll have to use the ws-offset=10 parameter to view the second page of results, ws-offset=20 for the third page and so on.

Example 20: Which CDN provides the best throughput in the APAC region

In this example, we'll determine which out of a handful of CDNs provides the best overall downlink throughput in the APAC region. We used the getRegions web service to discover that the region code for APAC is asia_apac. In this example, we'll evaluate Akamai, Edgecast, CloudFront, Microsoft Azure CDN and Limelight (resold by Rackspace Cloud).

Request URI

Response

At the time of this writing, the results from these benchmarks were:

  • Akamai: 2.69 Mb/s (out of 489 tests)
  • Edgecast: 2.51 Mb/s (out of 483 tests)
  • AWS CloudFront: 2.61 Mb/s (out of 493 tests)
  • Azure CDN: 3.29 Mb/s (out of 496 tests)
  • Limelight: 2.6 Mb/s (out of 483 tests)

So in this example it appears that Microsoft's Azure CDN service provides more than 20% better downlink throughput in APAC countries, with almost 500 tests recorded for each CDN.

Example 21: Which cloud server vendor has the best throughput in New York City

In this example, we'll use the endpoint_city parameter to determine which of a handful of cloud server providers has the best downlink throughput in New York City. We will evaluate the following cloud server providers: AWS EC2 (US East region), GoGrid (US East region), Storm on Demand, Speedyrails (Quebec, CA), VoxCLOUD (New York) and Rackspace Cloud Servers (Texas data center). Because we are dealing with multiple services and multiple data centers, the serviceId and dataCenter parameters need to correspond with the IDs of all of these services and data center locations. The web service will ignore data centers that are not valid for a given service (e.g. only Speedyrails has a data center in Quebec and only Voxel has a data center in New York).

Request URI

Response

At the time of this writing, the results from these benchmarks were:

  • AWS EC2 US East: 10.91 Mb/s (out of 45 tests)
  • GoGrid US East: 6.16 Mb/s (out of 12 tests)
  • Storm on Demand: 7.33 Mb/s (out of 39 tests)
  • Speedyrails: 10.39 Mb/s (out of 20 tests)
  • VoxCLOUD New York: 8.88 Mb/s (out of 36 tests)
  • Rackspace Cloud Servers - Chicago: 5.72 Mb/s (out of 65 tests)

So, with a limited number of test results (less than 100 results should not be considered to be reliable), AWS EC2 US East, Speedyrails and VoxCLOUD New York appear to provide the fastest downlink throughput to New York City (primarily consumer) Internet connections.

Conclusion

For now, we are offering free access to these web services for up to 10 requests per rolling 24-hour period. After 10 requests, you will receive a 503 Service Unavailable HTTP response. This is a beta service and usage and terms are subject to change. If you would like an increased quota or professional support, please contact us. We'd of course also appreciate feedback and bug reports (send to info [at] cloudharmony.com).

Sunday, October 3, 2010

Cloudscaling & KT: Private cloud validation using benchmarking

A few months ago we were contacted by Cloudscaling CEO Randy Bias regarding our work benchmarking public IaaS clouds (see previous blog posts). His team was working on a large private cloud deployment for KT, Korea's largest landline and second largest mobile carrier, and was interested in using similar techniques to validate that private cloud. This validation would include not only raw benchmark results, but also comparisons of how the private cloud stacked up against existing public clouds such as EC2 and GoGrid. This data would be useful not only for Cloudscaling to validate their own work, but also as a reference for their client KT. We agreed to the project and benchmarking was conducted over a 4-day period last August. Our deliverables included raw benchmark data and an executive report highlighting the results. In this post we will provide the results of these benchmarks.

Multi-Tenancy and Load Simulation
Benchmarking of public IaaS clouds involves a certain amount of ambiguity due to the scheduling and allocation of resources in multi-tenant virtualized environments. One of the fundamental jobs of a hypervisor such as VMware or Xen is to allocate shared resources in a fair and consistent manner. In order to maximize performance and utilization, hypervisors are designed to allocate resources such as CPU and disk IO using a combination of fixed and burstable methods. For example, when a VM requests CPU resources, the hypervisor will generally provide more resources when neighboring VMs are idle than when they are also requesting CPU resources. In very busy environments, this often results in variable and inconsistent VM performance.

Because the KT cloud benchmarking was conducted pre-launch, there was no other load in the environment besides our benchmarking. To offset this, we ran the benchmarks twice. In the first run, the benchmarks were run individually to provide maximum performance. In the second run, we attempted to simulate a loaded environment by filling the cloud to about 70% capacity with VMs instructed to perform a random sample of load simulating benchmarks (using mostly non-synthetic benchmarks like tpcc, blogbench and pgbench). The benchmarks for the second run were conducted concurrently with the load simulation. The tables and graphs below provide the unloaded benchmark results. Differences between those and the loaded results are noted above the results.

Organization of Results
The results below are separated into 2 general VM types: a large (16 & 32GB) VM and a small (2GB) VM. Comparative data is also shown from public clouds including BlueLock, GoGrid, Amazon EC2, Terremark vCloud Express and Rackspace Cloud, where we conducted similar benchmarking earlier this year. The results provided are based on 5 aggregate performance metrics we created and discussed in previous blog posts: CCU (CPU performance), IOP (disk IO performance), MIOP (memory IO performance), and aggregate scores for programming language and encoding & encryption performance.

Note on CPU Stats
The servers tested in all of these benchmarks run within virtualized environments. The cores shown in the benchmark tables below are the # of cores or vCPUs exposed to the virtual server by the hypervisor. This is often not the same as the # of physical cores available on the host system.

Benchmark Results

CPU Performance
CPU benchmark results during the loaded and unloaded benchmark runs were roughly equivalent.

Large Server

Cloud | Server | CPU | Memory | CCUs
BlueLock | 16gb-8cpu | Xeon X5550 2.67 GHz [8 cores] | 16 GB | 29.2
KT | 32GB/6x2GHz | Xeon L5640 2.27 GHz [6 cores] | 32 GB | 28.66
GoGrid | 8gb | Xeon E5450 2.99 GHz [6 cores] | 8 GB | 27.36
Amazon EC2 | m2.2xlarge | Xeon X5550 2.67 GHz [4 cores] | 34.2 GB | 25.81
KT | 16GB/3x2GHz | Xeon L5640 2.27 GHz [3 cores] | 16 GB | 15.27
Terremark | 16gb-8vpu | AMD 8389 2.91 GHz [8 cores] | 16 GB | 9.81
Amazon EC2 | m2.xlarge | Xeon X5550 2.67 GHz [2 cores] | 17.1 GB | 9.1
Rackspace Cloud | 16gb | AMD 2374 2.20 GHz [4 cores] | 16 GB | 5.1

CPU Performance
Small Server

Cloud | Server | CPU | Memory | CCUs
BlueLock | 2gb | Xeon X5550 2.67 GHz [1 core] | 2 GB | 6.37
KT | 2GB/1x2GHz | Xeon L5640 2.27 GHz [1 core] | 2 GB | 5.98
Terremark | 2gb | AMD 8389 2.91 GHz [1 core] | 2 GB | 5.57
Rackspace Cloud | 2gb | AMD 2374 2.20 GHz [4 cores] | 2 GB | 5.08
GoGrid | 2gb | Xeon E5520 2.27 GHz [2 cores] | 2 GB | 5.02
Amazon EC2 | c1.medium | Xeon E5410 2.33 GHz [2 cores] | 1.7 GB | 3.49


Disk IO Performance
Disk IO performance was 20-30% slower than shown below during the loaded benchmark run. The KT cloud uses external SAN storage for VM instance storage. This, combined with the fact that the load simulation benchmarks were fairly disk IO intensive (probably more so than an actual production environment), is likely the reason for this. Despite this, disk IO performance was very good. It should be noted that GoGrid and Rackspace Cloud do not utilize external VM instance storage.

Large Server

Cloud | Server | CPU | Memory | IOP
KT | 16GB/3x2GHz | Xeon L5640 2.27 GHz [3 cores] | 16 GB | 127.05
KT | 32GB/6x2GHz | Xeon L5640 2.27 GHz [6 cores] | 32 GB | 125.31
GoGrid | 8gb | Xeon E5450 2.99 GHz [6 cores] | 8 GB | 122.62
Terremark | 16gb-8vpu | AMD 8389 2.91 GHz [8 cores] | 16 GB | 112.59
Rackspace | 16gb | AMD 2374 HE 2.20 GHz [4 cores] | 16 GB | 100.15
Amazon EC2 | m2.2xlarge | Xeon X5550 2.67 GHz [4 cores] | 34.2 GB | 96.22
Amazon EC2 | m2.xlarge | Xeon X5550 2.67 GHz [2 cores] | 17.1 GB | 87.87

Disk IO Performance
Small Server

Cloud | Server | CPU | Memory | IOP
GoGrid | 2gb | Xeon E5520 2.27 GHz [2 cores] | 2 GB | 143.35
KT | 2gb | Xeon L5640 2.27 GHz [1 core] | 2 GB | 133.08
Terremark | 2gb | AMD 8389 2.91 GHz [1 core] | 2 GB | 96.9
Rackspace | 2gb | AMD 2374 HE 2.20 GHz [4 cores] | 2 GB | 62.46
BlueLock | 2gb | Xeon X5550 2.67 GHz [1 core] | 2 GB | 49
Amazon EC2 | c1.medium | Xeon E5410 2.33 GHz [2 cores] | 1.7 GB | 39.69


Programming Language Performance
Benchmark performance in this category was about 10-15% slower during the loaded benchmark run than what is shown below.

Large Server

Cloud | Server | CPU | Memory | Score
KT | 32GB/6x2GHz | Xeon L5640 2.27 GHz [6 cores] | 32 GB | 123.43
GoGrid | 8gb | Xeon E5450 2.99 GHz [6 cores] | 8 GB | 122.22
Amazon EC2 | m2.2xlarge | Xeon X5550 2.67 GHz [4 cores] | 34.2 GB | 115.45
BlueLock | 16gb-8cpu | Xeon X5550 2.67 GHz [8 cores] | 16 GB | 115.41
KT | 16GB/3x2GHz | Xeon L5640 2.27 GHz [3 cores] | 16 GB | 108.45
Terremark | 16gb-8vpu | AMD 8389 2.91 GHz [8 cores] | 16 GB | 106.9
Amazon EC2 | m2.xlarge | Xeon X5550 2.67 GHz [2 cores] | 17.1 GB | 102.27
Rackspace | 16gb | AMD 2374 2.20 GHz [4 cores] | 16 GB | 78.66

Programming Language Performance
Small Server

Cloud | Server | CPU | Memory | Score
BlueLock | 2gb | Xeon X5550 2.67 GHz [1 core] | 2 GB | 101.31
KT | 2GB/1x2GHz | Xeon L5640 2.27 GHz [1 core] | 2 GB | 95.72
Terremark | 2gb | AMD 8389 2.91 GHz [1 core] | 2 GB | 94.82
GoGrid | 2gb | Xeon E5520 2.27 GHz [2 cores] | 2 GB | 80.82
Rackspace | 2gb | AMD 2374 2.20 GHz [4 cores] | 2 GB | 73.71


Memory IO Performance
There was no notable difference in memory IO benchmark performance between the loaded and unloaded runs.

Large Server

Cloud | Server | CPU | Memory | MIOP
BlueLock | 16gb-8cpu | Xeon X5550 2.67 GHz [8 cores] | 16 GB | 117.88
KT | 32gb | Xeon L5640 2.27 GHz [6 cores] | 32 GB | 114.48
Amazon EC2 | m2.2xlarge | Intel Xeon X5550 2.67 GHz [4 processors, 4 cores] | 34.2 GB | 113.04
KT | 16gb | Xeon L5640 2.27 GHz [3 cores] | 16 GB | 108.55
Amazon EC2 | m2.xlarge | Xeon X5550 2.67 GHz [2 cores] | 17.1 GB | 102.18
GoGrid | 8gb | Xeon E5450 2.99 GHz [6 cores] | 8 GB | 88.25
Rackspace | 16gb | AMD 2374 2.20 GHz [4 cores] | 16 GB | 70.09
Terremark | 16gb-8vpu | AMD 8389 2.91 GHz [8 cores] | 16 GB | 64.74


Memory IO Performance
Small Server

Cloud | Server | CPU | Memory | MIOP
BlueLock | 2gb | Xeon X5550 2.67 GHz [1 core] | 2 GB | 103.73
KT | 2gb | Xeon L5640 2.27 GHz [1 core] | 2 GB | 99.29
GoGrid | 2gb | Xeon E5520 2.27 GHz [2 cores] | 2 GB | 83.74
Terremark | 2gb | AMD 8389 2.91 GHz [1 core] | 2 GB | 66.06
Rackspace | 2gb | AMD 2374 2.20 GHz [4 cores] | 2 GB | 63.04


Encoding & Encryption Performance
Benchmark performance in this category was about 5-10% slower during the loaded benchmark run than what is shown below.

Large Server

Cloud | Server | CPU | Memory | Score
GoGrid | 8gb | Xeon E5450 2.99 GHz [6 cores] | 8 GB | 146.51
KT | 16gb | Xeon L5640 2.27 GHz [3 cores] | 16 GB | 139.25
KT | 32gb | Xeon L5640 2.27 GHz [6 cores] | 32 GB | 139.02
Amazon EC2 | m2.2xlarge | Xeon X5550 2.67 GHz [4 cores] | 34.2 GB | 136.32
Amazon EC2 | m2.xlarge | Xeon X5550 2.67 GHz [2 cores] | 17.1 GB | 135.81
BlueLock | 16gb-8cpu | Xeon X5550 2.67 GHz [8 cores] | 16 GB | 130.11
Rackspace | 16gb | AMD 2374 2.20 GHz [4 cores] | 16 GB | 111.2
Terremark | 16gb-8vpu | AMD 8389 2.91 GHz [8 cores] | 16 GB | 95.25


Encoding & Encryption Performance
Small Server

Cloud | Server | CPU | Memory | Score
KT | 2gb | Xeon L5640 2.27 GHz [1 core] | 2 GB | 137.21
Terremark | 2gb | AMD 8389 2.91 GHz [1 core] | 2 GB | 131.27
BlueLock | 2gb | Xeon X5550 2.67 GHz [1 core] | 2 GB | 119.57
Rackspace | 2gb | AMD 2374 2.20 GHz [4 cores] | 2 GB | 108.98
GoGrid | 2gb | Xeon E5520 2.27 GHz [2 cores] | 2 GB | 103.78
Amazon EC2 | c1.medium | Xeon E5410 2.33 GHz [2 cores] | 1.7 GB | 101.56

Conclusion
Overall the KT cloud performed very well relative to other public IaaS clouds. In particular, disk IO performance was exceptional considering Cloudscaling's use of external storage. By using external storage rather than local storage, the KT cloud offers higher fault tolerance because VMs can be quickly or even automatically migrated to another host should the host they are running on fail. This feature is often referred to as high availability. Use of Intel Westmere L5640 processors also helped to provide very good CPU and memory IO performance. VM sizing also showed a good linear performance increase from smaller to larger instances.