Oscar's Blog

Power your innovation

Abstract

In our view, virtualization will undoubtedly become one of the most important technologies in business. The primary questions, however, are how to justify a virtualization deployment and whether it suits our platform. We then need to determine whether our platform has reached its peak optimization. Answering these questions requires appropriate benchmark tools. Intel and IBM have provided one such tool, called vConsolidate. TMG Lab is the first third-party media lab in APAC able to run the vConsolidate virtualization benchmark.

Page 1: Bring the “virtual” into reality - virtualization technology analysis

 

In the server world, virtualization brings higher utilization to users at the component and system level. This leads to a server environment with high reliability, transparent workload balancing, dynamic migration, automatic fault isolation with automatic system reconstruction, and a more concise, centralized mode of managing server resource distribution.

 

Enterprises increasingly need to solve problems such as detailed cost control of IT systems, growing numbers of servers, low server utilization, and bringing x86 server reliability to the fifth digit ("five nines"). Virtualization technology can offer a solution. It promises to reduce the number of servers required and to raise server resource utilization on a large scale, while the improved reliability drives strong demand for server platforms based on the x86 architecture. As x86 servers become increasingly crucial to industries and applications, this demand continues to become more urgent.

 

In fact, virtualization first appeared several decades ago. In 1959, Christopher Strachey published an academic paper titled Time Sharing in Large Fast Computers, in which he first described the basic concept of virtualization, although the author considered the paper to be “mainly on multi-program technology (avoid external restrictions)”. Between 1960 and 1979, IBM and some other companies developed virtualization technology but, like Strachey, they focused on performance. If you are interested, please see the article The History of Virtualization on our website.

 


The following 10 years saw great improvements in microprocessor technology, and IT gradually evolved toward a "general" compute model. Most mainframe and minicomputer infrastructures were replaced by PC servers because of their high efficiency and low cost, and new applications were deployed in an ad hoc manner. With the ever-increasing speed and performance of microcomputers, the initial deployment cost of a new server was affordable, so IT managers met application needs by simply adding more servers. In fact, whenever a department wanted to deploy a new application, it would also ask for a new server, and IT managers would typically approve the request. As a result, people rarely tried to use virtualization in x86 server environments. Because every group had its own server, they felt it was unnecessary to change this process.

 

This led directly to a sharp increase in the number of servers, and the situation has been getting worse since 1990. On one hand, many servers in data centers are far from fully used, with average utilization of roughly 15%. On the other hand, the maintenance cost of large data centers has soared: electricity, space, and cooling take a lot of money, and most expensive of all are the operation and management resources required to maintain the infrastructure.

 

Low efficiency and ever-increasing costs impede enterprises’ ability to move to “an application-oriented compute model”, which aims at building infrastructure as a whole solution. Planning and deployment methods based on peak workload will become outdated. Many hardware and software manufacturers have integrated corresponding technologies and functions into their products, including function, power, and virtualization management.

 

Virtualization Status Quo

 

In the late 1990s, VMware and other virtualization software vendors opened the road to virtualization for x86 servers. They developed a software solution, the virtual machine monitor (VMM, also known as a hypervisor), which virtualized the PC server platform. The VMM/hypervisor is a middle software layer running between the physical server and the OS. It allows multiple operating systems and applications to share hardware. In this kind of pure-software "full virtualization" mode, the VMM controls the key platform resources and allocates them to each guest OS in order to avoid conflicts. To handle virtualization-sensitive operations it requires binary translation, a complex operation that rewrites the guest OS binary. A popular technique for improving virtualization performance, called "paravirtualization", modifies the guest OS source code so that its interfaces to the virtualization layer are more efficient. The problem is that this modification is fairly complicated and calls for much time and effort from software vendors, integrators, and IT managers to carry out system optimization. Furthermore, a VMM that relies on paravirtualization cannot run unmodified or proprietary guest OSs.


The VMM/hypervisor is a middle software layer running between the physical server and the virtual machine (VM) OS. The VMM presents every VM OS with a hardware interface that makes the VM OS think it controls the physical server. This allows multiple VM OSs to share the server hardware. The VMM emulates every physical server function for the VM OS and the applications that run on it.

 

 

Page 2: Open up Virtualization Era - Hardware Virtualization Technology

 

Hardware virtualization technology aims to eliminate the need for binary translation and to simplify VMM implementation. This allows the VMM not only to support a wide range of unmodified or modified guest OSs, but also to improve system performance.


 

On the x86 architecture, system-wide “binary translation” is a pure software emulation process. This scheme carries the risk of instruction conflicts at the system level and causes some efficiency problems. The IA-32 architecture follows a traditional ISA design in which the OS and applications run at one of four "privilege levels", or "rings" (Ring-0 to Ring-3). Normally, the OS runs in Ring-0, with the privilege to access all processor and platform resources. Applications normally run in Ring-3, which restricts certain functions, such as memory mapping, to avoid affecting other applications. This arrangement lets the OS maintain control and authority so that the system continues to run smoothly.

 

In a virtualized environment, the VMM has to control platform resources, so the usual solution is to run the VMM in Ring-0 and lower the guest OS to Ring-1 or Ring-3. But today's OSs are designed for Ring-0, which causes a lot of trouble for the VMM. Because many OS instructions are designed specifically for Ring-0, the virtualization software either has to modify the OS source code or has to use binary translation, which leads to intensive compute requirements.

 

Intel’s VT-x hardware-level virtualization technology helps solve these problems. VT-x provides a new level called VMX Root, which is used specifically to run the VMM. This level complements the standard four-ring architecture, so the guest OS can run directly in Ring-0, making privilege compression for basic hardware sharing unnecessary. Handoffs between the VMM and guest OSs are supported in hardware, which reduces the need for complex, compute-intensive software transitions. As for memory protection, processor state information is retained for the VMM and for each guest OS in dedicated address spaces. This helps to accelerate transitions and ensure the integrity of the process. These enhancements provide essential advantages for both software vendors and IT organizations.

 

Future of HW Virtualization

 

 

Hardware virtualization technology in fact implements, in hardware circuitry, functions that were previously pure software. Intel has a full roadmap for developing its virtualization technology: from VT-x, which solves the Ring-0 instruction conflict (enabled in 2005 products), to VT-d, which solves the I/O device virtualization problem (achieved in the Stoakley platform launched in 2007), and VT-x Gen2, which solves the memory virtualization problem (Intel will enable this technology in the Nehalem products expected to launch in 2008).

 

 

New processors will implement, in silicon, extensions to the paging mechanism that accelerate paging in a virtual environment. In 1985 Intel introduced paging and virtual memory with the launch of the Intel 80386. Since then, in a virtual environment, the page-walking algorithm has been executed partly in silicon by the virtual memory unit and partly in software by the Virtual Machine Monitor (VMM). The expectation is that on the upcoming processors, much of what the VMM now does in software (walking the page tables for each virtual machine (VM)) will be off-loaded to an extension of the paging unit implemented in silicon. The vision is always to implement in silicon ways to off-load and accelerate the software.

 

Page 3: Virtualization Performance Metric - vConsolidate

 

In our view, virtualization will undoubtedly become one of the most important technologies in business. The primary questions, however, are how to justify a virtualization deployment and whether it suits our platform. We then need to determine whether our platform has reached its peak optimization. Answering these questions requires appropriate benchmark tools. Intel and IBM have provided one such tool, called vConsolidate. TMG Lab is the only third-party media lab in APAC able to run the vConsolidate virtualization benchmark. vConsolidate can effectively evaluate whether virtualization technology is suitable for a client’s platform. An introduction to it follows.

 

 

vConsolidate is a virtualization consolidation benchmark composed of four separate benchmarks that run simultaneously. There is one benchmark component each for database, web, Java, and mail. As this is targeted at a virtual environment, each component runs in its own separate virtual machine (VM) with its own operating system. In addition to the four benchmark components there is a fifth virtual machine with no benchmark, which is meant to simulate an idle VM. These five virtual machines make up a consolidation stack unit (CSU).

 

As mentioned, the building block of a vConsolidate stack consists of five distinct virtual machines (database, web, mail, Java, and idle). To run an individual consolidation stack unit (CSU), three client machines are connected to the server under test (SUT). Two of the clients drive the load to the web server and one client drives the workload of the mail server. The Java and database components are self-contained workloads and do not require external clients to run. Note that as more CSUs are added, additional clients are required in increments of three to support web and mail workload traffic generation.

 

The four component workloads are database: SysBench, web: WebBench, mail: LoadSim, and Java: SPECjbb2005*. The specific configuration of the individual workloads is given in the configuration section of this document. The storage of the workload is distributed across the test configuration. The web clients contain the WebBench client program. The mail client contains Microsoft Outlook, LoadSim, and the WebBench controller program. The server contains the virtualization software, in this case VMware ESX Server (version 3.0.2 build 52542). The individual VMs contain their respective workloads. These VM files can be stored locally on the server or, as in our configuration, on an external SAN. Every CSU produces a score for all four benchmarks (the fifth VM is idle). The final vConsolidate score is the total of all CSU scores. We compare that final score against other system configurations to develop a suite of general reference scores.
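The aggregation described above can be sketched as follows. The article states only that each CSU yields a score from its four benchmarks and that the final score is the total over all CSUs; it does not say how the four results are combined. The sketch therefore assumes, purely for illustration, that each workload's throughput is divided by a reference throughput and that the CSU score is the geometric mean of the four ratios. All names and numbers are hypothetical.

```python
import math

WORKLOADS = ("database", "web", "mail", "java")


def csu_score(throughputs: dict, reference: dict) -> float:
    """Combine four workload results into one CSU score.

    ASSUMPTION: geometric mean of reference-normalized throughputs.
    The article does not specify the combining rule.
    """
    ratios = [throughputs[w] / reference[w] for w in WORKLOADS]
    return math.prod(ratios) ** (1.0 / len(ratios))


def final_score(csus: list, reference: dict) -> float:
    """Final score is the total of all CSU scores (as stated in the text)."""
    return sum(csu_score(c, reference) for c in csus)


if __name__ == "__main__":
    # Made-up throughput numbers, for illustration only.
    reference = {"database": 100.0, "web": 200.0, "mail": 50.0, "java": 1000.0}
    one_csu = [dict(reference)]
    two_csus = [dict(reference), dict(reference)]
    print(final_score(one_csu, reference))
    print(final_score(two_csus, reference))
```

Whatever the combining rule, summing per-CSU scores means the final number grows as long as adding a CSU contributes more than it takes away from the existing ones.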

 

Although standard workloads were used in the vConsolidate stack, some modifications were required to match the requirements of a virtualized environment. The two main changes suggested in the vConsolidate test installation guide were to the SysBench and SPECjbb2005 workloads.

 

Database

SysBench-OLTP is an open source benchmark tool authored by MySQL to benchmark On-Line Transaction Processing (OLTP) performance for different databases. SysBench-OLTP is a multithreaded workload in which each thread sends transactions to the DBMS. It measures the number of Transactions Per Second (TPS) as its benchmark metric.

 

Java

SPECjbb is a benchmark developed by SPEC to measure Java performance (http://www.spec.org/jbb2005/). It exercises the implementation of the Java Virtual Machine (JVM), the Just-In-Time (JIT) compiler, garbage collection, threads, and some aspects of the operating system. The SPECjbb script increases the number of warehouses from 1 to 2*N, where N is the number of CPUs. The metric is transactions per second, averaged over the warehouse counts from N to 2*N. The most recent release is SPECjbb2005, which provides a new, enhanced workload implemented in a more object-oriented manner to reflect how real-world applications are designed. It introduces new features such as XML processing and BigDecimal computations to make the benchmark a more realistic reflection of today's applications.
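As a small illustration of the metric described above, the sketch below averages throughput over the warehouse counts from N to 2*N. The function name and the sample throughput values are hypothetical; this is not the SPECjbb2005 reporting tool.

```python
def specjbb_metric(throughput_by_warehouses: dict, num_cpus: int) -> float:
    """Average throughput over warehouse counts N..2N (inclusive), N = num_cpus."""
    points = range(num_cpus, 2 * num_cpus + 1)
    missing = [w for w in points if w not in throughput_by_warehouses]
    if missing:
        raise ValueError(f"no measurement for warehouse counts {missing}")
    return sum(throughput_by_warehouses[w] for w in points) / len(points)


if __name__ == "__main__":
    # Made-up transactions/sec for a VM with 2 vCPUs (warehouses 1..4).
    measured = {1: 1000.0, 2: 1900.0, 3: 2000.0, 4: 2100.0}
    # Only warehouses 2, 3 and 4 count toward the metric.
    print(specjbb_metric(measured, num_cpus=2))  # 2000.0
```

The ramp-up points below N are measured but excluded, so the metric reflects throughput once every CPU has at least one warehouse to serve.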

 

Web

WebBench is a licensed PC Magazine benchmark program that measures the performance of Web servers. WebBench provides several standard workloads. The static workload files contain only static requests; the dynamic workload files contain both static and dynamic requests; and the e-commerce workload file contains secure and unsecure static and dynamic requests.  For our purposes we use the e-commerce test which uses a dynamic, SSL workload. 

 

 

Mail

LoadSim 2003 is a licensed Microsoft workload used to simulate the performance of MAPI clients. LoadSim creates a simulated mailbox load by performing operations such as sending and receiving email. It also performs other tasks, such as making and accepting calendar appointments. These more complex tasks are what separate this workload from simpler send-receive workloads.
 

Page 4: Woodcrest vs. Clovertown: vConsolidate on VMware* ESX

Configuration

The benchmark configuration is shown below.

Platform | Dawning I620R-F
Processor | 2P Intel Xeon 5160 | 2P Intel Xeon X5365
Processor details | Woodcrest 3.0GHz/1333/4MB L2 | Clovertown 3.0GHz/1333/2X4MB L2
Chipset | Intel® 5000P chipset, FSB at 1333 MT/s
Memory | 8x2GB
Memory details | Fully-Buffered DDR2 667 PC2-5300
BIOS settings | Virtualization Technology: enabled; Hardware Prefetch: enabled (default); Adjacent Cache Line Prefetch: disabled (default)
Fiber Channel HBA | QLE 2462, embedded driver in VMware ESX 3.0.2 (qla2300_7xx.o)
SAN HDD | Dawning 8310FF
Storage Configuration | RAID0

Table 1 Hardware Configuration

 

VMware* ESX Server 3.0.2 build 52542

 

No VM is manually pinned to a specific CPU

 

All VM image files are stored on SAN storage

Table 2 Software Configuration

For other configuration details, refer to Appendix A.

Result

 

Both the Woodcrest and Clovertown systems were configured and tested with the vConsolidate workload. We take the result of the 1 CSU configuration on the Woodcrest 3.0GHz dual-core processor as the reference, because it generated the baseline, or lowest, score. Other results are expressed relative to this reference score; higher is better. The benchmark for each configuration was run three times and the result is the average of the three runs.
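The procedure described above (average three runs, then express the result relative to the reference configuration) amounts to the following calculation. The raw scores in the example are made-up placeholders, since the article reports only normalized results, and the function name is ours.

```python
from statistics import mean


def normalized_result(runs: list, reference_runs: list) -> float:
    """Average the runs of a configuration and divide by the reference average."""
    if not runs or not reference_runs:
        raise ValueError("each configuration needs at least one run")
    return mean(runs) / mean(reference_runs)


if __name__ == "__main__":
    # Hypothetical raw scores from three runs each (not measured data).
    reference_runs = [100, 102, 98]   # reference configuration
    candidate_runs = [150, 148, 152]  # configuration being compared
    print(normalized_result(reference_runs, reference_runs))  # reference vs itself
    print(normalized_result(candidate_runs, reference_runs))
```

By construction the reference configuration scores 1.00, and any other configuration reads directly as a multiple of it.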

 

 

Figure 1 – Result of 2 Processors

 

Table 3 shows the vConsolidate results of the two processors running 1 CSU. We can tell that the CPU of the Clovertown-based system is not fully utilized.

 

#CSU | Woodcrest 3.0GHz/1333 | Clovertown 3.0GHz/1333
1 | 1.00 @ 99% | 1.65 @ 66%

Table 3 – Performance and CPU utilization

 

With one CSU deployed, the quad-core Xeon system delivers 65% higher performance than the dual-core system. The quad-core system also has CPU resources to spare, which means more CSUs can be run to reach higher performance.
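The two figures behind this observation follow directly from the values in Table 3. The short sketch below only restates that arithmetic; the function names are ours, and it makes no prediction about scores with additional CSUs.

```python
# Values copied from Table 3: (relative score, CPU utilization) at 1 CSU.
RESULTS = {
    "Woodcrest 3.0GHz/1333": (1.00, 0.99),
    "Clovertown 3.0GHz/1333": (1.65, 0.66),
}


def relative_gain(score: float, reference_score: float) -> float:
    """Fractional improvement of score over the reference score."""
    return score / reference_score - 1.0


def idle_fraction(cpu_utilization: float) -> float:
    """Share of CPU capacity left unused at the measured load."""
    return 1.0 - cpu_utilization


if __name__ == "__main__":
    ref_score, _ = RESULTS["Woodcrest 3.0GHz/1333"]
    for name, (score, util) in RESULTS.items():
        gain = relative_gain(score, ref_score)
        print(f"{name}: gain {gain:.0%}, idle CPU {idle_fraction(util):.0%}")
```

The Woodcrest system is essentially saturated by one CSU, while roughly a third of the Clovertown system's CPU capacity remains unused.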
 

Appendix A - Other Configuration

Platform | Dawning I610R-F
Processor | 2P Intel® Xeon 5160 or 2P Intel® Xeon 5365
Memory | 2x2GB FBD

Table 4 – Client Configuration

 

VMs | vCPUs | vMemory
Web | 2 | 1.5GB
Java | 2 | 2.0GB
Database | 2 | 1.5GB
Mail | 1 | 1.5GB
Idle | 1 | 0.4GB

Table 5 – System Configuration (Profile #2 in vConsolidate 1.0)
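To see what one CSU asks of the host, the allocations in Table 5 can simply be added up. The sketch below does that and, as a rough upper bound only, divides the 16 GB of host memory from Table 1 (8x2GB) by the per-CSU total. It assumes no hypervisor memory overhead and no memory overcommit, which is a simplification of ours and not something the article states.

```python
# Per-VM allocations copied from Table 5: (vCPUs, vMemory in GB).
CSU_PROFILE = {
    "Web": (2, 1.5),
    "Java": (2, 2.0),
    "Database": (2, 1.5),
    "Mail": (1, 1.5),
    "Idle": (1, 0.4),
}


def csu_totals(profile: dict = CSU_PROFILE) -> tuple:
    """Return (total vCPUs, total vMemory in GB) for one CSU."""
    vcpus = sum(v for v, _ in profile.values())
    memory_gb = sum(m for _, m in profile.values())
    return vcpus, memory_gb


def csus_fitting_in_memory(host_memory_gb: float, profile: dict = CSU_PROFILE) -> int:
    """Memory-only upper bound on CSUs.

    ASSUMPTION: ignores hypervisor overhead and memory overcommit.
    """
    _, memory_gb = csu_totals(profile)
    return int(host_memory_gb // memory_gb)


if __name__ == "__main__":
    vcpus, memory_gb = csu_totals()
    print(f"vCPUs per CSU: {vcpus}")
    print(f"vMemory per CSU: {memory_gb:.1f} GB")
    print(f"CSUs fitting in 16 GB by memory alone: {csus_fitting_in_memory(16.0)}")
```

One CSU therefore presents 8 vCPUs to the host, twice the four physical cores of the 2P dual-core Woodcrest system and equal to the eight cores of the 2P quad-core Clovertown system.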

 

vConsolidate 1.0

Web Server virtual machine configuration
Guest OS | Microsoft* Windows* Server 2003 Enterprise R2 32-bit Edition
Web Server | Microsoft* Internet Information Services 6.0, 32-bit
Workload | WebBench 5.0 test kit
Workload parameters | E-commerce workload, default parameters provided with WebBench
Metric | Throughput (bigger is better)

Java virtual machine configuration
Guest OS | Microsoft* Windows* Server 2003 Enterprise R2 x64 Edition
Workload | Modified SPECjbb2005 v1.07
JVM version | jrockit-jdk1.5.0_10-windows_x86_64
JVM Parameters | -options-Xms(Heapsize)m -Xmx(Heapsize)m -XXaggressive -XXthroughputCompaction -XXallocPrefetch -XXallocRedoPrefetch -XXcompressedRefs -XXtlasize64k -XXlazyUnlocking
JVM Heap Size | -Xms1024M -Xmx1024M (Memory = 2048M)

Database Server virtual machine configuration
Guest OS | Microsoft* Windows* Server 2003 Enterprise R2 x64 Edition
Workload | SysBench 0.4.0 windows with SQL Server version 1.0
Workload parameters | Number of threads: 4
Database | Microsoft* SQL Server* 2005 Enterprise Edition

Metric: