E1000, E1000E and VMXNET3 performance test

Posted on June 27, 2012 by admin

After reading several posts and blogs about E1000E performance on vSphere 5, I got curious whether those claims hold up and how vSphere actually behaves under test.

Test setup

The setup I used is similar to the one described in http://www.vmware.com/pdf/vsp_4_vmxnet3_perf.pdf. It looks like this:

Bare metal server (Client): B22-M3, 16GB, 2xE5-2450, 1280VIC
vSphere ESXi 5.0 server: B200-M3, 64GB, 2xE5-2680, 1280VIC
To accommodate the tests, the 1280VICs are connected to 2108 IOMs, and only Fabric A / 6248-A is used.

The VM is configured as follows:

Local Area Connection: E1000
Local Area Connection 2: VMXNET3
Local Area Connection 3: E1000E
4GB Memory, 1 vCPU
Windows 2008R2

Test results

The following are the results of the best-performing test run:

Raw Data

Adapter	Win Net	Win CPU	VM CPU	VM Net	FEX Net	Graph
VMXNET3	9715	53%	82.57%	9493.67	9.92	link
E1000	9784	67%	118.89%	9491.87	9.99	link
E1000E	9654	66%	91.77%	9469.47	10.0	link

Column explanations:

Column	Description
Win Net	Average transmission rate in Mbit/s, measured in Windows
Win CPU	Average CPU load, measured in Windows
VM CPU	%USED counter in esxtop
VM Net	MbTX/s in esxtop
FEX Net	Tx bit rate in Gbps as seen by the 2108 IOM module

Data interpretation

We can clearly see that every adapter can be driven to full line rate. There are small differences, but these could well be due to sampling periods and similar measurement effects.

CPU usage is higher for the E1000 and E1000E adapters, for both Win CPU and VM CPU. However, I think only the E1000 carries a high penalty; for the E1000E the overhead stays within acceptable limits.
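To put a number on the CPU penalty, the figures from the raw data table can be compared against VMXNET3. The following is a minimal Python sketch and not part of the original test: the dictionary layout and the function names are illustrative assumptions, and only the numbers come from the table.

```python
# Figures copied from the raw data table (Win Net in Mbit/s, CPU in percent).
results = {
    "VMXNET3": {"win_net_mbit": 9715, "win_cpu_pct": 53, "vm_cpu_pct": 82.57},
    "E1000":   {"win_net_mbit": 9784, "win_cpu_pct": 67, "vm_cpu_pct": 118.89},
    "E1000E":  {"win_net_mbit": 9654, "win_cpu_pct": 66, "vm_cpu_pct": 91.77},
}


def relative_vm_cpu(results, baseline="VMXNET3"):
    """Return each adapter's esxtop %USED as a ratio of the baseline adapter's."""
    base = results[baseline]["vm_cpu_pct"]
    return {name: row["vm_cpu_pct"] / base for name, row in results.items()}


def mbit_per_cpu_point(results):
    """Throughput per percentage point of %USED (higher means cheaper traffic)."""
    return {name: row["win_net_mbit"] / row["vm_cpu_pct"]
            for name, row in results.items()}


if __name__ == "__main__":
    ratios = relative_vm_cpu(results)
    efficiency = mbit_per_cpu_point(results)
    for name in results:
        print(f"{name:8s} VM CPU x{ratios[name]:.2f} of VMXNET3, "
              f"{efficiency[name]:.1f} Mbit/s per %USED point")
```

With these figures, the E1000 uses roughly 1.44 times the %USED of VMXNET3 and the E1000E roughly 1.11 times, which fits the impression that only the E1000 carries a heavy penalty. Since this is a single run, the ratios are indicative at best.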

Disclaimers

I’m not a benchmarking guy, nor is this my job, so these figures are just my personal observations and by no means the result of a full professional benchmark. They are, however, fully reproducible.

The attached graphs do show some dips, which I did not investigate further. I know technically why they are there, but did not look into fixing them.

VCDX Defense – Troubleshooting

Posted on June 18, 2012 by admin

If there was one part of the VCDX process that I couldn’t (or shouldn’t) fail, it must have been the troubleshooting part. Having been a troubleshooter for around 7 years, I do feel I have somewhat mastered the process. So if there is one place where I can provide some tips, it has to be troubleshooting.

As the exam allows 15 minutes for a troubleshooting exercise, it is vital to define up front how you will attack the problem. The following approach works for me, and maybe it can help you prepare for your defense:

Define the problem:

The first thing you need to do is understand the problem. An example problem (I will work with a network analogy so as not to break any NDA) would be: My network is down.
If you dived straight into troubleshooting now, you would lack a clear understanding of what is happening.

As described in numerous troubleshooting methodologies, we need to clarify our problem (The Power of Asking “Why?”). If we ask “Why is your network down?”, the answer could be: “Because our switch is not working properly”. This changes our problem to: “My switch is not working properly”. That automatically triggers the question “Why is your switch not working properly?”, which in turn leads to the answer “Because it is dropping packets”. This goes on, and continuing to ask why eventually leads to the real problem: “Our c2960 switch has high CPU usage”.

The point here is that we start with a general statement, “My network is down”, and finish with the actual problem: “My c2960 switch has high CPU usage”. Although the network being down is a concern, it is only a consequence of the underlying problem.

Collect data around the problem:

Once we have defined the problem, we often tend to jump straight into our technical knowledge and try to identify the cause. Although that might be OK if you have a 100% match and know right away what it is (in our example, maybe a log message indicating a clear bug), it often misleads us.

At this stage of the troubleshooting process we want to collect all the data we have:

What is the device/component having the problem?
Is there a specific moment when the issue started happening?
Is there a specific place where we can see the problem?
What is the scale/size/… of the problem?

It is also useful to ask whether there are similar devices/components that do not have the issue. In our example, the customer might tell you he has a second switch that does not show the problem. That means we can naturally start comparing them (which we do almost automatically once we know such a device/component exists). If you never ask, however, this vital information will never come to light.

In regard to timing, the customer could tell you the problem started when they upgraded the switch.

A specific place on a switch could be that he only sees the high CPU on the SNMP process. Or, not relevant to this case, only on the first 8 ports, for example. Here again you can start thinking: why the SNMP process and not the other processes? What does the SNMP process do differently from the other processes?

Finally, the size/scale is important: 100% CPU usage versus 20% CPU usage changes the problem.

Look for a possible cause of the problem:

Now we have arrived at the final, but most important, stage. We first defined the problem, then collected data, and only now do we look for what could have caused the problem.

We can now find causes based on the differences we collected between working and non-working devices, on the size of the problem, and even on the timing (changes).

Make sure your cause makes sense when you link it to the data you gathered.
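The comparison between a working and a non-working device can be sketched in a few lines of Python. This is only an illustration of the idea: the attribute names and values below are hypothetical and made up for the example, not taken from a real switch.

```python
def diff_attributes(broken, working):
    """Return the attributes whose values differ between the two devices."""
    keys = sorted(set(broken) | set(working))
    return {
        key: (broken.get(key), working.get(key))
        for key in keys
        if broken.get(key) != working.get(key)
    }


# Hypothetical inventory data, invented for illustration only.
switch_with_issue = {"model": "c2960", "software": "release-B",
                     "snmp_pollers": 4, "uplinks": 2}
switch_without_issue = {"model": "c2960", "software": "release-A",
                        "snmp_pollers": 4, "uplinks": 2}

candidates = diff_attributes(switch_with_issue, switch_without_issue)
print(candidates)  # {'software': ('release-B', 'release-A')}
```

Every attribute that comes back is a candidate cause to check against the timing and the size of the problem; attributes that are identical on both devices can be eliminated and explained as such.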

So how does this all add up for a VCDX defense?

I think the important thing during the defense is to show good use of a troubleshooting process. That means you need to talk the whole time and keep explaining your reasoning.

During my defense I used the above process and made sure:

I used the Why? technique and kept on clarifying the initial problem.
I continuously showed how far I had narrowed down the issue.
I explained which options I had already eliminated.
I explained why some causes are not possible, based on collected data.

I also ended with a clear action plan: I told my defense panel, this is the problem, you need to do ‘this’ to confirm it and then ‘that’ to fix it.

Toronto VCDX attempt

Posted on March 9, 2012 by admin

Yes, I submitted again:

Date: Fri, 9 Mar 2012 16:57:14 +0100 (CET)
From: Ramses Smeyers <ramses@smeyers.be>
To: vcdx@vmware.com
Subject: VCDX application Ramses Smeyers – Toronto

A lot has changed in my design and a lot more knowledge was gained. Some lessons learned from this attempt:

The example document at http://communities.vmware.com/thread/392902?tstart=0 gives a good view of how VMware looks at designs; it will be included in the standard VCDX documentation.
Twitter is very interesting: it provides a vast amount of information on what is happening in the VMware world and how key people think about certain topics. A good starting point is: https://twitter.com/#!/DuncanYB/vcdx
Find skilled people to review your document. By skilled I really mean skilled: not just people who know VMware, but people who really know VMware. I have a long list of people to thank here, but I’m sure they know who I mean.

Ramses Smeyers

Thoughts about life, clouds and universe
