If there was one part of the VCDX process that I couldn’t (or shouldn’t) fail, it had to be the troubleshooting part. Having worked as a troubleshooter for around 7 years, I feel I have more or less mastered the process. So if there is one area where I can offer some tips, it has to be troubleshooting.
As the exam gives you 15 minutes to work through a troubleshooting exercise, it is vital to define up front how you will attack the problem. The following approach works for me, and maybe it can help you prepare for your defense:
Define the problem:
The first thing you need to do is understand the problem. An example problem (I will use a network analogy so as not to break any NDA) would be: “My network is down.”
If you dived straight into troubleshooting at this point, you would lack a clear understanding of what is actually happening.
As described in numerous troubleshooting methodologies, we first need to clarify the problem (the power of asking “Why?”). If we ask, “Why is your network down?”, the answer could be: “Because our switch is not working properly.” This changes our problem to: “My switch is not working properly.” That automatically triggers the next question: “Why is your switch not working properly?”, which in turn could lead to the answer: “Because it is dropping packets.” Continuing to ask why eventually leads to the real problem: “Our c2960 switch has high CPU usage.”
The point here is that we start with a general statement, “My network is down”, and finish with the actual problem: “My c2960 switch has high CPU usage.” Although the network being down is a concern, it is only a consequence of the original problem.
Collect data around the problem:
What we often tend to do once we have defined the problem is jump straight into our technical knowledge to identify the cause. That might be OK if you have a 100% match and know straight away what it is (in our example, maybe a log message that clearly points to a bug), but it often misleads us.
At this stage of troubleshooting, we want to collect all the data we can:
- Which device/component has the problem?
- Is there a specific moment when the issue started?
- Is there a specific place where we can see the problem?
- What is the scale/size/… of the problem?
It is also worth asking whether there are similar devices/components that do not have the issue. In our example, the customer could tell you he has a second switch that does not show this problem. That means we can naturally start comparing the two (which we will do almost automatically once we know such a device/component exists). Not asking, however, will never reveal this vital information.
Regarding timing, the customer could tell you the problem started when they upgraded the switch.
As for a specific place on the switch, he might only see the high CPU on the SNMP process. Or, not relevant to this case, the problem might only show up on the first 8 ports. Again, you can start thinking: why the SNMP process and not the other processes? What does the SNMP process do differently from the other processes?
Last, the size/scale matters: 100% CPU usage versus 20% CPU usage changes the nature of the problem.
Look for a possible cause of the problem:
Now we have arrived at the final, but most important, stage: we first defined the problem, then collected data, and only now do we look for what could have caused it.
We can now identify possible causes based on the differences we collected between working and non-working devices, on the size of the problem, and even on the timing (changes).
Make sure your proposed cause is consistent with all the data you gathered.
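To make the comparison step a bit more concrete, here is a small Python sketch, purely as an illustration of the reasoning (nothing like this is needed in the exam). The device attributes and candidate causes are hypothetical, loosely based on the switch example: it lists the facts that differ between the failing switch and the working one, then keeps only the candidate causes whose supporting facts actually show up in those differences.

```python
# Hypothetical facts gathered for the failing switch and a comparable working one.
problem_switch = {"model": "c2960", "recently_upgraded": True, "high_cpu_process": "SNMP"}
working_switch = {"model": "c2960", "recently_upgraded": False, "high_cpu_process": None}


def differences(a, b):
    """Return the attributes whose values differ between two devices."""
    return {k: (a.get(k), b.get(k)) for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


# Hypothetical candidate causes, each mapped to the facts it relies on.
candidates = {
    "software bug introduced by the upgrade": {"recently_upgraded", "high_cpu_process"},
    "hardware fault specific to the c2960 model": {"model"},
}

diff = differences(problem_switch, working_switch)

# A cause stays plausible only if every fact it relies on differs between
# the failing and the working device; otherwise the data rules it out.
plausible = [cause for cause, needs in candidates.items() if needs <= set(diff)]

print(diff)
print(plausible)
```

Because both switches are the same model, the model-specific hardware cause is eliminated by the collected data, which is exactly the kind of reasoning worth saying out loud to the panel.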
So how does this all come together for a VCDX defense?
I think what matters during the defense is showing good use of a troubleshooting process. That means you need to speak the whole time and keep explaining your reasoning.
During my defense I used the above process and made sure that:
- I used the Why? technique and kept clarifying the initial problem.
- I continuously showed where I had narrowed down the issue.
- I explained which options I had already eliminated.
- I explained why some causes were not possible, based on the collected data.
I also ended with a clear action plan: I told my defense panel, this is the problem, you need to do ‘this’ to confirm it and then ‘that’ to fix it.