FHRP and STP root outside of vPC domain

We implemented the topology shown in the following diagram and it worked fine for about a year; at least we heard nothing from the customer, so we assumed everything was fine. The new infrastructure consisted of two Nexus 5548UP switches and four Nexus 2232 Fabric Extenders.
[Diagram: physical connections from the Cisco Catalyst switches to the Nexus 5548s]

[Diagram: physical connectivity among the Catalyst 4506, Nexus 5K and Nexus 2K]

It was not until they deployed a VM (virtual machine) on one of the new VMware ESXi hosts, connected to the new infrastructure through a pair of QLogic CNA cards, that we ran into an issue.

This VM had erratic connectivity to a Windows server running IBM’s TSM (Tivoli Storage Manager) located in the same VLAN. The TSM server was connected to Cat4506-01 via NIC teaming, aggregating four network adapters.

The ESXi host running the new VM was connected to Fex103 on N5K-1 via its CNA-1 and directly to N5K-2 via its CNA-2. Both links were aggregated using vPC from the Nexus 5000 point of view and route-based-on-IP-hash load balancing from the ESXi perspective.
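
For reference, the vPC toward the ESXi host would look roughly like the sketch below on each 5548. This is a hedged reconstruction, not the actual config: the port-channel number, trunk mode and interface IDs are made up. Note that a standard vSwitch doing route-based-on-IP-hash needs a static port-channel, hence channel-group mode on rather than LACP.

    ! N5K-1: member link reaches the host through Fex103 (interface IDs are hypothetical)
    interface port-channel103
      switchport mode trunk
      vpc 103
    interface Ethernet103/1/1
      switchport mode trunk
      channel-group 103 mode on

    ! N5K-2: member link connected straight to the 5548
    interface port-channel103
      switchport mode trunk
      vpc 103
    interface Ethernet1/17
      switchport mode trunk
      channel-group 103 mode on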

The Windows TSM server was able to talk to everything else in that same VLAN. The VM, however, seemed to be able to talk to everything EXCEPT that TSM server, and even that only failed from time to time (or so we were told).

We did some troubleshooting ourselves and found out that the VM only failed to communicate with the TSM server when the packets flowed through CNA-2. The test was simple (see the verification sketch right after the list):

  • When only CNA-1 was UP, communication always worked.
  • When only CNA-2 was UP, communication never worked.
  • When both CNAs were up, it worked for a bit, but then pings failed from the VM to the TSM server.
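
To confirm which member link the VM’s traffic was actually taking, checking where its MAC address is learned on each 5548 is enough. A small sketch: the MAC address is made up, and depending on the NX-OS release the command is show mac address-table or the older show mac-address-table.

    ! Where is the VM's MAC learned on each switch? (hypothetical MAC)
    N5K-1# show mac address-table address 0050.56aa.0001
    N5K-2# show mac address-table address 0050.56aa.0001
    ! Sanity-check the state of the vPC and its member ports
    N5K-1# show port-channel summary
    N5K-1# show vpc brief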

We then decided to open a ticket with Cisco TAC: as usual, the issue was urgent since it was holding back the deployment of more VMs, and we thought TAC would give us quick, more precise insight into the problem.

The first thing they told us was that we should connect both CNAs of the ESXi host either to the N5Ks or to the N2Ks, but not a mix of both. We knew that the supported vPC topologies don’t allow connecting the two NICs of a server to the same N2K, or to the same N5K, if you are doing aggregation, but we were not aware that the way we had connected the ESX servers was unsupported. There is actually no white paper or public Cisco document that tells you not to connect it this way: NIC 1 to a Nexus 5000 and NIC 2 to a Nexus 2000.

We went for it and changed the cabling accordingly: CNA-1 was now directly connected to N5K-1 and CNA-2 directly to N5K-2. We tested again. No good. Same exact issue.
We thought it might have something to do with the CNA cards or the ESXi host itself (installed on an IBM System x server), so we configured another physical server with another pair of QLogic CNA cards. We tested again. No good. We still experienced the same issue.


The next thing TAC told us was that having the STP root bridge and the FHRP for this VLAN outside of the vPC domain (which consisted of the two Nexus 5548UP devices), while connecting the Catalyst 4506s as shown in the diagram (non-vPC), was not within Cisco’s “best practices.”
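
From the 5548s this is easy to verify. A quick sketch, assuming HSRP as the FHRP on the 4506s and a made-up VLAN ID:

    ! On the Nexus side: who is root for the VLAN, and is any local port blocking?
    N5K-2# show spanning-tree vlan 100
    ! On the Catalyst side: the FHRP gateways (HSRP assumed here) live outside the vPC domain
    Cat4506-01# show standby brief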

We then asked whether such a configuration was even supported. They avoided giving a direct answer, kept repeating that it was not within Cisco’s best practices, and said we should consider modifying the topology.

Even though Cisco TAC never said it was a design requirement for vPC to work in our scenario, they suggested connecting the two Catalyst 4506s in vPC mode to the Nexus 5000 switches. Unfortunately this was not an option: the 4506s had no available 10 Gb ports left and buying another 10 Gb module was far too expensive. We needed another solution.

TAC then suggested moving only this VLAN’s L3 and STP root to the vPC domain. We considered that option and even prepared the configuration changes, but we decided to go over the configs again and test a little more, as changing the logical design of our customer’s network was never the goal of this project and was not something to take on lightly.
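
For what it’s worth, the change we had prepared (but never applied) would have looked roughly like the sketch below on N5K-1, with N5K-2 getting a secondary STP priority and a lower HSRP priority. The VLAN ID, addresses and priorities are made up, and it assumes the 5548s have the Layer 3 module and use HSRP:

    ! N5K-1: take over the STP root and the default gateway for this VLAN
    spanning-tree vlan 100 priority 4096
    feature interface-vlan
    feature hsrp
    interface Vlan100
      no shutdown
      ip address 10.0.100.2/24
      hsrp 100
        preempt
        priority 110
        ip 10.0.100.1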

Unfortunately, Nexus 5548UP data center switches are not built with the ASICs needed to support ELAM captures. That awesome capability, available in Catalyst 6500 switches, always comes in handy when troubleshooting weird hardware-switching issues (when packets are not CPU-switched by the supervisor module). On Cat 6509s, those modular, robust and long-lasting devices, ELAM captures would let us see the flow of hardware-switched packets at every stage: whether a packet comes in on a specific interface and goes out on another.


Given this lack of troubleshooting tools, TAC wanted us to get sniffer traces at both ends and along the path, all the way from the Windows TSM server to the ESX host with our problematic VM.

I asked them if there wasn’t any other way we could proceed. Doing what TAC requested meant sending someone to a remote data center to freeze for a few hours while we ran tests and captured traffic with god knows how many SPAN sessions. They said there was no other way…
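
Each capture point would have meant a SPAN session roughly like the one below. The interface numbers are made up; on the Nexus 5000 the destination port must also be configured with switchport monitor:

    ! Sniffer plugged into a spare port on the 5548
    interface ethernet 1/20
      switchport monitor
    ! Mirror the TSM-facing interface to it
    monitor session 1
      source interface ethernet 1/8 both
      destination interface ethernet 1/20
      no shut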

Going back to our previous test results, we remembered that the pings from the VM to the Windows TSM server only failed when using the CNA connected to N5K-2. We decided to do a bit more troubleshooting before sending some poor guy on site.
The funny thing was that, when both CNAs were up, the first few pings would ALWAYS work each time the ARP table was cleared on the server. Apparently the first packets went out CNA-1, but after the eighth packet or so, once the ARP entry was formed at the VM, the ESXi host decided to use CNA-2 to forward all subsequent packets.


We went over the configurations of all the devices involved again and found some old static MAC entries in the Nexus 5K configs. One of those entries belonged to the MAC address of the TSM server and pointed to interface Eth1/8 on both Nexus 5548s.
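
Stale entries like these are easy to spot once you know to look for them. A hedged example with a made-up MAC address (depending on the NX-OS release the keyword is mac address-table or the older mac-address-table):

    ! Any statically configured MAC entries left in the running config?
    N5K-1# show running-config | include static
    ! What does the forwarding table say for the TSM server's MAC? (hypothetical MAC)
    N5K-1# show mac address-table address 0011.2233.4455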

It turned out that, given the STP bridge priorities on this VLAN, the link between N5K-2 and Cat4506-02 was in STP blocking state… With the static entry pinning the TSM server’s MAC to that interface, traffic reaching N5K-2 was sent toward a link that could not forward, which explained why communication ALWAYS failed when the ESXi host decided to use the CNA connected to N5K-2.

We cleared the static MAC entry on the N5Ks and the problem was solved. No need to waste time, money, and resources sending someone to collect endless sniffer traces.
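
Removing the offending entry is a one-liner on each 5548; again, the MAC address, VLAN and interface below are illustrative only:

    ! On both N5K-1 and N5K-2: drop the static entry and let the MAC be learned dynamically
    N5K-1(config)# no mac address-table static 0011.2233.4455 vlan 100 interface ethernet 1/8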

Lessons learned:
  • Always triple-check ALL your configurations before assuming you are facing a weird issue. Most of the time you will be facing a simple, yet sometimes hard to find, human error.
  • Always have someone else, including Cisco TAC, do it as well.
  • This topology works, even if it’s not within Cisco’s best practices.
