Resilience
Disclaimer: I conducted all tests using GNS3 version 2.2.46 in a simulated environment. The switch image used is Vios_l2-ADVENTERPRISEK9-M - 15.2(20200924:215240). The router image is VIOS-ADVENTERPRISEK9-M - 15.9(3)M6. The purpose of this was solely educational.
I simulated various issues between an office PC and an internet server to show how the network behaves during incidents. When the PC powers on, it looks for an available DHCP server by broadcasting a DHCP Discover message. Switch B-00-SW01, acting as a DHCP relay, forwards requests to the server through one of the core switches.
The core switches A-00-SW01 and A-00-SW02 learn a summarized route to the data center and Building B via OSPF. Each switch learns the same network through the same protocol over two paths with the same metric, so it load-balances between them using Equal-Cost Multipath (ECMP). Cisco Express Forwarding (CEF) handles the path selection. By default it uses per-destination sharing, which forwards all packets of a given source-destination flow through the same interface. The table below shows this process and the path CEF selects for the admin PC to reach the DHCP server.
A-00-SW01#show ip route 1.1.2.1
Routing entry for 1.1.2.0/24
Known via "ospf 109", distance 110, metric 21, type inter area
Last update from 192.168.0.38 on GigabitEthernet0/2, 00:03:38 ago
Routing Descriptor Blocks:
* 192.168.0.46, from 1.1.1.8, 00:46:04 ago, via GigabitEthernet0/3
Route metric is 21, traffic share count is 1
192.168.0.38, from 1.1.1.7, 00:03:38 ago, via GigabitEthernet0/2
Route metric is 21, traffic share count is 1
A-00-SW01#show ip cef 1.1.2.1 detail
1.1.2.0/24, epoch 0, per-destination sharing
nexthop 192.168.0.38 GigabitEthernet0/2
nexthop 192.168.0.46 GigabitEthernet0/3
A-00-SW01#show ip cef exact-route 10.0.100.20 1.1.2.1
10.0.100.20 -> 1.1.2.1 =>IP adj out of GigabitEthernet0/2, addr 192.168.0.38
A-00-SW01#show cef state
------ text omitted ------
IPv4 CEF Status:
CEF enabled/running
universal per-destination load sharing algorithm, id F81E2C26
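Polarization occurs when every device along a path runs the same hash on the same source-destination pair and therefore makes the same choice, which leaves some of the equal-cost links unused. To avoid this, each device gets a unique ID (F81E2C26 on A-00-SW01 in the output above) that feeds into the algorithm. Different IDs give different results, so consecutive devices can choose different paths for the same source-destination pair. This principle applies only to Layer 3 devices: routers, NATs, VPNs, servers, and distribution switches.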
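The idea behind per-destination sharing with a per-device ID can be sketched in a few lines of Python. This is only an illustration: the CRC32 hash and the function name pick_next_hop are my own assumptions, not the algorithm CEF actually implements. The next hops, the flow, and the ID come from the output above.

```python
import zlib
from typing import Sequence

def pick_next_hop(src_ip: str, dst_ip: str, universal_id: int,
                  next_hops: Sequence[str]) -> str:
    """Pick one next hop for a flow (illustrative hash, not Cisco's)."""
    # The device ID is mixed into the key, so two devices with different
    # IDs can make different choices for the same source-destination pair.
    key = f"{src_ip}|{dst_ip}|{universal_id:08X}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

NEXT_HOPS = ["192.168.0.38", "192.168.0.46"]
flow = ("10.0.100.20", "1.1.2.1")

# Same flow and same device ID: the choice never changes between packets.
first = pick_next_hop(*flow, 0xF81E2C26, NEXT_HOPS)
again = pick_next_hop(*flow, 0xF81E2C26, NEXT_HOPS)

# A hypothetical second device with another ID may or may not pick the same hop.
other = pick_next_hop(*flow, 0x12345678, NEXT_HOPS)
```

The point of the sketch is the determinism per flow: packets of one source-destination pair stay on one interface, while the ID keeps different devices from all making an identical decision.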
Scenario 1 - Data center. Suppose a failure occurs on the link connecting switch A-03-SV01 to the DHCP server, as shown in the image below. The data center is in a different OSPF area from the core, so issues inside that block do not affect other areas, and the core switches continue to see a summarized route to the services. A-03-SV01 is the only device affected because of its direct link to the DHCP server. After convergence, OSPF installs a new route to the destination via Port-channel Po1. This port channel must be Layer 3; otherwise, a blackhole could occur: the core switches would still see a route to the DHCP server through A-03-SV01, but SV01 itself would have no route to the server in its routing table. If the traffic had been going through A-03-SV02 instead, it would have continued without issue. The table below shows the initial route to the destination and its replacement after the failure.
A-03-SV01#show ip route 1.1.2.1
Routing entry for 1.1.2.1/32
Known via "ospf 109", distance 110, metric 11, type intra area
172.22.32.5, from 1.1.2.1, 00:41:30 ago, via GigabitEthernet3/0
Route metric is 11, traffic share count is 1
A-03-SV01#
*Apr 17 16:43:40.624: %BFDFSM-6-BFD_SESS_DOWN: BFD-SYSLOG: BFD session ld:1 handle:1, is going Down Reason: ECHO FAILURE
*Apr 17 16:43:40.634: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed, ld:1 neigh proc:OSPF, handle:1 act
*Apr 17 16:43:40.635: %OSPF-5-ADJCHG: Process 109, Nbr 1.1.2.1 on GigabitEthernet3/0 from FULL to DOWN, Neighbor Down: BFD node down
*Apr 17 16:43:41.788: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet3/0, changed state to down
*Apr 17 16:43:42.683: %LINK-3-UPDOWN: Interface GigabitEthernet3/0, changed state to down
A-03-SV01#show ip route 1.1.2.1
Routing entry for 1.1.2.1/32
Routing Descriptor Blocks:
* 172.22.32.2, from 1.1.2.1, 00:00:09 ago, via Port-channel1
Route metric is 16, traffic share count is 1
A-03-SV02#show ip route 1.1.2.1
Routing entry for 1.1.2.1/32
Routing Descriptor Blocks:
* 172.22.32.9, from 1.1.2.1, 00:41:29 ago, via GigabitEthernet3/0
Route metric is 11, traffic share count is 1
If you want to see all fault tests in the data center and how the network responds to them, click here. Then, go to the section NETWORKING - Building A - Data Center. The image below shows two data flows destined for the DHCP server.
Continuing with the example, the PC now has an IP address and is ready to access the internet. To reach the server, it sends traffic to its default gateway, 10.0.100.1. In this design, the default gateway is B-00-SW01, which is also the VRRP master. It holds the virtual IP mentioned above and the corresponding virtual MAC. The table below illustrates this setup.
B-00-SW01#show vrrp
Vlan100 - Group 100
State is Master
Virtual IP address is 10.0.100.1
Virtual MAC address is 0000.5e00.0164
B-01-SW01#sho mac address-table dynamic interface g0/0
Mac Address Table
-------------------------------------------
Vlan Mac Address Type Ports
---- ----------- -------- -----
100 0000.5e00.0164 DYNAMIC Gi0/0
100 0c94.7ab2.0002 DYNAMIC Gi0/0
vlan100-ADM> show arp
00:00:5e:00:01:64 10.0.100.1 expires in 112 seconds
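The virtual MAC in the output above is not arbitrary. For IPv4, VRRP uses the well-known prefix 00-00-5E-00-01 followed by the group number as the last byte. A small Python sketch shows how group 100 gives the address that the PC and the access switch learned (the function name vrrp_virtual_mac is only illustrative):

```python
def vrrp_virtual_mac(group_id: int) -> str:
    """Return the IPv4 VRRP virtual MAC for a group, in Cisco dotted format."""
    if not 1 <= group_id <= 255:
        raise ValueError("VRRP group ID must be between 1 and 255")
    # Fixed prefix 00-00-5E-00-01 plus the group ID as one hex byte.
    digits = "00005e0001" + f"{group_id:02x}"
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))

print(vrrp_virtual_mac(100))  # 0000.5e00.0164
```

Because the address depends only on the group number, the backup switch answers with the same MAC after a failover, so the PCs do not need to refresh their ARP entry.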
All access switches have redundant links to the distribution layer, so if one link fails, RSTP enables an alternative path. Note that I used Cisco's Rapid PVST+ (rapid-pvst mode), which runs one RSTP instance per VLAN.
Scenario 2 - Building B: Suppose the distribution switch B-00-SW01 fails and stops working. The first eight user networks will experience traffic disruptions, and there will be a brief downtime. It will last until RSTP converges to make a new path for the traffic, and VRRP switches from Backup to Master on switch B-00-SW02.
B-01-SW01#
*Apr 18 10:53:19.023: %LINK-5-CHANGED: Line protocol on Interface GigabitEthernet0/0, changed state to down
*Apr 18 10:53:20.024: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0, changed state to down
B-01-SW01#show spanning-tree inter g0/0
no spanning tree info available for GigabitEthernet0/0
B-01-SW01#show spanning-tree inter g0/1
Vlan Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- --------
VLAN0100 Root FWD 4 128.2 P2p
VLAN0101 Root FWD 4 128.2 P2p
VLAN0102 Root FWD 4 128.2 P2p
VLAN0103 Root FWD 4 128.2 P2p
VLAN0104 Root FWD 4 128.2 P2p
VLAN0105 Root FWD 4 128.2 P2p
VLAN0106 Root FWD 4 128.2 P2p
VLAN0107 Root FWD 4 128.2 P2p
VLAN0108 Root FWD 4 128.2 P2p
VLAN0109 Root FWD 4 128.2 P2p
VLAN0110 Root FWD 4 128.2 P2p
VLAN0111 Root FWD 4 128.2 P2p
VLAN0112 Root FWD 4 128.2 P2p
VLAN0113 Root FWD 4 128.2 P2p
VLAN0114 Root FWD 4 128.2 P2p
VLAN0115 Root FWD 4 128.2 P2p
VLAN0116 Root FWD 4 128.2 P2p
The image above shows that when B-00-SW01 fails, B-01-SW01 loses connectivity through its G0/0 interface. After that, all networks that B-00-SW01 managed change their STP relationships to the new root bridge, B-00-SW02, which is now reached through G0/1. In the image below, the logs show that when B-00-SW01 fails, B-00-SW02 becomes the VRRP Master for all networks.
*Apr 18 10:53:06.936: %VRRP-6-STATECHANGE: Vl105 Grp 105 state Backup -> Master
*Apr 18 10:53:06.878: %VRRP-6-STATECHANGE: Vl101 Grp 101 state Backup -> Master
*Apr 18 10:53:06.990: %VRRP-6-STATECHANGE: Vl102 Grp 102 state Backup -> Master
*Apr 18 10:53:07.016: %VRRP-6-STATECHANGE: Vl108 Grp 108 state Backup -> Master
*Apr 18 10:53:07.056: %VRRP-6-STATECHANGE: Vl104 Grp 104 state Backup -> Master
*Apr 18 10:53:07.208: %VRRP-6-STATECHANGE: Vl106 Grp 106 state Backup -> Master
*Apr 18 10:53:07.349: %VRRP-6-STATECHANGE: Vl100 Grp 100 state Backup -> Master
*Apr 18 10:53:07.445: %VRRP-6-STATECHANGE: Vl107 Grp 107 state Backup -> Master
*Apr 18 10:53:07.509: %VRRP-6-STATECHANGE: Vl103 Grp 103 state Backup -> Master
B-00-SW02#show vrrp brief
Interface Grp Pri Time Own Pre State Master addr Group addr
Vl100 100 100 3609 Y Master 10.0.100.253 10.0.100.1
Vl101 101 100 3609 Y Master 10.0.101.253 10.0.101.1
Vl102 102 100 3609 Y Master 10.0.102.253 10.0.102.1
Vl103 103 100 3609 Y Master 10.0.103.253 10.0.103.1
Vl104 104 100 3609 Y Master 10.0.104.253 10.0.104.1
Vl105 105 100 3609 Y Master 10.0.105.253 10.0.105.1
Vl106 106 100 3609 Y Master 10.0.106.253 10.0.106.1
Vl107 107 100 3609 Y Master 10.0.107.253 10.0.107.1
Vl108 108 100 3609 Y Master 10.0.108.253 10.0.108.1
Vl109 109 150 3414 Y Master 10.0.109.253 10.0.109.1
Vl110 110 150 3414 Y Master 10.0.110.253 10.0.110.1
Vl111 111 150 3414 Y Master 10.0.111.253 10.0.111.1
Vl112 112 150 3414 Y Master 10.0.112.253 10.0.112.1
Vl113 113 150 3414 Y Master 10.0.113.253 10.0.113.1
Vl114 114 150 3414 Y Master 10.0.114.253 10.0.114.1
Vl115 115 150 3414 Y Master 10.0.115.253 10.0.115.1
Vl116 116 150 3414 Y Master 10.0.116.253 10.0.116.1
After this, B-00-SW02 will send the traffic from the first eight user networks to NAT1 through tunnel 1, thanks to a routing policy. Meanwhile, the other eight networks will go to NAT2 through tunnel 2, thanks to a default route. Both distribution switches have a routing policy that sends traffic to their respective NAT routers if one of them fails. If B-00-SW01 has a problem and NAT1 does too, B-00-SW02 will send traffic to NAT2, and vice versa. This next image shows the path to the tunnels.
vlan100-ADM> trace 172.16.100.1
trace to 172.16.100.1, 8 hops max, press Ctrl+C to stop
1 10.0.100.253 54.190 ms 19.268 ms 38.097 ms -> B-00-SW02 (VLAN100)
2 192.168.0.74 32.571 ms 29.612 ms 40.322 ms -> A-00-NAT01 (Tu2)
3 192.168.0.61 95.986 ms 62.079 ms 42.644 ms -> A-00-SW01 (G1/3)
4 192.168.0.1 43.666 ms 42.091 ms 54.275 ms -> A-00-RT01 (G0/0)
5 192.168.255.2 54.545 ms -> ISP10
vlan116-AAL> tracer 172.16.100.1
trace to 172.16.100.1, 8 hops max, press Ctrl+C to stop
1 10.0.116.253 19.649 ms 24.897 ms 37.691 ms -> B-00-SW02 (VLAN116)
2 192.168.0.90 28.649 ms 29.373 ms 27.539 ms -> A-00-NAT02 (Tu2)
3 192.168.0.77 46.313 ms 39.938 ms 43.738 ms -> A-00-SW01 (G2/0)
4 192.168.0.13 53.058 ms 59.646 ms 65.481 ms -> A-00-RT02 (G0/1)
5 192.168.254.2 64.714 ms -> ISP10
To see all fault tests in Building B and how the network responds to them, click here. Then, go to the NETWORKING - Building B section.
Continuing with the path to the internet, GRE encapsulates the traffic in a new IP header, so it travels as if it were in a tunnel. The following image shows the new header and the endpoint devices' IP addresses. Switch B-00-SW02 is 1.1.1.6, while router NAT2 is 1.1.1.16.
A-00-NAT01#show ip nat translations
Pro Inside global Inside local Outside local Outside global
udp 192.168.139.3:33729 10.0.100.20:33729 172.16.100.1:33730 172.16.100.1:33730
A-00-NAT02#show ip nat translations
Pro Inside global Inside local Outside local Outside global
udp 192.168.139.67:1393 10.0.116.20:1393 172.16.100.1:1394 172.16.100.1:1394
Note: It is best to avoid packet fragmentation whenever possible, but tunnels can cause it because they add headers to every packet. To mitigate this, I adjusted the MSS for TCP and used Path MTU Discovery for UDP. Tunnels are currently needed to reach the NAT routers because I have not yet found another solution that satisfies redundancy.
The tunnel configuration uses IPs on loopback interfaces. That lets the core switches deliver the packets even if one of the NAT router links has an issue. Since these links are redundant, this leads us to a new scenario.
Scenario 3 - Building A (Core): Suppose the link from switch A-00-SW02 to NAT2 fails. After OSPF converges, the switch reaches NAT2 through a new path: via switch SW01 over Port-channel Po1. The image shows the route before the failure, the failure of the G2/0 interface, and the route to NAT2 moving to Port-channel1.
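To see why the tunnel forces an MSS adjustment, here is the arithmetic as a short Python sketch. It assumes a 1500-byte MTU on the physical links, a basic GRE header without optional fields, and TCP without options. These are assumptions for illustration, not values taken from my configuration.

```python
OUTER_IPV4_HEADER = 20  # new IP header added by the tunnel
GRE_HEADER = 4          # basic GRE header, no key/checksum/sequence fields
INNER_IPV4_HEADER = 20  # original IP header of the user packet
TCP_HEADER = 20         # TCP header without options

def gre_tunnel_mtu(physical_mtu: int) -> int:
    """Largest inner IP packet that fits in the tunnel without fragmentation."""
    return physical_mtu - OUTER_IPV4_HEADER - GRE_HEADER

def tcp_mss_for_tunnel(physical_mtu: int) -> int:
    """Largest TCP payload that fits inside one tunneled packet."""
    return gre_tunnel_mtu(physical_mtu) - INNER_IPV4_HEADER - TCP_HEADER

print(gre_tunnel_mtu(1500))      # 1476
print(tcp_mss_for_tunnel(1500))  # 1436
```

Under these assumptions, any TCP segment larger than 1436 bytes would no longer fit in a single 1500-byte frame once GRE adds its 24 bytes, which is why the MSS has to be lowered on the tunnel path.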
A-00-SW02#show ip route 1.1.1.16
Routing entry for 1.1.1.16/32
Known via "ospf 109", distance 110, metric 11, type intra area
Last update from 192.168.0.82 on GigabitEthernet2/0, 00:02:33 ago
Routing Descriptor Blocks:
* 192.168.0.82, from 192.168.139.65, 00:02:33 ago, via GigabitEthernet2/0
Route metric is 11, traffic share count is 1
*Apr 19 07:07:33.159: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/0, changed state to down
*Apr 19 07:07:34.154: %LINK-3-UPDOWN: Interface GigabitEthernet2/0, changed state to down
A-00-SW02#show ip route 1.1.1.16
Routing entry for 1.1.1.16/32
Known via "ospf 109", distance 110, metric 14, type intra area
Last update from 192.168.0.33 on Port-channel1, 00:04:10 ago
Routing Descriptor Blocks:
* 192.168.0.33, from 192.168.139.65, 00:04:10 ago, via Port-channel1
Route metric is 14, traffic share count is 1
Scenario 4 - Building A (Core): Suppose switch A-00-SW02 has an issue and stops working. Users in Building B will send their traffic to switch B-00-SW02 because B-00-SW01 still has problems (Scenario 2). After that, switch B-00-SW02 will form the respective tunnels to the NAT routers. The traffic will travel through switch A-00-SW01 and reach NAT1 and NAT2. Then, it will return to switch SW01 to be routed to the internet.
*Apr 19 13:01:21.571: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0, changed state to down
*Apr 19 13:01:22.580: %LINK-3-UPDOWN: Interface GigabitEthernet1/0, changed state to down
B-00-SW02#show ip inter bri | e una
Interface IP-Address OK? Method Status Protocol
GigabitEthernet1/0 192.168.0.26 YES NVRAM down down
GigabitEthernet1/1 192.168.0.22 YES NVRAM up up
Loopback0 1.1.1.6 YES NVRAM up up
Tunnel1 192.168.0.73 YES NVRAM up up
Tunnel2 192.168.0.89 YES NVRAM up up
------ text omitted ------
B-00-SW02#sho ip cef 1.1.1.15
1.1.1.15/32
nexthop 192.168.0.21 GigabitEthernet1/1
B-00-SW02#sho ip cef 1.1.1.16
1.1.1.16/32
nexthop 192.168.0.21 GigabitEthernet1/1
A-00-SW01#sho ip cef 1.1.1.15
1.1.1.15/32
nexthop 192.168.0.62 GigabitEthernet1/3
A-00-SW01#sho ip cef 1.1.1.16
1.1.1.16/32
nexthop 192.168.0.78 GigabitEthernet2/0
A-00-NAT01#show ip nat translations
Pro Inside global Inside local Outside local Outside global
icmp 192.168.139.3:43644 10.0.100.20:43644 172.16.100.1:43644 172.16.100.1:43644
A-00-NAT02#show ip nat translations
Pro Inside global Inside local Outside local Outside global
icmp 192.168.139.67:48508 10.0.116.20:48508 172.16.100.1:48508 172.16.100.1:48508
The image shows the sequence of events described above. First, from the point of view of B-00-SW02, the G1/0 interface linking it to A-00-SW02 goes down. Then, its only next hop to the NAT routers is through switch A-00-SW01. Next, from the point of view of core switch A-00-SW01, the image shows the next hop to each NAT. Finally, the NATs translate private IPs to public ones.
Once each NAT router translates the private IP to a public one, it sends the packets to the internet. As mentioned in the section's overview, the core switches have a routing policy that tries to keep traffic symmetrical. NAT2 traffic goes to the internet through router RT2, regardless of whether it reaches SW01 or SW02 first. Similarly, router RT1 receives traffic from NAT1. The image shows the routing policy on switch A-00-SW01 and a trace from the admin PC to the internet.
A-00-SW01#show access-lists net13964-to-rt02
Extended IP access list net13964-to-rt02
10 permit ip 192.168.139.64 0.0.0.31 any (9 matches)
A-00-SW01#show route-map nat02-to-rt02
route-map nat02-to-rt02, permit, sequence 10
Match clauses:
ip address (access-lists): net13964-to-rt02
Set clauses:
ip next-hop 192.168.0.13 192.168.0.34
Policy routing matches: 9 packets, 954 bytes
vlan116-AAL> tracer 172.16.100.1
trace to 172.16.100.1, 8 hops max, press Ctrl+C to stop
1 10.0.116.253 31.674 ms 33.395 ms 20.900 ms -> B-00-SW02 (Vlan116)
2 192.168.0.90 37.280 ms 41.506 ms 38.027 ms -> A-00-NAT02 (Tu2)
3 192.168.0.77 48.460 ms 87.700 ms 54.934 ms -> A-00-SW01 (G2/0)
4 192.168.0.13 45.858 ms 44.470 ms 51.764 ms -> A-00-RT02 (G0/1)
5 *192.168.254.2 43.102 ms -> ISP10
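The decision that the route-map makes can be sketched in Python. The ACL network and the two next hops come from the output above. The function name and the simple "reachable" set are my own simplification of how the switch decides whether a next hop is usable.

```python
import ipaddress
from typing import Optional, Set

# Wildcard 0.0.0.31 on 192.168.139.64 is equivalent to a /27 (hosts .64 to .95).
ACL_NETWORK = ipaddress.ip_network("192.168.139.64/27")
POLICY_NEXT_HOPS = ["192.168.0.13", "192.168.0.34"]  # order from "set ip next-hop"

def policy_next_hop(src_ip: str, reachable: Set[str]) -> Optional[str]:
    """Return the policy next hop, or None to fall back to the routing table."""
    if ipaddress.ip_address(src_ip) not in ACL_NETWORK:
        return None  # ACL does not match: normal routing applies
    for hop in POLICY_NEXT_HOPS:
        if hop in reachable:
            return hop  # first usable next hop wins
    return None

all_up = {"192.168.0.13", "192.168.0.34"}
print(policy_next_hop("192.168.139.67", all_up))            # 192.168.0.13
print(policy_next_hop("192.168.139.67", {"192.168.0.34"}))  # 192.168.0.34
print(policy_next_hop("192.168.139.3", all_up))             # None
```

The third call shows why traffic translated by NAT1 (192.168.139.3) is not touched by this policy on A-00-SW01: its source address falls outside the ACL, so it follows the routing table.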
Scenario 5 - Building A (Core): For this example, router A-00-RT01 experiences an issue. Consequently, router A-00-RT02 will handle all internet communications. As shown in the previous image, the policy on core switch A-00-SW01 only affects public IP traffic from NAT2. Traffic from NAT1 uses the default route. It is important to note that this situation also applies to the switch SW02 but in the opposite direction.
A-00-SW01 also has a floating static route configured in case the primary default route fails. If that happens, the routing table installs the backup route, which points to router A-00-RT02. The image below shows the static route configuration. It also shows how the switch installed the floating route in the routing table when RT01 had an issue.
A-00-SW01#show run | section ip route
ip route 0.0.0.0 0.0.0.0 1.1.1.1
ip route 0.0.0.0 0.0.0.0 1.1.1.2 5
ip route 10.2.100.0 255.255.255.0 192.168.0.53
A-00-SW01#show ip route static
------ Codes omitted ------
Gateway of last resort is 1.1.1.2 to network 0.0.0.0
S* 0.0.0.0/0 [5/0] via 1.1.1.2
10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
S 10.2.100.0/24 [1/0] via 192.168.0.53
Therefore, traffic from NAT1, i.e., the first eight VLANs, will reach core switch A-00-SW01 as expected. But it has no connection to the internet router A-00-RT01. It will send the traffic via its floating static route to router RT2 and then to the internet.
vlan100-ADM> tracer 172.16.100.1
trace to 172.16.100.1, 8 hops max, press Ctrl+C to stop
1 10.0.100.253 53.306 ms 26.391 ms 24.632 ms -> B-00-SW02 (Vlan100)
2 192.168.0.74 32.571 ms 29.612 ms 40.322 ms -> A-00-NAT01 (Tu2)
3 192.168.0.61 53.695 ms 44.801 ms 41.136 ms -> A-00-SW01 (G1/3)
4 192.168.0.13 46.678 ms 49.756 ms 39.173 ms -> A-00-RT02 (G0/1)
5 *192.168.254.2 37.886 ms (ICMP type:3, code:3, Destination port unreachable)
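The floating static route behaviour in this scenario can be summarized with a short Python sketch. The two default routes and their distances come from the configuration above (a static route without an explicit distance uses 1). The StaticRoute type and the "reachable" set are hypothetical helpers for the illustration.

```python
from typing import Iterable, NamedTuple, Optional, Set

class StaticRoute(NamedTuple):
    prefix: str
    next_hop: str
    distance: int  # administrative distance

DEFAULT_ROUTES = [
    StaticRoute("0.0.0.0/0", "1.1.1.1", 1),  # primary default route
    StaticRoute("0.0.0.0/0", "1.1.1.2", 5),  # floating route, distance 5
]

def active_default(routes: Iterable[StaticRoute],
                   reachable: Set[str]) -> Optional[StaticRoute]:
    """Install the usable route with the lowest administrative distance."""
    candidates = [r for r in routes if r.next_hop in reachable]
    return min(candidates, key=lambda r: r.distance) if candidates else None

# Normal operation: both next hops are reachable, the primary route wins.
print(active_default(DEFAULT_ROUTES, {"1.1.1.1", "1.1.1.2"}).next_hop)  # 1.1.1.1
# Scenario 5: the primary next hop is gone, so the floating route is installed.
print(active_default(DEFAULT_ROUTES, {"1.1.1.2"}).next_hop)             # 1.1.1.2
```

The backup route stays out of the routing table as long as the primary next hop can be resolved, which matches the [5/0] entry appearing only after RT01 failed.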
An essential point of the design is that the public IPs are assigned on the NAT routers, not on the internet routers. Routers A-00-RT01 and RT2 learn them via OSPF and then advertise them via BGP to the ISPs. In this scenario, RT01 stops advertising the public IPs, and ISP10's router recalculates how to reach the network 192.168.139.0/27, which it now does through RT2. If the issue occurred in a NAT router such as NAT1 instead, it would stop advertising the network via OSPF, and the internet routers would then stop learning it and advertising it via BGP to the internet. The following image shows the ISP10 logs when router RT01 fails and the new route to the destination. It also shows ISP20's routes to the destination.
ISP01#
*Apr 19 15:05:48.256: %BGP-3-NOTIFICATION: sent to neighbor 192.168.255.1 4/0 (hold time expired) 0 bytes
*Apr 19 15:05:48.262: %BGP-5-NBR_RESET: Neighbor 192.168.255.1 reset (BGP Notification sent)
*Apr 19 15:05:48.282: %BGP-5-ADJCHANGE: neighbor 192.168.255.1 Down BGP Notification sent
*Apr 19 15:05:48.283: %BGP_SESSION-5-ADJCHANGE: neighbor 192.168.255.1 IPv4 Unicast topology base removed from session BGP Notification sent
ISP01#show ip route bgp
------ Codes omitted ------
Gateway of last resort is not set
192.168.139.0/27 is subnetted, 2 subnets
B 192.168.139.0 [20/21] via 192.168.254.1, 00:39:01 -> A-00-RT02 (G0/2)
B 192.168.139.64 [20/21] via 192.168.254.1, 01:57:57 -> A-00-RT02 (G0/2)
AS-20#show ip route bgp
------ Codes omitted ------
Gateway of last resort is not set
192.168.139.0/27 is subnetted, 2 subnets
B 192.168.139.0 [20/21] via 192.168.252.1, 00:04:14 -> A-00-RT02 (G0/3)
B 192.168.139.64 [20/21] via 192.168.252.1, 00:04:14 -> A-00-RT02 (G0/3)
Scenario 6 - Internet: Router A-00-RT01 is still not functioning, and one of our providers, ISP20, has also failed. Because the design is dual multi-homed, one path remains: through ISP10 via internet router A-00-RT02. The following image shows the notifications for the ISP20 failure and the new routing table of RT2.
A-00-RT02#
*Apr 19 15:42:08.495: %BGP-3-NOTIFICATION: sent to neighbor 192.168.252.2 4/0 (hold time expired) 0 bytes
*Apr 19 15:42:08.501: %BGP-5-NBR_RESET: Neighbor 192.168.252.2 reset (BGP Notification sent)
*Apr 19 15:42:08.511: %BGP-5-ADJCHANGE: neighbor 192.168.252.2 Down BGP Notification sent
*Apr 19 15:42:08.512: %BGP_SESSION-5-ADJCHANGE: neighbor 192.168.252.2 IPv4 Unicast topology base removed from session BGP Notification sent
A-00-RT02#show running | s ip route
ip route 0.0.0.0 0.0.0.0 192.168.252.2
ip route 0.0.0.0 0.0.0.0 192.168.254.2 5
A-00-RT02#show ip route bgp
------ Codes omitted ------
Gateway of last resort is 192.168.254.2 to network 0.0.0.0
172.16.0.0/24 is subnetted, 2 subnets
B 172.16.98.0 [20/0] via 192.168.254.2, 00:48:06
B 172.16.100.0 [20/0] via 192.168.254.2, 04:38:31