This is a bit of a resource for troubleshooting OSPF but breaking various aspects and see just how it fails.
To begin with, we have a simple OSPF setup with two routers working normally.
So that’s the setup, pretty simple. Just to confirm it’s working and show you the IP addresses:
Loki#sh ip ospf int brief
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Lo0 1 0 192.168.0.2/32 1 LOOP 0/0
Fa0/0 1 0 10.0.0.2/30 1 DR 1/1
Loki#show ip ospf neigh
Neighbor ID Pri State Dead Time Address Interface
192.168.0.1 1 FULL/BDR 00:00:34 10.0.0.1 FastEthernet0/0
Thor#show ip ospf int brief Interface PID Area IP Address/Mask Cost State Nbrs F/C Lo0 1 0 192.168.0.1/32 1 LOOP 0/0 Fa0/0 1 0 10.0.0.1/30 1 BDR 1/1 Thor#show ip ospf neighb Neighbor ID Pri State Dead Time Address Interface 192.168.0.2 1 FULL/DR 00:00:36 10.0.0.2 FastEthernet0/0
Simple enough.
Case 1: Wrong Subnet Mask
For our first issue, we will change the subnet mask on Thor to be a /29 (instead of the /30).
Thor(config)#int f0/0 Thor(config-if)#ip add 10.0.0.1 255.255.255.248 Thor(config-if)# *Mar 1 00:14:59.331: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.0.2 on FastEthernet0/0 from FULL to DOWN, Neighbor Down: Interface down or detached
Straight away you see the neighbor go down. So how could you fault find this? Well, to begin with, you would probably do a show ip ospf int brief to give you some idea what’s going on:
Thor#show ip ospf int brief
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Lo0 1 0 192.168.0.1/32 1 LOOP 0/0
Fa0/0 1 0 10.0.0.1/29 1 DR 0/0
You can see that Thor thinks it’s the DR and there are no neighbors, so that’s not particularly helpful. If you tried to ping the neighbor address, you would see that you still have connectivity (so it’s not just that the neighbor is shut down or unplugged):
Thor#ping 10.0.0.2 Type escape sequence to abort. Sending 5, 100-byte ICMP Echos to 10.0.0.2, timeout is 2 seconds: !!!!! Success rate is 100 percent (5/5), round-trip min/avg/max = 16/22/32 ms
Alright, so let’s dig into some debugs. I generally try to keep the debug commands as specific as possible. For example, in this case I will use debug ip ospf hello rather then debug ip ospf packet. This is important particularly in operational networks as some debugs (let’s say, debug ip packet will absolutely hammer to crap out of your CPU).
Thor# debug ip ospf hello OSPF hello events debugging is on Thor# *Mar 1 00:21:39.095: OSPF: Rcv hello from 192.168.0.2 area 0 from FastEthernet0/0 10.0.0.2 *Mar 1 00:21:39.095: OSPF: Mismatched hello parameters from 10.0.0.2 *Mar 1 00:21:39.095: OSPF: Dead R 40 C 40, Hello R 10 C 10 Mask R 255.255.255.252 C 255.255.255.248 *Mar 1 00:21:39.339: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet0/0 from 10.0.0.1
Once you get the next hello you can straight away see what’s up. First it tells you there are mismatched hello parameters – then it explicitly states the mask is wrong – “Mask R 255.255.255.252 C 255.255.255.248”
That was pretty easy. It is a similar process for mismatched dead and hello timers.
Case 2: Different Areas
This case is pretty easy to find too. After fixing up the subnet from the previous case, I changed the 10.0.0.1 interface to be in area 1. Again, neighborship goes down straight away. Show ip ospf niegh and show ip ospf int brief does not tell you much. In this case, you do not even need any debug statements:
Thor# *Mar 1 00:28:59.087: %OSPF-4-ERRRCV: Received invalid packet: mismatch area ID, from backbone area must be virtual-link but not found from 10.0.0.2, FastEthernet0/0 Thor#show debug
As you can see, no debugs are on. You can see the neighbor believes it is in the backbone area (area 0) and the local router does not.
Now, the case is a little different on the other end. Loki – which still is in area 0. Loki is not generating any output. Even turning on debug ip ospf hello or debug ip ospf packet tells you nothing (except that you are sending packets out)
Loki#debug ip ospf packet OSPF packet debugging is on Loki#debug ip ospf hello OSPF hello events debugging is on Loki# *Mar 1 00:34:58.063: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet0/0 from 10.0.0.2
In this case, the trick is to debug ip ospf events.
Loki#debug ip ospf events OSPF events debugging is on Loki# *Mar 1 00:35:41.943: OSPF: Rcv pkt from 10.0.0.1, FastEthernet0/0, area 0.0.0.0 mismatch area 0.0.0.1 in the header
And there it is. Mismatch area.
Case 3: MTU Mismatch
This is a slightly more interesting case. Firstly, when you change the interface MTU (in this case I changed it on Thor’s Fa0/0 to ip mtu 80) OSPF does not actually fail. So it is entirely possible you might make a change, not see OSPF fail, then leave it alone and everything will appear to be working fine. The failure won’t be seen until later (say, after a reset..these things being as they are, at 2am on Sunday morning).
To force the issue, I did a shut-no shut on Fa0/0 on Thor.
Once you start debugging you will see that the neighbor might not be down – it might be in EXSTART.
Thor#show ip ospf neigh
Neighbor ID Pri State Dead Time Address Interface
192.168.0.2 1 EXSTART/DR 00:00:38 10.0.0.2 FastEthernet0/0
Eventually, you will get a message saying OSPF has given up on life:
Thor# *Mar 1 00:54:02.459: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.0.2 on FastEthernet0/0 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
If you do a show ip ospf neighbor between that and the next hello message the neighbor will appear as down (rather than EXSTART).
The fact that OSPF does begin to establish indicates that all the parameters (hello timers, subnet mask, area etc) match and are fine. OSPF never getting any further than EXSTART is a huge hint that you should look at the interface MTUs. You can also debug ip ospf events to confirm:
Thor#debug ip ospf events OSPF events debugging is on Thor# *Mar 1 00:59:47.459: OSPF: Send DBD to 192.168.0.2 on FastEthernet0/0 seq 0x13F2 opt 0x52 flag 0x7 len 32 *Mar 1 00:59:47.459: OSPF: Retransmitting DBD to 192.168.0.2 on FastEthernet0/0 [17] *Mar 1 00:59:47.463: OSPF: Rcv DBD from 192.168.0.2 on FastEthernet0/0 seq 0xFDA opt 0x52 flag 0x7 len 32 mtu 1500 state EXSTART *Mar 1 00:59:47.467: OSPF: Nbr 192.168.0.2 has larger interface MTU Thor# *Mar 1 00:59:49.075: OSPF: Rcv hello from 192.168.0.2 area 0 from FastEthernet0/0 10.0.0.2 *Mar 1 00:59:49.075: OSPF: End of hello processing
You can see in the middle “Nbr 192.168.0.2 has larger interface MTU”. Pretty clear what is going on. On the other side it’s a similar story:
Loki#debug ip ospf events OSPF events debugging is on Loki# *Mar 1 01:01:41.479: OSPF: Neighbor change Event on interface FastEthernet0/0 *Mar 1 01:01:41.479: OSPF: DR/BDR election on FastEthernet0/0 *Mar 1 01:01:41.483: OSPF: Elect BDR 0.0.0.0 *Mar 1 01:01:41.483: OSPF: Elect DR 192.168.0.2 *Mar 1 01:01:41.483: DR: 192.168.0.2 (Id) BDR: none Loki# *Mar 1 01:01:48.063: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet0/0 from 10.0.0.2 *Mar 1 01:01:48.111: OSPF: Rcv hello from 192.168.0.1 area 0 from FastEthernet0/0 10.0.0.1 *Mar 1 01:01:48.111: OSPF: 2 Way Communication to 192.168.0.1 on FastEthernet0/0, state 2WAY *Mar 1 01:01:48.111: OSPF: Neighbor change Event on interface FastEthernet0/0 *Mar 1 01:01:48.111: OSPF: DR/BDR election on FastEthernet0/0 *Mar 1 01:01:48.115: OSPF: Elect BDR 0.0.0.0 *Mar 1 01:01:48.115: OSPF: Elect DR 192.168.0.2 *Mar 1 01:01:48.115: DR: 192.168.0.2 (Id) BDR: none *Mar 1 01:01:48.115: OSPF: Send DBD to 192.168.0.1 on FastEthernet0/0 seq 0x1603 opt 0x52 flag 0x7 len 32 *Mar 1 01:01:48.119: OSPF: Neighbor change Event on interface FastEthernet0/0 *Mar 1 01:01:48.119: OSPF: DR/BDR election on FastEthernet0/0 *Mar 1 01:01:48.119: OSPF: Elect BDR 0.0.0.0 *Mar 1 01:01:48.119: OSPF: Elect DR 192.168.0.2 *Mar 1 01:01:48.119: DR: 192.168.0.2 (Id) BDR: none *Mar 1 01:01:48.119: OSPF: Send immediate hello to nbr 192.168.0.1, src address 10.0.0.1, on FastEthernet0/0 *Mar 1 01:01:48.123: OSPF: Send hello to 10.0.0.1 area 0 on FastEthernet0/0 from 10.0.0.2 *Mar 1 01:01:48.123: OSPF: End of hello processing *Mar 1 01:01:48.143: OSPF: Rcv DBD from 192.168.0.1 on FastEthernet0/0 seq 0x261B opt 0x52 flag 0x7 len 32 mtu 80 state EXSTART *Mar 1 01:01:48.143: OSPF: Nbr 192.168.0.1 has smaller interface MTU *Mar 1 01:01:48.143: OSPF: First DBD and we are not SLAVE
“Nbr 192.168.0.1 has smaller interface MTU”.
And that’s it for today’s post. Debug messages are your friend. They make fault finding a million times easier than trying to get everything by trawling through running-configs.