Let’s break: OSPF

This is a bit of a resource for troubleshooting OSPF but breaking various aspects and see just how it fails.

To begin with, we have a simple OSPF setup with two routers working normally.

Basic_ospf

So that’s the setup, pretty simple. Just to confirm it’s working and show you the IP addresses:

Loki#sh ip ospf int brief
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Lo0 1 0 192.168.0.2/32 1 LOOP 0/0
Fa0/0 1 0 10.0.0.2/30 1 DR 1/1
Loki#show ip ospf neigh
Neighbor ID Pri State Dead Time Address Interface
192.168.0.1 1 FULL/BDR 00:00:34 10.0.0.1 FastEthernet0/0
Thor#show ip ospf int brief
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Lo0 1 0 192.168.0.1/32 1 LOOP 0/0
Fa0/0 1 0 10.0.0.1/30 1 BDR 1/1
Thor#show ip ospf neighb
Neighbor ID Pri State Dead Time Address Interface
192.168.0.2 1 FULL/DR 00:00:36 10.0.0.2 FastEthernet0/0

Simple enough.

Case 1: Wrong Subnet Mask

For our first issue, we will change the subnet mask on Thor to be a /29 (instead of the /30).

Thor(config)#int f0/0
Thor(config-if)#ip add 10.0.0.1 255.255.255.248
Thor(config-if)#
*Mar 1 00:14:59.331: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.0.2 on FastEthernet0/0 from FULL to DOWN, Neighbor Down: Interface down or detached

Straight away you see the neighbor go down. So how could you fault find this? Well, to begin with, you would probably do a show ip ospf int brief to give you some idea what’s going on:

Thor#show ip ospf int brief
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Lo0 1 0 192.168.0.1/32 1 LOOP 0/0
Fa0/0 1 0 10.0.0.1/29 1 DR 0/0

You can see that Thor thinks it’s the DR and there are no neighbors, so that’s not particularly helpful. If you tried to ping the neighbor address, you would see that you still have connectivity (so it’s not just that the neighbor is shut down or unplugged):

Thor#ping 10.0.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/22/32 ms

Alright, so let’s dig into some debugs. I generally try to keep the debug commands as specific as possible. For example, in this case I will use debug ip ospf hello rather then debug ip ospf packet. This is important particularly in operational networks as some debugs (let’s say, debug ip packet will absolutely hammer to crap out of your CPU).

Thor# debug ip ospf hello
OSPF hello events debugging is on
Thor#
*Mar 1 00:21:39.095: OSPF: Rcv hello from 192.168.0.2 area 0 from FastEthernet0/0 10.0.0.2
*Mar 1 00:21:39.095: OSPF: Mismatched hello parameters from 10.0.0.2
*Mar 1 00:21:39.095: OSPF: Dead R 40 C 40, Hello R 10 C 10 Mask R 255.255.255.252 C 255.255.255.248
*Mar 1 00:21:39.339: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet0/0 from 10.0.0.1

Once you get the next hello you can straight away see what’s up. First it tells you there are mismatched hello parameters – then it explicitly states the mask is wrong – “Mask R 255.255.255.252 C 255.255.255.248”

That was pretty easy. It is a similar process for mismatched dead and hello timers.

Case 2: Different Areas

This case is pretty easy to find too. After fixing up the subnet from the previous case, I changed the 10.0.0.1 interface to be in area 1. Again, neighborship goes down straight away. Show ip ospf niegh and show ip ospf int brief does not tell you much. In this case, you do not even need any debug statements:

Thor#
*Mar 1 00:28:59.087: %OSPF-4-ERRRCV: Received invalid packet: mismatch area ID, from backbone area must be virtual-link but not found from 10.0.0.2, FastEthernet0/0
Thor#show debug

As you can see, no debugs are on. You can see the neighbor believes it is in the backbone area (area 0) and the local router does not.

Now, the case is a little different on the other end. Loki – which still is in area 0. Loki is not generating any output. Even turning on debug ip ospf hello or debug ip ospf packet tells you nothing (except that you are sending packets out)

Loki#debug ip ospf packet
OSPF packet debugging is on
Loki#debug ip ospf hello
OSPF hello events debugging is on
Loki#
*Mar 1 00:34:58.063: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet0/0 from 10.0.0.2

In this case, the trick is to debug ip ospf events.

Loki#debug ip ospf events
OSPF events debugging is on
Loki#
*Mar 1 00:35:41.943: OSPF: Rcv pkt from 10.0.0.1, FastEthernet0/0, area 0.0.0.0
 mismatch area 0.0.0.1 in the header

And there it is. Mismatch area.

Case 3: MTU Mismatch

This is a slightly more interesting case. Firstly, when you change the interface MTU (in this case I changed it on Thor’s Fa0/0 to ip mtu 80) OSPF does not actually fail. So it is entirely possible you might make a change, not see OSPF fail, then leave it alone and everything will appear to be working fine. The failure won’t be seen until later (say, after a reset..these things being as they are, at 2am on Sunday morning).

To force the issue, I did a shut-no shut on Fa0/0 on Thor.

Once you start debugging you will see that the neighbor might not be down – it might be in EXSTART.

Thor#show ip ospf neigh
Neighbor ID Pri State Dead Time Address Interface
192.168.0.2 1 EXSTART/DR 00:00:38 10.0.0.2 FastEthernet0/0

Eventually, you will get a message saying OSPF has given up on life:

Thor#
*Mar 1 00:54:02.459: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.0.2 on FastEthernet0/0 from EXSTART to DOWN, Neighbor Down: Too many retransmissions

If you do a show ip ospf neighbor between that and the next hello message the neighbor will appear as down (rather than EXSTART).

The fact that OSPF does begin to establish indicates that all the parameters (hello timers, subnet mask, area etc) match and are fine. OSPF never getting any further than EXSTART is a huge hint that you should look at the interface MTUs. You can also debug ip ospf events to confirm:

Thor#debug ip ospf events
OSPF events debugging is on
Thor#
*Mar 1 00:59:47.459: OSPF: Send DBD to 192.168.0.2 on FastEthernet0/0 seq 0x13F2 opt 0x52 flag 0x7 len 32
*Mar 1 00:59:47.459: OSPF: Retransmitting DBD to 192.168.0.2 on FastEthernet0/0 [17]
*Mar 1 00:59:47.463: OSPF: Rcv DBD from 192.168.0.2 on FastEthernet0/0 seq 0xFDA opt 0x52 flag 0x7 len 32 mtu 1500 state EXSTART
*Mar 1 00:59:47.467: OSPF: Nbr 192.168.0.2 has larger interface MTU
Thor#
*Mar 1 00:59:49.075: OSPF: Rcv hello from 192.168.0.2 area 0 from FastEthernet0/0 10.0.0.2
*Mar 1 00:59:49.075: OSPF: End of hello processing

You can see in the middle “Nbr 192.168.0.2 has larger interface MTU”. Pretty clear what is going on. On the other side it’s a similar story:

Loki#debug ip ospf events
OSPF events debugging is on
Loki#
*Mar 1 01:01:41.479: OSPF: Neighbor change Event on interface FastEthernet0/0
*Mar 1 01:01:41.479: OSPF: DR/BDR election on FastEthernet0/0
*Mar 1 01:01:41.483: OSPF: Elect BDR 0.0.0.0
*Mar 1 01:01:41.483: OSPF: Elect DR 192.168.0.2
*Mar 1 01:01:41.483: DR: 192.168.0.2 (Id) BDR: none
Loki#
*Mar 1 01:01:48.063: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet0/0 from 10.0.0.2
*Mar 1 01:01:48.111: OSPF: Rcv hello from 192.168.0.1 area 0 from FastEthernet0/0 10.0.0.1
*Mar 1 01:01:48.111: OSPF: 2 Way Communication to 192.168.0.1 on FastEthernet0/0, state 2WAY
*Mar 1 01:01:48.111: OSPF: Neighbor change Event on interface FastEthernet0/0
*Mar 1 01:01:48.111: OSPF: DR/BDR election on FastEthernet0/0
*Mar 1 01:01:48.115: OSPF: Elect BDR 0.0.0.0
*Mar 1 01:01:48.115: OSPF: Elect DR 192.168.0.2
*Mar 1 01:01:48.115: DR: 192.168.0.2 (Id) BDR: none
*Mar 1 01:01:48.115: OSPF: Send DBD to 192.168.0.1 on FastEthernet0/0 seq 0x1603 opt 0x52 flag 0x7 len 32
*Mar 1 01:01:48.119: OSPF: Neighbor change Event on interface FastEthernet0/0
*Mar 1 01:01:48.119: OSPF: DR/BDR election on FastEthernet0/0
*Mar 1 01:01:48.119: OSPF: Elect BDR 0.0.0.0
*Mar 1 01:01:48.119: OSPF: Elect DR 192.168.0.2
*Mar 1 01:01:48.119: DR: 192.168.0.2 (Id) BDR: none
*Mar 1 01:01:48.119: OSPF: Send immediate hello to nbr 192.168.0.1, src address 10.0.0.1, on FastEthernet0/0
*Mar 1 01:01:48.123: OSPF: Send hello to 10.0.0.1 area 0 on FastEthernet0/0 from 10.0.0.2
*Mar 1 01:01:48.123: OSPF: End of hello processing
*Mar 1 01:01:48.143: OSPF: Rcv DBD from 192.168.0.1 on FastEthernet0/0 seq 0x261B opt 0x52 flag 0x7 len 32 mtu 80 state EXSTART
*Mar 1 01:01:48.143: OSPF: Nbr 192.168.0.1 has smaller interface MTU
*Mar 1 01:01:48.143: OSPF: First DBD and we are not SLAVE

“Nbr 192.168.0.1 has smaller interface MTU”.

And that’s it for today’s post. Debug messages are your friend. They make fault finding a million times easier than trying to get everything by trawling through running-configs.

 

This entry was posted in Debug, OSPF by Tom. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *