We recently discovered what was causing some of the connection issues over the WiMAX connection in Malawi.

There were two issues, but one sort of builds on the other. The first issue occurs when someone accesses a publicly accessible service through the IP address from the secondary ISP, Tonse. We have two routers at this site. A Cisco 1800 series router is connected to the main satellite connection. We’re using a DLink VPN router on the Tonse connection, for a site-to-site VPN to another DLink VPN router at a satellite clinic.

The Cisco router is set as the default gateway on the LAN at the main clinic site. It has no information about connections that come in through Tonse and the DLink router: it only sees the outgoing traffic, not any of the incoming traffic.

  1. User browses to AtMail webmail using the Tonse address.

     To: 2.2.2.2 From: 5.5.5.5

  2. DLink translates the destination IP address to the internal server.

     To: 10.1.1.1 From: 5.5.5.5

  3. Internal server replies, sending traffic to its default gateway (the Cisco router).

     To: 5.5.5.5 From: 10.1.1.1

  4. Cisco router has no record of the incoming connection, so it translates the source address to the Ariave IP.

     To: 5.5.5.5 From: 1.1.1.1

  5. End user’s computer receives the response, but it is not from the address it connected to. The TCP/IP stack does not recognize this as a response to its TCP connection request, so it discards it and the connection is never established.
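The five steps above can be traced with a small simulation. This is only a sketch of the address rewriting, not of how either router is actually implemented; the `Packet` class and the function names are made up for illustration, and the addresses are the placeholder ones used in the steps.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


CLIENT = "5.5.5.5"     # end user on the internet
TONSE_IP = "2.2.2.2"   # public address on the DLink (Tonse) side
ARIAVE_IP = "1.1.1.1"  # public address on the Cisco (satellite) side
SERVER = "10.1.1.1"    # internal AtMail server


def dlink_inbound(pkt):
    """Step 2: the DLink rewrites the destination to the internal server."""
    return replace(pkt, dst=SERVER) if pkt.dst == TONSE_IP else pkt


def server_reply(pkt):
    """Step 3: the server answers the address the request came from."""
    return Packet(src=pkt.dst, dst=pkt.src)


def cisco_outbound(pkt):
    """Step 4: the Cisco never saw the request, so it uses its own public IP."""
    return replace(pkt, src=ARIAVE_IP) if pkt.src == SERVER else pkt


def client_accepts(request, reply):
    """Step 5: the client only accepts a reply from the address it contacted."""
    return reply.src == request.dst and reply.dst == request.src


request = Packet(src=CLIENT, dst=TONSE_IP)
reply = cisco_outbound(server_reply(dlink_inbound(request)))

print(reply)                           # source is 1.1.1.1, not 2.2.2.2
print(client_accepts(request, reply))  # False: the reply is discarded
```

The client sent its request to 2.2.2.2 but the reply arrives from 1.1.1.1, which is exactly the mismatch that keeps the connection from being established.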

An ugly work-around for this issue is to add NAT entries on the default gateway (the Cisco router in this case). So, in addition to having the service NAT’ed to the public IP the router itself uses, it also has a second entry in the NAT table connecting the same internal IP / port to the public IP the other router/connection uses. This appears to cause the Cisco router to translate the source address of the outgoing traffic to the public address of the other router/connection if it is not part of an existing connection that already came in through the Cisco router.

This is rather ugly because the Cisco router sends traffic out with a source IP address that doesn’t belong to it (and is, in fact, on a completely separate network and service provider from the one it is connected to). It is also rather ugly because it is unclear (at least to me) how the Cisco router chooses which NAT entry to use for translating the outgoing traffic.
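To make the work-around a bit more concrete, here is a toy model of the behaviour I observed. It is not how IOS actually picks a NAT entry (as noted, that part is unclear to me); the `ToyCiscoNat` class, its method names, the port number, and the 6.6.6.6 client are all hypothetical and exist only for illustration.

```python
ARIAVE_IP = "1.1.1.1"  # public address the Cisco router really owns
TONSE_IP = "2.2.2.2"   # public address that belongs to the DLink/Tonse side
PORT = 80              # hypothetical; the actual service port is not given
SERVER = ("10.1.1.1", PORT)


class ToyCiscoNat:
    """Toy model of the observed source-address choice, not of real IOS logic."""

    def __init__(self):
        # (client_ip, server) pairs whose requests came in through the Cisco
        self.inbound_seen = set()
        # source address used for replies with no matching inbound connection
        self.fallback_ip = ARIAVE_IP

    def add_workaround_entry(self, public_ip):
        """Model the extra NAT entry mapping the server to the other public IP."""
        self.fallback_ip = public_ip

    def record_inbound(self, client_ip, server):
        """Remember a connection that arrived through the Cisco itself."""
        self.inbound_seen.add((client_ip, server))

    def source_for_reply(self, client_ip, server):
        """Pick the public source address for a reply from the server."""
        if (client_ip, server) in self.inbound_seen:
            return ARIAVE_IP
        return self.fallback_ip


nat = ToyCiscoNat()
before = nat.source_for_reply("5.5.5.5", SERVER)    # reply leaves as 1.1.1.1

nat.add_workaround_entry(TONSE_IP)
after = nat.source_for_reply("5.5.5.5", SERVER)     # reply now leaves as 2.2.2.2

nat.record_inbound("6.6.6.6", SERVER)               # this client came in via the Cisco
via_cisco = nat.source_for_reply("6.6.6.6", SERVER) # still 1.1.1.1

print(before, after, via_cisco)
```

With the extra entry in place, the reply to the Tonse user carries the address the user originally connected to, while connections that really did arrive through the Cisco keep using its own address.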

Doing this, however, caused the second issue, which I’ll post about in part 2. It’s a great example of why you shouldn’t do things the “ugly way” if at all possible. Stay tuned!