One of the books I received as a gift this Christmas was Network Flow Analysis. It introduced me to flow-tools, a bundle of NetFlow-related tools I hadn’t worked with before. Previously, I had used nfdump. Perhaps it’s just because of how it was introduced to me, but nfdump seems better suited to ad-hoc analysis than to continuous monitoring. Flow-tools, on the other hand, provides a handy init script in /etc/init.d (at least on CentOS / Fedora), so it’s easy to run as a daemon. The only tricky part of the install process was finding the configuration file for the daemon. On CentOS / Fedora it is located at /etc/sysconfig/flow-capture, rather than in /etc/flow-tools/.

By simply running the daemon on a Linux server at a site, and pointing the Cisco router to export its NetFlow data to that server, I can store a history of network connections. If we see an unusual period of heavy traffic, or a user complains about slow performance, we can go back and see what happened in each five-minute increment. Being able to find the culprit after the fact makes it much easier to troubleshoot rare or transient network performance issues. Instead of guessing or making excuses, a network administrator can dig into the details and determine what was actually happening at the network level.
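To make the "go back and look at each five-minute increment" idea concrete, here is a minimal Python sketch that groups flow records into five-minute buckets and reports the heaviest sender in each. It assumes the flows have already been exported from the capture files into simple (timestamp, source IP, destination IP, bytes) tuples; that tuple layout, the function name top_talkers and the sample addresses are all made up for illustration and are not a flow-tools format.

```python
from collections import defaultdict

BUCKET_SECONDS = 300  # five-minute increments


def top_talkers(flows, bucket_seconds=BUCKET_SECONDS):
    """Return {bucket_start: (src_ip, total_bytes)} for the heaviest sender per bucket.

    `flows` is an iterable of (unix_timestamp, src_ip, dst_ip, byte_count)
    tuples. This layout is an assumption for the sketch.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for ts, src, _dst, nbytes in flows:
        # Round the timestamp down to the start of its bucket.
        bucket = int(ts) - int(ts) % bucket_seconds
        totals[bucket][src] += nbytes
    return {
        bucket: max(per_src.items(), key=lambda kv: kv[1])
        for bucket, per_src in sorted(totals.items())
    }


if __name__ == "__main__":
    # Hypothetical sample data, not real captured traffic.
    sample = [
        (1000, "10.0.0.5", "203.0.113.10", 500),
        (1100, "10.0.0.6", "203.0.113.10", 9000),
        (1199, "10.0.0.5", "198.51.100.7", 100),
        (1200, "10.0.0.5", "198.51.100.7", 700),
    ]
    for bucket, (src, nbytes) in top_talkers(sample).items():
        print(bucket, src, nbytes)
```

In practice the reporting tools that ship with flow-tools can answer this kind of question directly; the sketch is only meant to show the shape of the analysis.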

When combined with our Squid proxy server logs, we can determine which URLs the user was visiting if web traffic was causing the slow-down. NetFlow only shows the destination IP and port the traffic was going to. In many cases, this isn’t enough to determine which website the user was actually visiting. If the user was downloading a large file, there’s a good chance it was hosted on a CDN (Content Delivery Network). In that case the IP will belong to the CDN, its reverse DNS name will likely point to the CDN as well, and browsing to that IP address in a web browser probably won’t give much in the way of clues. The proxy server log files can be searched for the specific IP to determine which URL / website the user was actually accessing.
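The log search described above can be sketched in a few lines of Python. This assumes Squid is writing its default native access.log format, where the ninth whitespace-separated field looks like DIRECT/203.0.113.10 and carries the IP of the server Squid fetched from. The function name find_urls_for_ip and the sample log line are hypothetical; a custom logformat would need different field positions.

```python
def find_urls_for_ip(log_lines, server_ip):
    """Yield (client_ip, url, bytes) for requests Squid fetched from server_ip.

    Assumes Squid's native access.log layout:
    time elapsed client code/status bytes method URL user hierarchy/peer type
    """
    for line in log_lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        client, size, url, hierarchy = fields[2], fields[4], fields[6], fields[8]
        # The hierarchy field is "CODE/peerhost"; keep the part after the slash.
        peer = hierarchy.partition("/")[2]
        if peer == server_ip and size.isdigit():
            yield client, url, int(size)


if __name__ == "__main__":
    # Hypothetical log line using documentation-range addresses.
    sample = [
        "1262304000.123   5120 192.168.1.25 TCP_MISS/200 104857600 GET "
        "http://cdn.example.com/files/big.iso - DIRECT/203.0.113.10 "
        "application/octet-stream",
    ]
    for client, url, nbytes in find_urls_for_ip(sample, "203.0.113.10"):
        print(client, url, nbytes)
```

To run it against a real log, pass an open file object instead of the sample list, for example open("/var/log/squid/access.log"); that path is the usual default but may differ on a given install.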

So far I’ve been pleased. Starting from a period of high bandwidth usage, we were able to track down someone downloading a bunch of MP3s (over the clinic’s slow, high-latency satellite connection). A couple of additions to the Squid configuration took care of that.