Article - My network is slow, now what?
Author: Werner Schmidt, CISSP
Date: February 2009
Few things worry a network manager more than complaints from users, or even worse from upper management, that the “network” is slow. Determining whether it really is slow, and where the slowness occurs, is a complex challenge. To respond well, it’s important to get back to the basics:
- Have historical data to confirm or deny the statement
- Understand the details of the slowdown (when, where, what, who); it’s up to us to determine the why
- Have tools in place for forensics (though it’s doubtful any of us in the commercial sector can afford full packet captures of all traffic)
Our toolset to deal with problems consists of:
- Counter based statistics (e.g. octets in/out) (typically via SNMP tools)
- Flow based information
- Service/port based information (tied to flows or even traffic data from firewalls)
- Application based information (true application visibility from next generation firewall solutions)
- Packet captures
- URL content filtering information
- Baseline historical information (typically charting or stats)
Counter based
Counter based information usually comes from switches, routers and sometimes firewalls. Typically it consists of some variation of octets or bytes sent/received, usually at an interface or port level. This information is good for detecting a surge in traffic, but it doesn’t lend itself to finding the source or destination (at best one end, if we have counters on one of the endpoints) or the service, let alone the application (versus just the port). However, it is often a good first look at whether a surge condition exists at all and where it is being observed.
SNMP is supported by a wide variety of networking infrastructure and can be used to poll devices for many kinds of information. This may be as simple as counters of bytes sent/received, or it may include bandwidth statistics and other relevant information.
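As a rough illustration of how two polled counter readings become a utilization figure, here is a minimal Python sketch. It assumes the octet counter has already been fetched by some SNMP tool (for example an interface's inbound octet counter such as ifInOctets), that the counter wraps at most once between polls, and that the link speed is known. The function names and sample numbers are made up for this example.

```python
def counter_delta(previous, current, counter_bits=32):
    """Return the increase between two counter readings, allowing one wrap."""
    if current >= previous:
        return current - previous
    # The counter rolled over past its maximum value between the two polls.
    return current + (1 << counter_bits) - previous


def utilization_percent(prev_octets, curr_octets, interval_seconds,
                        link_bps, counter_bits=32):
    """Average link utilization (percent) over one polling interval."""
    octets = counter_delta(prev_octets, curr_octets, counter_bits)
    bits_per_second = octets * 8 / interval_seconds
    return 100.0 * bits_per_second / link_bps


if __name__ == "__main__":
    # Two hypothetical readings taken 300 seconds apart on a 100 Mbit/s port.
    print(utilization_percent(1_000_000, 376_000_000, 300, 100_000_000))
```

On fast links a 32-bit octet counter can wrap more than once within a long polling interval, which this sketch cannot detect. Shorter polling intervals or 64-bit counters avoid that.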
Flow based
Flow data has started to become more commonplace. Unlike pure counters, flows can yield information about source, destination and service (port). Rather than just indicating that a surge exists, flow data can help add visibility into the makeup of that surge. Problems do arise with sampling rates, time granularity, and identifying users and true applications.
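To make "the makeup of a surge" concrete, the following Python sketch sums bytes per conversation and lists the largest ones. It assumes flow records have already been collected and decoded into dictionaries; the field names (src, dst, dport, bytes) and the sample records are hypothetical and not tied to any particular flow product.

```python
from collections import Counter


def top_talkers(flows, key_fields=("src", "dst", "dport"), n=3):
    """Sum bytes per key and return the n largest (key, bytes) pairs."""
    totals = Counter()
    for flow in flows:
        key = tuple(flow[field] for field in key_fields)
        totals[key] += flow["bytes"]
    return totals.most_common(n)


if __name__ == "__main__":
    # Hypothetical flow records; addresses come from documentation ranges.
    flows = [
        {"src": "192.0.2.10", "dst": "198.51.100.5", "dport": 443, "bytes": 1200},
        {"src": "192.0.2.11", "dst": "198.51.100.9", "dport": 25, "bytes": 90000},
        {"src": "192.0.2.10", "dst": "198.51.100.5", "dport": 443, "bytes": 800},
    ]
    for key, total in top_talkers(flows):
        print(key, total)
```

Changing key_fields to ("src",) or ("dport",) gives a per-host or per-service view of the same data.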
NetFlow from Cisco can be demanding for routers to process, which is why sampled NetFlow exists. Rather than looking at every packet, a router looks at every nth packet. On switches, sFlow is a standard for sampling traffic at a specified sample rate.
J-Flow is used for flow data on Juniper Networks routers.
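With 1-in-N sampling, the collector only sees a fraction of the traffic, so reported counts are usually scaled back up by the sampling rate. A minimal sketch of that arithmetic, assuming simple 1-in-N packet sampling (the function name and numbers are illustrative only):

```python
def estimate_totals(sampled_packets, sampled_bytes, sampling_rate):
    """Scale 1-in-N sampled counts up to an estimate of the real totals."""
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be 1 or greater")
    return sampled_packets * sampling_rate, sampled_bytes * sampling_rate


if __name__ == "__main__":
    # 250 sampled packets carrying 180,000 bytes at a 1-in-100 sampling rate.
    print(estimate_totals(250, 180_000, 100))
```

The result is an estimate rather than a measurement. It is reasonable for large, long-lived flows, but short flows may be missed by the sampler entirely.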
Standards and version features keep changing, but most flow based data consists of source IP address, destination IP address, IP protocol, source port for UDP or TCP, destination port for UDP or TCP, ingress interface and Type of Service (ToS). Flow data is typically sent from the exporting device (router or switch) to a collector via UDP, or SCTP in newer systems. Because UDP does not retransmit, a flow packet dropped due to network congestion is lost forever.
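Lost export packets cannot be recovered, but a collector can at least notice that they are missing by watching the sequence numbers in the export headers. The sketch below assumes a NetFlow v5 style header, where the sequence field is a running total of flow records exported so far. It ignores counter wrap and out-of-order arrival, and the function name is hypothetical.

```python
def missing_flows(headers):
    """Count flow records lost between consecutive export packets.

    headers: iterable of (flow_sequence, record_count) tuples in arrival
    order, where flow_sequence is the running total of flow records the
    device had exported before this packet (assumption: NetFlow v5 style).
    """
    lost = 0
    expected = None
    for flow_sequence, record_count in headers:
        # A jump past the expected value means earlier records never arrived.
        if expected is not None and flow_sequence > expected:
            lost += flow_sequence - expected
        expected = flow_sequence + record_count
    return lost


if __name__ == "__main__":
    # The packet that should have started at sequence 60 never arrived.
    print(missing_flows([(0, 30), (30, 30), (90, 30), (120, 10)]))
```

A steadily growing loss count is a hint that the flow reports themselves are unreliable during the very congestion being investigated.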
syslog
syslog is used by network infrastructure (routers, firewalls, IDSs, IDPs, other networking appliances), hosts and applications to log or report pertinent information. A message may be an alert (interface down, login failed) or any of a wide variety of other messages. Syslog tends to be used more for alerting than for traffic information; however, it can be important in understanding network state changes or threat changes. It can add context about what is happening in the network that might explain why a change has occurred (e.g. interface down or DoS).
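Each syslog message on the wire starts with a priority value in angle brackets, which encodes the facility and the severity as facility * 8 + severity. A small Python sketch that decodes it, useful for picking out the alerts worth reading first (the function name and the sample message are made up for illustration):

```python
import re

# Severity names in numeric order, 0 (most severe) through 7.
SEVERITIES = ("emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug")

_PRI = re.compile(r"^<(\d{1,3})>")


def decode_priority(message):
    """Return (facility, severity_name) from a message's leading <PRI>."""
    match = _PRI.match(message)
    if match is None:
        raise ValueError("message has no <PRI> prefix")
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, SEVERITIES[severity]


if __name__ == "__main__":
    # Hypothetical message from a router reporting an interface going down.
    line = "<187>Feb  3 10:15:02 router1 Interface FastEthernet0/1 changed state to down"
    print(decode_priority(line))
```

Here 187 decodes to facility 23 (local7) and severity 3 (error).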
Packet sniffing
Various sniffing tools exist in both free and commercial form. Tools such as Wireshark can be indispensable for a packet level view of network traffic and are the ultimate troubleshooting tool once a particular source or destination IP address has been raised as a suspect. They are usually used in conjunction with port mirroring (aka span ports, mirror ports) on a switch for a given port, set of ports or VLAN. It’s hard to argue with the price of the free tools, and saving a packet capture for future analysis can be critical to troubleshooting a difficult networking or application problem.
Web filtering
Web filtering tools dive deeper into web (HTTP and HTTPS) traffic to determine which users are involved and what types of web sites are being accessed (by category, e.g. finance, entertainment, etc.). This is one of the few tools that maps traffic back to users and to the type of site being accessed.
Solutions
Unfortunately there is no one size fits all approach. The key first step is to understand the nature of the problem. Does it involve just the local network (LAN), or does it involve another resource across the WAN (either MPLS or Internet)? If possible, determine whether others were affected, or have tools in place to ascertain this.
One approach is top down: compare current traffic, via counters displayed on graphs, against historical baselines to determine whether the current state is unusual. If so, dive deeper into flow data to better understand the makeup of the new higher traffic rates, and ultimately into packet sniffing once a congestion point has been determined. So we start with counters, dive into flows and then perhaps into captures.
It’s also important to understand the difference between bandwidth, latency and packet loss, though this article won’t go into how these factors play into the analysis. Historical tracking of bandwidth utilization, latency and packet loss adds another dimension and can help explain problems that aren’t due just to a surge condition, but perhaps to congestion or delays.
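The first step of that top down approach, deciding whether current traffic is unusual compared with the baseline, can be sketched in a few lines of Python. This assumes the baseline is a list of utilization samples for comparable periods (for example the same hour on previous weekdays) and uses a simple "more than three standard deviations above the mean" rule. The threshold and function name are arbitrary choices for illustration, not a recommendation.

```python
from statistics import mean, stdev


def is_unusual(history, current, threshold=3.0):
    """True if current exceeds the historical mean by more than
    `threshold` sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any increase counts as unusual.
        return current > baseline
    return (current - baseline) / spread > threshold


if __name__ == "__main__":
    # Hypothetical utilization (percent) for the same hour on earlier days.
    history = [40, 42, 38, 41, 39]
    print(is_unusual(history, 43), is_unusual(history, 80))
```

Only when this kind of check flags a link does it make sense to spend time on the flow data and packet captures for it.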
Altaware, Inc. offers a variety of tools based upon the technologies mentioned in this article. We offer:
- Manageable switches from ProCurve, Juniper Networks and Netgear
- Routers with flow data from Juniper Networks
- Management solutions including:
  - STRM from Juniper Networks for syslog, flow data and NBAD (Network Behavior Anomaly Detection)
  - Orion from SolarWinds with or without flow based tools and a variety of networking tools
  - Counter based views from Statseeker
  - Open Source NMS (Network Management System) tailored and deployed by Altaware, Inc.; very affordable and recommended as a good starting point or central NMS
- Next generation firewalls with application and user visibility from Palo Alto Networks
- IPAM (IP Address Management) from Infoblox
- High performance syslog solutions from LogLogic (appliance or virtual based)
- Network usage analysis software from Congruity
- Wireless networking information from AirWave, now part of Aruba Networks
- URL filtering and usage reporting from Websense, Barracuda or FaceTime
- MPLS solutions that include circuit utilization and traffic prioritization capability from circuit providers
- Identity tracking solutions to match user IDs back to IPs from A10 Networks
Please contact us for more information!