Contents

Introduction

Connections

Performance

Methods

Examples

Resources

Glossary

Getting Started Guide
Network Performance

  1. What is Network Performance?
  2. Basics of Networking
    1. The TCP/IP Protocols
    2. MTU
  3. Measuring Network Performance
    1. Backbone Performance
    2. End-to-end Performance
  4. Application Performance
    1. Instrumenting Applications for Good Network Performance
    2. Measuring Application Performance
    3. Tuning TCP Application Performance
    4. Tuning UDP Application Performance
    5. Followup
  5. Future Directions
    1. Grid Services
    2. User Portals
    3. Automatic Tuning

1. What is Network Performance?

Network performance refers to the overall effectiveness of a network at a given point. Generally, performance is examined at all levels of connectivity (LAN, WAN, backbone, end-to-end, application). Different aspects of network performance can be measured, giving you information you can use to improve your application's performance.

Measurement usually looks at one or more of the following:

Bandwidth
-- how much data can be transferred per unit time -- is the most obvious.
Delay
-- how long it takes an individual piece of data to traverse the network -- is important for real-time applications like video conferencing and remote instrument control, and also impacts bandwidth.
Packet loss
-- when a piece of data disappears in transmission -- affects both bandwidth and real-time applications.
A high-performance network is characterized by high bandwidth, small delay, and low packet loss.
 

2. Basics of Networking

2.1 The TCP/IP Protocols

TCP/IP is a family of network protocols that has gained general acceptance over nearly 20 years. It underlies most of the tools you use every day to browse web pages, download files, and check other hosts with ping. The three most general protocols (IP, UDP, and TCP) are discussed below, along with the steps for building a simple client and server.

The development of TCP/IP created the notion of layered protocols. IP (Internet Protocol), the most basic protocol of TCP/IP, provides the minimum requirements to route a message and move it from one machine to another over a network interface. For example, the ping command checks whether remote hosts are reachable by sending ICMP messages directly over IP (using raw sockets). Sockets are the end points of a connection and are created and used through function calls. The Connections section provides more information on network connections. Application developers typically don't program at the IP level because it is too low-level and does not meet the reliability requirements that most distributed applications have.

The User Datagram Protocol (UDP) provides a bit more functionality than using raw sockets and is a very basic transport protocol built on top of IP. Because UDP does not guarantee that messages will reach their final destinations, UDP is used for certain applications like streaming audio and video where packet loss recovery is not as important. The loss of a packet or two in a whole series of packets comprising a video image does not greatly change the end user's experience.

Like UDP, the Transmission Control Protocol (TCP) is layered on top of IP, but it provides more reliability by ensuring that dropped packets are retransmitted and that data reaches its destination in the order it was sent. TCP addresses the following transport issues:

  • handshaking mechanisms to establish a connection between two machines
  • capabilities for flow control (how much data can be sent at a time)
  • congestion control (what to do when packets are dropped)
  • polling for messages (how long to wait for an incoming packet)
  • retransmission of lost or corrupted data
TCP/IP implementations provide a sockets API that lets developers write their own network code to transmit data using TCP or UDP. Servers can be designed that process requests from clients and return updated information. The steps involved in building a server are:
  1. initialize the socket
  2. bind the socket to a chosen port of 1024 or higher (lower ports are usually reserved for privileged services)
  3. listen for incoming connections
  4. accept (or not) any incoming requests
In order to connect to the server, the client must:
  1. initialize a socket
  2. connect to the server
Once a connection is established, client and server can read and write data. See the Examples section for sample programs.
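
The server and client steps above can be sketched in C with the Berkeley sockets API. This is only a minimal sketch under stated assumptions, not the guide's own example code: the helper names open_listener and connect_to are hypothetical, only IPv4 is handled, and error handling is reduced to returning -1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side: initialize a socket, bind it to a port, and listen.
 * Returns the listening descriptor, or -1 on error.
 * (open_listener is a hypothetical helper name.) */
int open_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* step 1: initialize */
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);        /* any local interface */
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||  /* step 2: bind */
        listen(fd, 5) < 0) {                                      /* step 3: listen */
        close(fd);
        return -1;
    }
    /* step 4 is left to the caller: accept(fd, NULL, NULL) */
    return fd;
}

/* Client side: initialize a socket and connect to the server.
 * ip is a dotted-quad IPv4 address such as "127.0.0.1".
 * (connect_to is a hypothetical helper name.) */
int connect_to(const char *ip, uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* step 1: initialize */
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {  /* step 2: connect */
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would call open_listener with its chosen port and then accept on the returned descriptor; a client would call connect_to with the server's address and the same port. After that, both sides exchange data with read and write.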

Unfortunately, sockets programming is still too low-level for developers who are not intimately familiar with the finer details of TCP/IP and network programming. For this reason, developers typically rely on distributed computing frameworks, or middleware, discussed below, to make application development more effective. See W. Richard Stevens, Unix Network Programming, Vol. 1 for more information.

2.2 MTU

Maximum transmission unit (MTU) is the largest packet size that can be sent across a given network. For Ethernet, the MTU is 1500 bytes. The MTU may vary from network to network; the Path MTU is the largest packet size that can be sent across an entire network path. Path MTU Discovery is the algorithm used to find the Path MTU.

The actual amount of data in each packet is smaller than the MTU. For TCP, this amount of data is called the Maximum Segment Size (MSS) and is about 40 bytes less than the MTU. For UDP, data will be about 28 bytes less than the MTU.
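
The 40-byte and 28-byte figures come from the protocol headers: an IPv4 header without options is 20 bytes, a TCP header without options is 20 bytes, and a UDP header is 8 bytes. The arithmetic can be sketched as follows, assuming IPv4 and no header options (the function and constant names are illustrative only):

```c
/* Header sizes in bytes, assuming IPv4 and no IP or TCP options. */
enum { IPV4_HEADER = 20, TCP_HEADER = 20, UDP_HEADER = 8 };

/* Largest TCP payload (the MSS) that fits in one packet of the given MTU. */
int tcp_mss(int mtu)
{
    return mtu - IPV4_HEADER - TCP_HEADER;
}

/* Largest UDP payload that fits in one unfragmented packet of the given MTU. */
int udp_payload(int mtu)
{
    return mtu - IPV4_HEADER - UDP_HEADER;
}
```

For Ethernet's 1500-byte MTU this gives an MSS of 1460 bytes and a UDP payload of 1472 bytes. For a 576-byte MTU the MSS is 536 bytes.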

The U.S. Postal Service has a maximum size for packages it will accept; this is similar to the MTU. If you are sending a package internationally, the foreign postal service may have a different maximum size. You can then only send a package as large as both postal services will accept, similar to the Path MTU. If you use something to cushion an item, the item itself must be smaller than the box, just as the data in a packet is smaller than the MTU.
 

3. Measuring Network Performance

3.1 Backbone Performance

Several groups provide direct, systematic measurement of wide-area network backbone performance. The vBNS and Abilene networks use SNMP polling and MRTG graphs (see explanations below) to examine the traffic through routers and switches. NLANR Measurement and Network Analysis utilizes both passive and active measurements. Advanced Network & Services, Inc. runs the Surveyor project of active measurements.

For direct assistance in understanding the backbone network performance information, contact the measurement and operations team. Generally when problems are revealed by the backbone measurements, local network engineers should verify the problem is not with their configuration, and then contact either NLANR Engineering Services, the Abilene NOC (network operations center), or the vBNS NOC.

If you are a researcher and are experiencing problems with your backbone or connection to the backbone, talk with your local network engineer prior to contacting the resources above. Your network engineers may have the answer you need. Local engineers are usually affiliated with centralized computing services offices.

SNMP and MRTG data

Simple network management protocol (SNMP) is used to gather statistics from routers and switches, including the number and size of IP packets, total bytes, router CPU utilization, and discarded packets.

Multirouter traffic grapher (MRTG) is used to display bandwidth usage and other information over time. Web-accessible charts are updated every 5 minutes and show data summarized from the past day, week, month, and year. Abilene maintains an MRTG page.

OCXmon Passive Monitoring

OC3mon and OC12mon machines are used to passively examine network traffic without introducing any traffic of their own. The machines tap into the OC3 and OC12 (optical carrier level) fiber optic line connecting to the wide-area network and analyze flows traversing the network. These machines currently work only with ATM (asynchronous transfer mode) networks, but POS (packet over SONET) is in the works. The vBNS has OC12mon machines at each node. NLANR maintains OC3mon and OC12mon machines at various institutions.

Active Measurement Program

The NLANR Active Measurement Program (AMP) tests bandwidth and delay between participating institutions. An AMP machine is installed at each institution. Tests are run between each pair of machines, forming a full mesh. A ping test, measuring round-trip time delay, is run every minute. AMP also runs traceroute tests every 10 minutes to show what networks are used between institutions. Maximum TCP (transmission control protocol) bandwidth tests can be run on demand.

Surveyor

Surveyor is a project similar to AMP run by Advanced Network & Services, Inc. One-way delays are measured between Surveyor machines at participating institutions. GPS (Global Positioning System) is used to synchronize the machine clocks so one-way delays can be computed accurately to within 50 microseconds. Using one-way delays, asymmetries in the network are revealed that normal round-trip time delays do not.

3.2 End-to-End Performance

NLANR Engineering Services and the NLANR Distributed Applications Support Team provide tools for analyzing the end-to-end network performance along particular network paths. Some tools are intended for use by network engineers and some are more appropriate for use by researchers. The tools range from adjusting parameters of the TCP protocol to investigating deficiencies in routers and the operating system's TCP implementation.

For more assistance with optimizing a campus network, or finding the bottlenecks along particular paths, contact the NLANR Engineering Services team. For help with a research project, contact the NLANR Distributed Applications Support Team.

Treno "TCP Reno"

Treno emulates the TCP protocol stack using UDP (user datagram protocol). You can use this to compare an operating system's TCP implementation with a modern TCP implementation that includes such improvements as SACK (selective acknowledgement), FACK (forward acknowledgement), and Path MTU Discovery. Treno also allows targeting individual routers along the path, to discover what links are problematic. Not for the faint-of-heart, but very valuable to network engineers.

mping

mping stresses the network, intentionally flooding the router queues to test queuing properties. Using mping, you can find, for example, the bandwidth and packet loss as the TCP window size increases. Again, this tool is intended for network engineers skilled at interpreting the output.

Iperf

Iperf measures the maximum TCP bandwidth and the UDP performance between two machines. It is useful for tuning the TCP window size and gives a baseline against which to compare application performance. It can also measure packet loss and variation in delay (jitter). Many older but similar tools, such as ttcp, also exist. Iperf is a useful tool for non-network engineers.

traceroute, ping, mtr

traceroute
is used to find the path your data takes through the network. This can be important to verify that you are using a high-performance network. (See Chapter 2 for additional information.) traceroute also provides round-trip time measurements to each router.
ping
will repetitively find round-trip time measurements to a particular machine or router. The repetition helps to reveal changes in delay. Both ping and traceroute are standard UNIX commands, often found in /sbin, /usr/sbin, /etc, or /usr/etc.
mtr
("Matt's TraceRoute") is a program that combines the functionality of traceroute and ping and presents the output data in an easy-to-read tabular format. It repetitively pings each router along the path, showing delay and packet loss.
Of course, issuing repetitive ping commands should only be done in the course of investigating specific questions you have about a connection and only for a short period of time (e.g., one or two minutes).

tcpdump, tcptrace, xplot

tcpdump
is a standard UNIX utility to examine or "sniff" the traffic on the network. It is similar in functionality to the OCXmon passive monitoring machines, but is typically run to only look at a specific connection.
tcptrace
can be used to analyze the output from tcpdump, and xplot will show the packets graphically. This is very helpful (for those familiar with TCP's internal workings) to diagnose network problems.
xplot
helps reveal "pathological" network behaviors. Staff at the Pittsburgh Supercomputing Center are developing a debugging flowchart that uses xplot to show these behaviors.

All these tools are available from NLANR Engineering Services as a preconfigured testrig, which means that the software is prebuilt and preconfigured.

4. Application Performance

4.1 Instrumenting Applications for Good Network Performance

Measuring performance in an application involves instrumenting the application. At the very basic level, instrumenting means keeping track of how long a transfer takes and computing bandwidth from that. For instance, FTP clients typically report the size, duration, and average bandwidth after each file transfer. Real-time applications often keep track of delay, variations in delay, and packet loss.
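
The most basic instrumentation can be sketched in a few lines of C. The sketch below assumes the application records a start and end time around a transfer (for example with clock_gettime) and counts the bytes and packets itself. The function names are hypothetical and are not taken from any NLANR tool.

```c
#include <stdint.h>
#include <time.h>

/* Seconds elapsed between two timestamps, such as those returned by
 * clock_gettime(CLOCK_MONOTONIC, ...) before and after a transfer. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Average bandwidth of a finished transfer in megabits per second
 * (1 megabit = 1,000,000 bits). Returns 0 if the duration is not positive. */
double mbits_per_sec(uint64_t bytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return (double)bytes * 8.0 / seconds / 1e6;
}

/* Packet loss as a percentage of the packets sent. */
double loss_percent(uint64_t lost, uint64_t total)
{
    if (total == 0)
        return 0.0;
    return 100.0 * (double)lost / (double)total;
}
```

For example, 13 datagrams lost out of 1402 sent is a loss of about 0.93%, which is the figure Iperf reports in the last line of the 32 KB UDP test in Section 4.4.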

Application performance instrumentation can be used both for tuning an application and for responding dynamically to network conditions. For instance, a video conference application may notice high packet loss and reduce the bandwidth it is using, or notice large variations in delay and increase the amount of buffering.

The manner in which an application adjusts to network conditions is very application-specific. In video-conferencing software, it may involve manually adjusting the sending rate if loss is too high. In FTP, you might pick a different FTP server. The Internet2 Distributed Storage Infrastructure (I2-DSI) service uses heuristics to automatically pick the server that will be best for you. In audio conferencing, one might give up and use the telephone. (Seriously, we've done this before!)

Netlog

Netlog is a library for instrumenting socket networking in C/C++ applications. It logs periodic bandwidth results, giving a fine-grained picture of how an application interacts with the network. It requires minimal source code changes, usually just adding a header file. Data can be logged to a file or over a network connection, and analyzed by Viznet. A Java version is also being developed. Check the NLANR Distributed Applications Tools page and the NLANR Distributed Applications Projects page for release information.

Viznet

Viznet, which is implemented in Java, visualizes the raw performance data that Netlog provides. It draws a graph of bandwidth vs. time for each socket connection and allows interactive scaling and scrolling of the graph. Viznet can read recorded Netlog files or live Netlog data over the network.

Real-time Transport Protocol

Real-time transport protocol (RTP) is used by many real-time applications (e.g., video conferencing) to monitor and respond to network conditions. It detects delay, delay jitter, and packet loss. RTP uses RTCP (RTP control protocol) to give performance feedback to the sending application. A good overview of RTP is given by Henning Schulzrinne, a professor at Columbia University.
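
As an illustration of how a receiver can track delay jitter, the sketch below follows the interarrival jitter estimator described in the RTP specification (RFC 3550): for each packet it compares the transit time with that of the previous packet and folds the absolute difference into a running average with a gain of 1/16. The type and function names are hypothetical, and timestamps are plain doubles in whatever unit the caller chooses, which is a simplification of RTP's timestamp units.

```c
/* Running jitter estimate for one stream. Initialize all fields to zero. */
typedef struct {
    double jitter;        /* current estimate, in the caller's time unit */
    double last_transit;  /* arrival time minus send time of the previous packet */
    int    have_last;     /* 0 until the first packet has been seen */
} jitter_state;

/* Update the estimate with one packet's send timestamp and arrival time.
 * The sender and receiver clocks need not be synchronized, because only
 * the change in transit time between packets is used. */
void jitter_update(jitter_state *s, double sent, double arrived)
{
    double transit = arrived - sent;

    if (s->have_last) {
        double d = transit - s->last_transit;
        if (d < 0.0)
            d = -d;
        /* Move the estimate 1/16 of the way toward the new sample. */
        s->jitter += (d - s->jitter) / 16.0;
    }
    s->last_transit = transit;
    s->have_last = 1;
}
```

A receiver would call jitter_update once per arriving packet and report the jitter field back to the sender periodically.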

4.2 Measuring Application Performance

After instrumenting an application, you can start to measure the network performance it gets. From there you can tune the application or follow up with network engineers to fix problems evident in the network. Follow these guidelines to determine your application's performance.
  1. Start measuring application performance in a local, controlled environment. The local network is much simpler to debug, because there are fewer variables and only one network engineer to contact if performance is poor. Do some initial tuning at this point (see Section 4.3) to give a good baseline to compare against later. Remember that an application that does not perform well locally will not perform well over the wide-area network.
  2. Once the performance is good in the local network, move on to measuring performance in the wide-area network. Use Iperf to measure the achievable bandwidth or other performance characteristic your application requires. Run your application and compare its performance to the performance in the local network and also to the performance you get using Iperf. Tune the application and compare results with earlier results.

4.3 Tuning TCP Application Performance

The TCP window size is by far the most important parameter to adjust for achieving maximum bandwidth across high-performance networks. Properly setting the TCP window size can often more than double the achieved bandwidth. See the User's Guide to TCP Windows for details.
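
A common rule of thumb is to make the window at least as large as the bandwidth-delay product of the path, that is, the amount of data that must be in flight to keep the path full. The sketch below assumes the usual sockets approach of requesting larger send and receive buffers with setsockopt. The helper names are hypothetical, and the operating system may round, cap, or otherwise adjust the requested size.

```c
#include <sys/socket.h>

/* Bandwidth-delay product in bytes for a path with the given bandwidth
 * (bits per second) and round-trip time (seconds). */
long bdp_bytes(double bits_per_sec, double rtt_sec)
{
    return (long)(bits_per_sec * rtt_sec / 8.0);
}

/* Request send and receive socket buffers of the given size, which bounds
 * the TCP window the connection can use. Returns 0 on success, -1 on error.
 * This is generally done before connect() or listen() is called. */
int set_tcp_window(int sock, int bytes)
{
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    return 0;
}
```

For example, a 100 Mbit/sec path with a 50 ms round-trip time has a bandwidth-delay product of about 625,000 bytes, far more than a typical default window of a few tens of kilobytes.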

Another parameter to check is the MTU (maximum transmission unit). MTU controls what the largest packet size can be. If the operating system supports Path MTU Discovery, the largest packet size will be the largest MTU all of the intermediate networks support (often Ethernet's 1,500-byte MTU). Without Path MTU Discovery, it will often be 576 bytes. A small MTU wastes time processing many small packets instead of fewer large ones. The system administrator can enable Path MTU Discovery if the operating system implements it. The Pittsburgh Supercomputing Center's website has information on enabling high-performance data transfers on hosts, including which operating systems support Path MTU Discovery and how to enable it.

Use Iperf with the -m (print MSS) option to check the MTU. The output below shows a host that doesn't support Path MTU Discovery. It will only send and receive small 576-byte packets.

oceana> iperf -s -m
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[  4] local 172.20.40.200 port 5001 connected with 172.16.7.4 port 13914
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 2.3 sec   632 KBytes   2.1 Mbits/sec
WARNING: Path MTU Discovery may not be enabled.
[  4] MSS size 536 bytes (MTU 576 bytes, minimum)
[  4] Read lengths occurring in more than 5% of reads:
[  4]   536 bytes read   308 times (58.4%)
[  4]  1072 bytes read    91 times (17.3%)
[  4]  1608 bytes read    29 times (5.5%)
For applications that intentionally send small chunks of data immediately, setting the TCP no delay option may improve performance. Interactive applications, such as telnet, often fall into this category. Normally TCP queues small writes so that it can send out larger packets (the Nagle algorithm). This queueing can sometimes be undesirable. Imagine if you had to type 1000 characters in telnet before a single one was displayed! To set TCP no delay, use setsockopt on the socket descriptor (called sock here):
/* Needs <sys/socket.h>, <netinet/in.h>, and <netinet/tcp.h>. */
int nodelay = 1;  /* nonzero turns TCP no delay on */
int error = setsockopt( sock, IPPROTO_TCP, TCP_NODELAY, &nodelay,
                        sizeof(nodelay) );  /* 0 on success, -1 on failure */
Note that the TCP no delay option does not apply to UDP-based applications, which always send data immediately.

4.4 Tuning UDP Application Performance

For UDP applications, the burstiness of the traffic can be an issue. An application may send a large burst of packets back-to-back, followed by some idle time. The average bandwidth looks reasonable, but the bandwidth during the burst is excessive. Burstiness is the result of two different parameters: how much time occurs between writes and how large individual datagram writes are.

If there is very little delay between writes, the application may create a burst causing too much stress on the network. Spacing out writes -- by putting in a sleep for instance -- reduces this stress and thus reduces packet loss. Instrumenting your application with Netlog can show the timing of your writes, as can looking at a packet trace from tcpdump.
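
Spacing out writes can be sketched as follows: compute how long one datagram "should" take at the target average rate, and sleep for that long between writes. This is a simplified illustration with hypothetical function names. It ignores the time spent in the write call itself and assumes the system's sleep granularity is fine enough for the chosen interval.

```c
#include <stdint.h>
#include <time.h>

/* Nanoseconds to wait between datagram writes so that the average
 * sending rate does not exceed target_bits_per_sec. */
uint64_t write_interval_ns(uint32_t datagram_bytes, double target_bits_per_sec)
{
    if (target_bits_per_sec <= 0.0)
        return 0;
    return (uint64_t)((double)datagram_bytes * 8.0 * 1e9 / target_bits_per_sec);
}

/* Sleep for the given interval; call this between successive writes. */
void pace(uint64_t interval_ns)
{
    struct timespec ts;
    ts.tv_sec  = (time_t)(interval_ns / 1000000000ULL);
    ts.tv_nsec = (long)(interval_ns % 1000000000ULL);
    nanosleep(&ts, NULL);
}
```

For example, sending 1250-byte datagrams at a target of 10 Mbits/sec means one write every millisecond, rather than a back-to-back burst followed by idle time.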

If the datagram size is large, it will be broken up into separate packets which will then be sent as a single burst. This is actually worse than not spacing out writes, because if a single packet of the datagram is lost, the entire datagram must be discarded. So not only is there a burst that causes network stress, but the effects of the resulting packet loss are also magnified. Delay and jitter are also increased, because more time is spent fragmenting the datagram into separate packets and reassembling it again.
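
The number of packets a datagram is split into can be estimated from the MTU. The sketch below assumes IPv4 without options: the 8-byte UDP header is carried along with the data, each packet has its own 20-byte IP header, and every fragment except the last carries a multiple of 8 bytes. The function name is hypothetical.

```c
enum { IPV4_HEADER_BYTES = 20, UDP_HEADER_BYTES = 8 };

/* Number of IP packets needed to carry one UDP datagram with the given
 * payload size over a path with the given MTU (IPv4, no options). */
int packets_per_datagram(int payload_bytes, int mtu)
{
    int total = payload_bytes + UDP_HEADER_BYTES;   /* data plus UDP header */
    int room  = mtu - IPV4_HEADER_BYTES;            /* space after the IP header */

    if (total <= room)
        return 1;                                   /* fits without fragmentation */

    /* Fragments other than the last must carry a multiple of 8 bytes. */
    int per_fragment = (room / 8) * 8;
    return (total + per_fragment - 1) / per_fragment;  /* round up */
}
```

With a 1500-byte MTU, a 1470-byte datagram fits in a single packet, while a 32 KB (32768-byte) datagram needs 23 packets, and the datagram is delivered only if all 23 arrive.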

Iperf can be used to simulate streams with different datagram write sizes. Here is an example of sending 1470 byte datagrams (each 1 packet) and then 32 KB datagrams (each 23 packets). Notice the much higher delay jitter (third column from the right) and percent packet loss (right column) caused by using large datagrams.

cyclops> iperf -s -u -i 1 -l 1470
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 60.0 KByte (default)
------------------------------------------------------------
[  3] local 172.20.178.168 port 5001 connected with 172.16.7.4 port 9571
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[  3]  0.0- 1.0 sec   4.4 MBytes  35.2 Mbits/sec  0.137 ms    0/ 3135 (0%)
[  3]  1.0- 2.0 sec   4.4 MBytes  34.9 Mbits/sec  0.213 ms    0/ 3117 (0%)
[  3]  2.0- 3.0 sec   4.4 MBytes  34.9 Mbits/sec  0.429 ms    1/ 3123 (0.032%)
[  3]  3.0- 4.0 sec   4.4 MBytes  35.2 Mbits/sec  0.150 ms    2/ 3136 (0.064%)
[  3]  4.0- 5.0 sec   4.4 MBytes  35.0 Mbits/sec  0.139 ms    0/ 3122 (0%)
[  3]  5.0- 6.0 sec   4.4 MBytes  35.0 Mbits/sec  0.178 ms    0/ 3125 (0%)
[  3]  6.0- 7.0 sec   4.4 MBytes  35.1 Mbits/sec  0.185 ms    0/ 3128 (0%)
[  3]  7.0- 8.0 sec   4.4 MBytes  35.0 Mbits/sec  0.248 ms    1/ 3121 (0.032%)
[  3]  8.0- 9.0 sec   4.4 MBytes  35.1 Mbits/sec  0.141 ms    0/ 3131 (0%)
[  3]  0.0-10.0 sec  43.8 MBytes  35.1 Mbits/sec  0.119 ms    4/31252 (0.013%)

cyclops> iperf -s -u -i 1 -l 32k
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 32768 byte datagrams
UDP buffer size: 60.0 KByte (default)
------------------------------------------------------------
[  3] local 172.20.178.168 port 5001 connected with 172.16.7.4 port 9534
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[  3]  0.0- 1.0 sec   4.3 MBytes  34.4 Mbits/sec  1.036 ms    3/  141 (2.1%)
[  3]  1.0- 2.0 sec   4.3 MBytes  34.7 Mbits/sec  0.847 ms    1/  140 (0.71%)
[  3]  2.0- 3.0 sec   4.3 MBytes  34.8 Mbits/sec  1.093 ms    1/  140 (0.71%)
[  3]  3.0- 4.0 sec   4.3 MBytes  34.7 Mbits/sec  0.688 ms    1/  140 (0.71%)
[  3]  4.0- 5.0 sec   4.3 MBytes  34.2 Mbits/sec  0.479 ms    3/  140 (2.1%)
[  3]  5.0- 6.0 sec   4.3 MBytes  34.8 Mbits/sec  0.384 ms    1/  140 (0.71%)
[  3]  6.0- 7.0 sec   4.4 MBytes  35.0 Mbits/sec  0.939 ms    0/  140 (0%)
[  3]  7.0- 8.0 sec   4.4 MBytes  35.0 Mbits/sec  0.691 ms    0/  140 (0%)
[  3]  8.0- 9.0 sec   4.3 MBytes  34.5 Mbits/sec  0.871 ms    2/  140 (1.4%)
[  3]  9.0-10.0 sec   4.3 MBytes  34.6 Mbits/sec  0.796 ms    1/  140 (0.71%)
[  3]  0.0-10.0 sec  43.4 MBytes  34.7 Mbits/sec  0.988 ms   13/ 1402 (0.93%)

4.5 Followup

Where should you follow up if your application performance is poor?
  1. Start by checking the application performance sections above. Check also with the local network administrator to verify that the campus network is optimized. Sometimes the campus is connected to a high-performance network such as Abilene, but your desktop machine is not! The local network administrator can verify that you are using the high-performance network and that the campus network infrastructure is sufficient to meet your needs. They may also discover that the problem is in a regional gigaPoP, in which case they should contact the gigaPoP operators to solve problems there.
  2. For further assistance with measuring and improving the performance of an application, contact the NLANR Distributed Applications Support Team, which can work with you to tune your application and the underlying networks to meet your needs.

5. Future Directions

5.1 Grid Services

The Network Weather Service (NWS) and Globus Gloperf run servers on various supercomputers that test bandwidth and latency between machines. This information is published with the Globus MDS (metacomputing directory service). This provides an easy way to find up-to-date information about bandwidth and latency without having to run a test of your own. The NWS provides a prediction model, so it can give an estimate of available bandwidth based on past performance. It also measures other performance factors such as CPU load and free memory.

5.2 User Portals

The Alliance is working on user portals, web pages that contain all the commonly accessed information customized for an individual user. A portion of a user portal is expected to be network performance information, so all the measurements you need on a day-to-day basis are at your fingertips. Bandwidth and delay tests from your machine to remote sites can be performed, as well as querying for bandwidth and latency between various supercomputers.

5.3 Automatic Tuning

The NLANR teams and the Web100 group are discussing automated tuning of application performance. It is possible to have the operating system figure out the optimal TCP window size and set it, without requiring you to manually tune an application. NLANR is working on a version of FTP that automatically tunes itself for each connection. Check the Tools page and the NLANR distributed applications Projects page for release information.




Last reviewed: December 31, 1969