Getting Started Guide
Network Performance
- What is Network Performance?
- Basics of Networking
- The TCP/IP Protocols
- MTU
- Measuring Network Performance
- Backbone Performance
- End-to-end Performance
- Application Performance
- Instrumenting Applications for Good Network Performance
- Measuring Application Performance
- Tuning TCP Application Performance
- Tuning UDP Application Performance
- Followup
- Future Directions
- Grid Services
- User Portals
- Automatic Tuning
1. What is Network Performance?
Network performance refers to the overall effectiveness of a network
at a given point. Generally performance is examined at all levels of connectivity
(LAN, WAN, backbone, end-to-end, application). Different aspects of network
performance can be measured, giving you information you can use to improve
your application's performance.
Measurement usually looks at one or more of the following:
- Bandwidth -- how much data can be transferred per unit time -- is the most obvious.
- Delay -- how long it takes an individual piece of data to traverse the network
  -- is important for real-time applications like video conferencing and
  remote instrument control, and also impacts bandwidth.
- Packet loss -- when a piece of data disappears in transmission -- affects both bandwidth
  and real-time applications.
A high-performance network is characterized by high bandwidth, small delay,
and low packet loss.
2. Basics of Networking
2.1 The TCP/IP Protocols
TCP/IP is a family of network protocols that has gained general acceptance
over nearly 20 years. It underlies most of the tools you use every day
to browse web pages, download files, and check other hosts with ping.
The three most general protocols -- IP, UDP, and TCP -- are discussed
below, along with some simple client/server examples.
The development of TCP/IP created the notion of layered protocols. IP
(Internet Protocol), the most basic protocol of TCP/IP, provides the minimum
requirements to route a message and move it from one machine to another
over a network interface. For example, the ping command checks whether
a remote host is reachable by sending ICMP messages directly over IP
through a raw socket. Sockets are the end points of a connection and are
created with and used by function calls. The Connections Section provides
more information on network connections. Application developers
typically don't program at the IP level because IP is too low-level and
does not meet the reliability requirements that most distributed
applications have.
The User Datagram Protocol (UDP) provides a bit more functionality than
using raw sockets and is a very basic transport protocol built on top of
IP. Because UDP does not guarantee that messages will reach their final
destinations, UDP is used for certain applications like streaming audio
and video where packet loss recovery is not as important. The loss of a packet or
two in a whole series of packets comprising a video image does not greatly
change the end user's experience.
Like UDP, the Transmission Control Protocol (TCP) is also layered on
top of IP but provides more reliability by ensuring that packets that are
dropped get retransmitted and reach their destination in the order they
were sent. TCP addresses the following transport issues:
- handshaking mechanisms to establish a connection between two machines
- capabilities for flow control (how much data can be sent at a time)
- congestion control (what to do when packets are dropped)
- polling for messages (how long to wait for an incoming packet)
- retransmission of lost or corrupted data
Most TCP/IP implementations provide a sockets API that lets developers write
their own network code to transmit data using the TCP or UDP protocol. A server
can be designed to process requests from clients and return updated information.
The steps involved in building a server are:
- initialize the socket
- bind the socket to a chosen port greater than 1024
- listen for incoming connections
- accept (or not) any incoming requests
In order to connect to the server, the client must:
- initialize a socket
- connect to the server
Once a connection is established, client and server can read and write
data. See Examples for some examples.
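The server and client steps above can be sketched in C with the standard sockets calls. This is a minimal sketch rather than a complete program: the helper names open_listener and connect_to are hypothetical, errors are reduced to a -1 return value, and only IPv4 addresses in dotted-decimal form are handled.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side (hypothetical helper): initialize a socket, bind it to a
 * port, and listen. Returns the listening descriptor, or -1 on error.
 * The caller then uses accept() to take each incoming connection. */
int open_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);      /* initialize the socket */
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);      /* any local interface */
    addr.sin_port = htons(port);                   /* e.g. a port above 1024 */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {                       /* queue up to 5 pending */
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side (hypothetical helper): initialize a socket and connect to
 * the server. Returns the connected descriptor, or -1 on error. */
int connect_to(const char *ipv4, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After accept() returns on the server and connect_to() returns on the client, both sides hold a descriptor that can be used with read() and write().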
Unfortunately, sockets programming is still too low-level for developers
not intimately familiar with some of the more esoteric and finer details
of TCP/IP and network programming. For this reason, developers typically
rely on distributed computing frameworks, or middleware, discussed
below to more effectively enable application development. See W. Richard
Stevens, Unix Network Programming, Vol. 1 for more information.
2.2 MTU
Maximum transmission unit (MTU) is the largest packet size that
can be sent across a given network. For Ethernet, the MTU is 1500 bytes. The
MTU may vary from network to network; the Path MTU is the largest
packet size that can be sent across an entire network path. Path MTU Discovery
is the algorithm used to find the Path MTU.
The actual amount of data in each packet is smaller than the MTU. For
TCP, this amount of data is called the Maximum Segment Size (MSS)
and is about 40 bytes less than the MTU (typically 20 bytes each for the
IP and TCP headers). For UDP, data will be about 28 bytes less than the
MTU (a 20-byte IP header plus an 8-byte UDP header).
The U.S. Postal Service has a maximum size for packages it will accept;
this is similar to the MTU. If you are sending a package internationally,
the foreign postal service may have a different maximum size. You can then
only send a package as large as both postal services will accept, similar
to the Path MTU. If you use something to cushion an item, the item itself
must be smaller than the box, just as the data in a packet is smaller than
the MTU.
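The arithmetic behind the Path MTU and the per-packet data sizes can be written out as a short C sketch. The function names are hypothetical, and the header sizes assume IPv4 and TCP headers without options, which is where the "about 40" and "about 28" byte figures come from.

```c
#include <stddef.h>

/* Header sizes assumed here: IPv4 and TCP without options. */
enum { IP_HEADER_BYTES = 20, TCP_HEADER_BYTES = 20, UDP_HEADER_BYTES = 8 };

/* Path MTU: the smallest MTU of any network along the path.
 * count must be at least 1. */
int path_mtu(const int *link_mtus, size_t count)
{
    int smallest = link_mtus[0];
    for (size_t i = 1; i < count; i++)
        if (link_mtus[i] < smallest)
            smallest = link_mtus[i];
    return smallest;
}

/* Largest amount of TCP data per packet (the MSS) for a given MTU. */
int tcp_mss(int mtu)
{
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES;
}

/* Largest amount of UDP data per packet for a given MTU. */
int udp_payload(int mtu)
{
    return mtu - IP_HEADER_BYTES - UDP_HEADER_BYTES;
}
```

For an Ethernet MTU of 1500 bytes this gives an MSS of 1460 bytes and a UDP payload of 1472 bytes; for a 576 byte MTU the MSS is 536 bytes.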
3. Measuring Network Performance
3.1 Backbone Performance
Several groups provide direct, systematic measurement of wide-area network
backbone performance. The vBNS and Abilene networks use SNMP polling and
MRTG graphs (see explanations below) to examine the traffic through routers
and switches. NLANR Measurement and
Network Analysis utilizes both passive and active measurements.
Advanced Network & Services, Inc. runs the Surveyor project of active
measurements.
For direct assistance in understanding the backbone network performance
information, contact the measurement and operations team. Generally when
problems are revealed by the backbone measurements, local network engineers
should verify the problem is not with their configuration, and then contact
either NLANR Engineering Services, the
Abilene NOC (network
operations center), or the vBNS NOC.
If you are a researcher and are experiencing problems with your backbone
or connection to the backbone, talk with your local network engineer prior
to contacting the resources above. Your network engineers may have the
answer you need. Local engineers are usually affiliated with centralized
computing services offices.
SNMP and MRTG data
Simple network management protocol (SNMP) is used to gather statistics
from routers and switches, including the number and size of IP packets,
total bytes, router CPU utilization, and discarded packets.
Multirouter traffic grapher (MRTG) is used to display bandwidth
usage and other information over time. Web accessible charts are updated
every 5 minutes and show data summarized from the past day, week, month,
and year. Abilene
maintains an MRTG page.
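To illustrate how polled byte counters turn into a bandwidth figure, here is a minimal C sketch. It is not taken from MRTG itself: the function name is hypothetical, and it assumes a 32-bit byte counter that wraps around at most once between two polls.

```c
#include <stdint.h>

/* Average rate in bits per second between two polls of a 32-bit byte
 * counter (assumption: the counter wrapped at most once in between).
 * Unsigned subtraction gives the right difference even across a wrap. */
double average_bits_per_sec(uint32_t earlier_bytes, uint32_t later_bytes,
                            double interval_sec)
{
    uint32_t delta = later_bytes - earlier_bytes;   /* modulo 2^32 */
    return (double)delta * 8.0 / interval_sec;
}
```

With a 5 minute polling interval (300 seconds), a counter that advances by 37,500,000 bytes corresponds to an average of 1 Mbit/sec.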
OCXmon Passive Monitoring
OC3mon and OC12mon machines are used to passively examine network traffic
without introducing any traffic of their own. The machines tap into the
OC3 and OC12 (optical carrier level) fiber optic line connecting to the
wide-area network and analyze flows traversing the network. These machines
currently work only with ATM (asynchronous transfer mode) networks, but
POS (packet over SONET) is in the works. The vBNS
has OC12mon machines at each node. NLANR
maintains OC3mon and OC12mon machines at various institutions.
Active Measurement Program
The NLANR Active Measurement Program (AMP)
tests bandwidth and delay between participating institutions. An AMP machine
is installed at each institution. Tests are run between each pair of machines,
forming a full mesh. A ping test, measuring round-trip time delay,
is run every minute. AMP also runs traceroute tests every 10 minutes
to show what networks are used between institutions. Maximum TCP (transmission
control protocol) bandwidth tests can be run on demand.
Surveyor
Surveyor is a project similar
to AMP run by Advanced Network & Services, Inc. One-way delays are
measured between Surveyor machines at participating institutions. GPS (Global
Positioning System) is used to synchronize the machine clocks so one-way
delays can be computed accurately to within 50 microseconds. One-way
delays reveal asymmetries in the network that normal round-trip
time measurements do not.
3.2 End-to-End Performance
NLANR Engineering Services and the NLANR Distributed Applications Support Team provide
tools for analyzing the end-to-end network performance along particular
network paths. Some tools are intended for use by network engineers and
some are more appropriate for use by researchers. The tools range from
adjusting parameters of the TCP protocol to investigating deficiencies
in routers and the operating system's TCP implementation.
For more assistance with optimizing a campus network, or finding the
bottlenecks along particular paths, contact the NLANR
Engineering Services team. For help with a research project, contact
the NLANR Distributed Applications Support
Team.
Treno "TCP Reno"
Treno emulates
the TCP protocol stack using UDP (user datagram protocol). You can use
this to compare an operating system's TCP implementation with a modern
TCP implementation that includes such improvements as SACK (selective acknowledgement),
FACK (forward acknowledgement), and Path MTU Discovery. Treno also allows
targeting individual routers along the path, to discover what links are
problematic. Not for the faint-of-heart, but very valuable to network engineers.
mping
mping stresses the network, intentionally flooding the router queues to test
queuing properties. Using mping, you can find, for example, the bandwidth
and packet loss as the TCP window
size increases. Again, this tool is intended for network engineers skilled
at interpreting the output.
Iperf
Iperf measures the
maximum TCP bandwidth and the UDP performance between two machines. It
is useful to tune the TCP window size and gives a baseline to compare application
performance against. It can also measure packet loss and variation in delay
(jitter). Many older but similar tools such as ttcp also exist. Iperf is
a useful tool for non-network engineers.
traceroute, ping, mtr
- traceroute is used to find the path your data takes through the network. This can
  be important to verify that you are using a high-performance network. (See
  Chapter 2 for additional information.) traceroute also provides
  round-trip time measurements to each router.
- ping repeatedly measures the round-trip time to a particular machine
  or router. The repetition helps to reveal changes in delay. Both ping and
  traceroute are standard UNIX commands, often found in /sbin, /usr/sbin,
  /etc, or /usr/etc.
- mtr ("Matt's TraceRoute") is a program that combines the functionality of traceroute
  and ping and presents the output data in an easy-to-read tabular
  format. It repeatedly pings each router along the path, showing delay
  and packet loss.
Of course, issuing repetitive ping commands should only be done
in the course of investigating specific questions you have about a connection
and only for a short period of time (e.g., one or two minutes).
tcpdump, tcptrace, xplot
- tcpdump is a standard UNIX utility to examine or "sniff" the traffic on the network.
  It is similar in functionality to the OCXmon passive monitoring machines,
  but is typically run to only look at a specific connection.
- tcptrace can be used to analyze the output from tcpdump, and xplot will
  show the packets graphically. This is very helpful (for those familiar
  with TCP's internal workings) to diagnose network problems.
- xplot helps to reveal "pathological" network behaviors. Staff at the Pittsburgh
  Supercomputing Center are developing a debugging
  flowchart using xplot to show these behaviors.
All these tools are available from NLANR Engineering
Services as a testrig, which means that the software is prebuilt
and preconfigured.
4. Application Performance
4.1 Instrumenting Applications for Good Network Performance
Measuring performance in an application involves instrumenting the application.
At the very basic level, instrumenting means keeping track of how
long a transfer takes and computing bandwidth from that. For instance,
FTP clients typically report the size, duration, and average bandwidth
after each file transfer. Real-time applications often keep track of delay,
variations in delay, and packet loss.
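The most basic form of instrumentation, timing a transfer and computing bandwidth from it, can be sketched in C as follows. The helper names are hypothetical, and "megabit" here is assumed to mean 10^6 bits; other tools may use a different convention.

```c
#include <stddef.h>
#include <time.h>

/* Seconds elapsed between two clock readings. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Average bandwidth in megabits per second, taking 1 Mbit = 10^6 bits. */
double megabits_per_sec(size_t bytes, double seconds)
{
    return (double)bytes * 8.0 / seconds / 1e6;
}
```

An application would read the clock (for example with clock_gettime and CLOCK_MONOTONIC) before and after the transfer, add up the byte counts returned by its read or write calls, and pass both to megabits_per_sec.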
Application performance instrumentation can be used both for tuning
an application and for responding dynamically to network conditions. For
instance, a video conference application may notice high packet loss and
reduce the bandwidth it is using, or notice large variations in delay and
increase the amount of buffering.
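A dynamic response of this kind might look like the C sketch below. Everything in it is an illustrative assumption: the structure, the function name, the 5% loss threshold, and the step sizes are hypothetical and not taken from any particular video conferencing tool.

```c
/* Settings a streaming application might adjust at run time. */
struct stream_settings {
    double send_kbps;   /* current sending rate */
    int    buffer_ms;   /* playout buffering at the receiver */
};

/* Adjust the settings from measured loss and delay variation.
 * The thresholds and step sizes are illustrative assumptions. */
void adapt_to_network(struct stream_settings *s,
                      double loss_fraction, double jitter_ms)
{
    if (loss_fraction > 0.05)             /* high packet loss: send less */
        s->send_kbps *= 0.75;
    if (jitter_ms > s->buffer_ms / 2.0)   /* large delay variation: buffer more */
        s->buffer_ms *= 2;
}
```

Calling this once per measurement interval lets the application back off gradually instead of reacting to a single lost packet.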
The manner in which an application adjusts to the network conditions
is very application specific. In video-conferencing software it may involve
manually adjusting the sending rate if loss is too high. In FTP, you might
pick a different FTP server. The Internet
2 Distributed Storage Infrastructure (I2-DSI) service uses heuristics
to automatically pick the server that will be best for you. In audio conferencing,
one might give up and use the telephone. (Seriously, we've done this before!)
Netlog
Netlog is a library
for instrumenting socket networking in C/C++ applications. It logs periodic
bandwidth results, giving a fine-grained picture of how an application
interacts with the network. It requires minimal source code changes, usually
just adding a header file. Data can be logged to a file or over a network
connection, and analyzed by Viznet. A Java version is also being developed.
Check the NLANR Distributed
Applications Tools page and the
NLANR Distributed Applications
Projects page for release information.
Viznet
Viznet, which
is implemented in Java, visualizes the raw performance data that Netlog
provides. It draws a graph of bandwidth vs. time for each socket connection
and allows interactive scaling and scrolling of the graph. Viznet can read
recorded Netlog files or live Netlog data over the network.
Real-time Transport Protocol
Real-time transport protocol (RTP) is used by many real-time applications
(e.g., video conferencing) to monitor and respond to network conditions.
It detects delay, delay jitter, and packet loss. RTP uses RTCP (RTP control
protocol) to give performance feedback to the sending application. A good overview
of RTP is given by Henning Schulzrinne, a professor at Columbia University.
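As an illustration of how delay jitter can be tracked, the RTP specification (RFC 3550) defines a running estimate that moves one sixteenth of the way toward each new change in transit time. The C sketch below follows that formula; the structure and function names are hypothetical, and timestamps are assumed to be in any consistent unit, such as milliseconds.

```c
/* Running jitter estimate for one stream. Initialize all fields to 0. */
struct jitter_state {
    double prev_transit;  /* arrival time minus send time, previous packet */
    double jitter;        /* current estimate, same unit as the timestamps */
    int    have_prev;     /* set once the first packet has been seen */
};

/* Update the estimate with one packet's send and arrival times. */
void jitter_update(struct jitter_state *st,
                   double send_time, double arrival_time)
{
    double transit = arrival_time - send_time;
    if (st->have_prev) {
        double d = transit - st->prev_transit;   /* change in transit time */
        if (d < 0)
            d = -d;
        st->jitter += (d - st->jitter) / 16.0;   /* smooth with gain 1/16 */
    }
    st->prev_transit = transit;
    st->have_prev = 1;
}
```

Because only differences between transit times are used, the sender and receiver clocks do not need to be synchronized; a constant offset cancels out.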
4.2 Measuring Application Performance
After instrumenting an application, you can start to measure the network
performance it gets. From there you can tune the application or follow up
with network engineers to fix problems evident in the network. Follow these
guidelines to determine your application's performance.
- Start measuring application performance in a local, controlled environment.
  The local network is much simpler to debug, because there are fewer variables
  and only one network engineer to contact if performance is poor. Do some
  initial tuning at this point (see Section 4.3 below) to give
  a good baseline to compare against later. Remember that an application
  that does not perform well locally will not perform well over the wide area.
- Once the performance is good in the local network, move on to measuring
  performance in the wide-area network. Use Iperf
  to measure the achievable bandwidth or other performance characteristic
  your application requires. Run your application and compare its performance
  to the performance in the local network and also to the performance you
  get using Iperf. Tune the application and compare results with earlier
  results.
4.3 Tuning TCP Application Performance
The TCP window size is by far the most important parameter to adjust for
achieving maximum bandwidth across high-performance networks. Properly
setting the TCP window size can often more than double the achieved bandwidth.
See the User's Guide to TCP Windows for details.
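As a rough sketch of what setting the window size involves, a common rule of thumb is to make the window at least as large as the bandwidth-delay product of the path, and to request socket buffers of that size. The C code below assumes that approach; the function names are hypothetical, and the operating system may limit or adjust the size actually granted.

```c
#include <sys/socket.h>

/* Bandwidth-delay product in bytes: the amount of data that must be
 * in flight to keep a path of this bandwidth and round-trip time full. */
long window_bytes(double bits_per_sec, double rtt_ms)
{
    return (long)(bits_per_sec * rtt_ms / 8000.0);
}

/* Request send and receive buffers of the given size on a socket.
 * Returns 0 on success, -1 on error. Best called before connect() or
 * listen(), since the window is negotiated when the connection starts. */
int set_tcp_window(int sock, int bytes)
{
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    return 0;
}
```

For example, a 100 Mbit/sec path with a 50 ms round-trip time has a bandwidth-delay product of 625,000 bytes, far more than a 32 KByte default window.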
Another parameter to check is the MTU (maximum transmission unit). MTU
controls what the largest packet size can be. If the operating system supports
Path MTU Discovery, the largest packet size will be the largest MTU all
of the intermediate networks support (often Ethernet's 1,500 byte MTU).
Without Path MTU Discovery, it will often be 576 bytes. A small MTU wastes
time processing many small packets instead of fewer large ones. The system
administrator can enable Path MTU Discovery if the operating system implements
it. Information available on the Pittsburgh Supercomputing Center's website
on enabling high-performance
data transfers on hosts lists which operating systems support Path
MTU Discovery and how to enable it.
Use Iperf with the -m (print MSS) option to check the MTU.
The output below shows a host that doesn't support Path MTU Discovery. It will only
send and receive small 576 byte packets.
oceana> iperf -s -m
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (default)
------------------------------------------------------------
[ 4] local 172.20.40.200 port 5001 connected with 172.16.7.4 port 13914
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 2.3 sec 632 KBytes 2.1 Mbits/sec
WARNING: Path MTU Discovery may not be enabled.
[ 4] MSS size 536 bytes (MTU 576 bytes, minimum)
[ 4] Read lengths occurring in more than 5% of reads:
[ 4] 536 bytes read 308 times (58.4%)
[ 4] 1072 bytes read 91 times (17.3%)
[ 4] 1608 bytes read 29 times (5.5%)
For applications that intentionally send small chunks of data immediately,
setting the TCP no delay option may improve performance. Interactive
applications, such as telnet, often fall into this category. Normally TCP
queues small writes to send out larger packets. This queueing can sometimes
be undesirable. Imagine if you had to type 1000 characters in telnet before
a single one was displayed! To set TCP no delay, use setsockopt:
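#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* setsockopt */
int nodelay = 1;  /* non-zero turns the option on; sock is a TCP socket descriptor */
int error = setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay));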
Note the TCP no delay option does not apply to UDP-based applications,
which always send data immediately.
4.4 Tuning UDP Application Performance
For UDP applications, the burstiness of the traffic can be an issue.
An application may send a large burst of packets back-to-back, followed
by some idle time. The average bandwidth looks reasonable, but the bandwidth
during the burst is excessive. Burstiness is the result of two different
parameters: how much time occurs between writes and how large individual
datagram writes are.
If there is very little delay between writes, the application may create
a burst causing too much stress on the network. Spacing out writes -- by
putting in a sleep for instance -- reduces this stress and thus reduces
packet loss. Instrumenting your application with Netlog can show the timing
of your writes, as can looking at a packet trace from tcpdump.
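Spacing out writes can be sketched in C as below. This is one possible approach under stated assumptions: the function names are hypothetical, the target rate is chosen by the application, and the sleep is only as precise as the operating system's timer, so very short gaps may come out longer than requested.

```c
#include <time.h>

/* Gap between datagram writes, in microseconds, that holds the sending
 * rate at target_bits_per_sec for datagrams of the given size. */
long write_interval_us(long datagram_bytes, double target_bits_per_sec)
{
    return (long)(datagram_bytes * 8.0 * 1e6 / target_bits_per_sec);
}

/* Sleep for the given number of microseconds before the next write. */
void pause_between_writes(long interval_us)
{
    struct timespec gap;
    gap.tv_sec = interval_us / 1000000L;
    gap.tv_nsec = (interval_us % 1000000L) * 1000L;
    nanosleep(&gap, NULL);
}
```

For instance, sending 1470 byte datagrams at 35 Mbits/sec works out to one write every 336 microseconds.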
If the datagram size is large, it will be broken up into separate packets
which will then be sent as a single burst. This is actually worse than
not spacing out writes, because if a single packet of the datagram is lost,
the entire datagram must be discarded. So not only is there a burst that
causes network stress, but the effects of the resulting packet loss are
also magnified. Delay and jitter are also increased, because more time
is spent fragmenting the datagram into separate packets and reassembling
it again.
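The way large datagrams magnify packet loss can be worked through with a short C sketch. It assumes IPv4 with a 20-byte header and an 8-byte UDP header, and, for the loss estimate, that each packet is lost independently with the same probability, which real networks only approximate. The function names are hypothetical.

```c
/* Number of IP packets needed to carry one UDP datagram over a given
 * MTU (assumes a 20-byte IP header and an 8-byte UDP header). */
int fragments_per_datagram(int datagram_bytes, int mtu)
{
    int per_fragment = ((mtu - 20) / 8) * 8;  /* fragment data is a multiple of 8 */
    int total = datagram_bytes + 8;           /* data plus the UDP header */
    return (total + per_fragment - 1) / per_fragment;
}

/* Chance that a datagram is lost when any one of its packets is lost,
 * assuming independent loss with probability packet_loss per packet. */
double datagram_loss(double packet_loss, int fragments)
{
    double all_arrive = 1.0;
    for (int i = 0; i < fragments; i++)
        all_arrive *= 1.0 - packet_loss;
    return 1.0 - all_arrive;
}
```

With a 1500 byte MTU, a 1470 byte datagram fits in 1 packet while a 32 KB datagram needs 23. Under the independence assumption, a 0.1% packet loss rate then becomes a datagram loss rate of roughly 2.3%.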
Iperf can be used to simulate streams with different datagram write
sizes. Here is an example of sending 1470 byte datagrams (each 1 packet)
and then 32 KB datagrams (each 23 packets). Notice the much higher delay
jitter (third column from the right) and percent packet loss (right column)
caused by using large datagrams.
cyclops> iperf -s -u -i 1 -l 1470
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 60.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.20.178.168 port 5001 connected with 172.16.7.4 port 9571
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0- 1.0 sec 4.4 MBytes 35.2 Mbits/sec 0.137 ms 0/ 3135 (0%)
[ 3] 1.0- 2.0 sec 4.4 MBytes 34.9 Mbits/sec 0.213 ms 0/ 3117 (0%)
[ 3] 2.0- 3.0 sec 4.4 MBytes 34.9 Mbits/sec 0.429 ms 1/ 3123 (0.032%)
[ 3] 3.0- 4.0 sec 4.4 MBytes 35.2 Mbits/sec 0.150 ms 2/ 3136 (0.064%)
[ 3] 4.0- 5.0 sec 4.4 MBytes 35.0 Mbits/sec 0.139 ms 0/ 3122 (0%)
[ 3] 5.0- 6.0 sec 4.4 MBytes 35.0 Mbits/sec 0.178 ms 0/ 3125 (0%)
[ 3] 6.0- 7.0 sec 4.4 MBytes 35.1 Mbits/sec 0.185 ms 0/ 3128 (0%)
[ 3] 7.0- 8.0 sec 4.4 MBytes 35.0 Mbits/sec 0.248 ms 1/ 3121 (0.032%)
[ 3] 8.0- 9.0 sec 4.4 MBytes 35.1 Mbits/sec 0.141 ms 0/ 3131 (0%)
[ 3] 0.0-10.0 sec 43.8 MBytes 35.1 Mbits/sec 0.119 ms 4/31252 (0.013%)
cyclops> iperf -s -u -i 1 -l 32k
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 32768 byte datagrams
UDP buffer size: 60.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.20.178.168 port 5001 connected with 172.16.7.4 port 9534
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0- 1.0 sec 4.3 MBytes 34.4 Mbits/sec 1.036 ms 3/ 141 (2.1%)
[ 3] 1.0- 2.0 sec 4.3 MBytes 34.7 Mbits/sec 0.847 ms 1/ 140 (0.71%)
[ 3] 2.0- 3.0 sec 4.3 MBytes 34.8 Mbits/sec 1.093 ms 1/ 140 (0.71%)
[ 3] 3.0- 4.0 sec 4.3 MBytes 34.7 Mbits/sec 0.688 ms 1/ 140 (0.71%)
[ 3] 4.0- 5.0 sec 4.3 MBytes 34.2 Mbits/sec 0.479 ms 3/ 140 (2.1%)
[ 3] 5.0- 6.0 sec 4.3 MBytes 34.8 Mbits/sec 0.384 ms 1/ 140 (0.71%)
[ 3] 6.0- 7.0 sec 4.4 MBytes 35.0 Mbits/sec 0.939 ms 0/ 140 (0%)
[ 3] 7.0- 8.0 sec 4.4 MBytes 35.0 Mbits/sec 0.691 ms 0/ 140 (0%)
[ 3] 8.0- 9.0 sec 4.3 MBytes 34.5 Mbits/sec 0.871 ms 2/ 140 (1.4%)
[ 3] 9.0-10.0 sec 4.3 MBytes 34.6 Mbits/sec 0.796 ms 1/ 140 (0.71%)
[ 3] 0.0-10.0 sec 43.4 MBytes 34.7 Mbits/sec 0.988 ms 13/ 1402 (0.93%)
4.5 Followup
Where should you follow up if your application performance is poor?
- Start by checking the application
  performance sections above. Check also with the local network administrator
  to verify that the campus network is optimized. Sometimes the campus is
  connected to a high-performance network such as Abilene, but your desktop
  machine is not! The local network administrator can verify that you are
  using the high-performance network and that the campus network infrastructure
  is sufficient to meet your needs. They may also discover that the problem
  is in a regional gigaPoP, so they should contact the gigaPoP operators
  to solve problems there.
- For further assistance with measuring and improving the performance of
  an application, contact the NLANR Distributed Applications Support Team,
  who can work with your application
  to tune it and the underlying networks to meet your needs.
5. Future Directions
5.1 Grid Services
The Network Weather Service (NWS) and Globus
Gloperf run servers on various supercomputers that test bandwidth and
latency between machines. This information is published with the Globus
MDS (metacomputing directory service). This provides an easy way to find
up-to-date information about bandwidth and latency without having to run
a test of your own. The NWS provides a prediction model, so it can give
an estimate of available bandwidth based on past performance. It also measures
other performance factors such as CPU load and free memory.
5.2 User Portals
The Alliance is working on user portals, web pages that contain
all the commonly accessed information customized for an individual user.
A portion of a user portal is expected to be network performance information,
so all the measurements you need on a day-to-day basis are at your fingertips.
Bandwidth and delay tests from your machine to remote sites can be performed,
as well as querying for bandwidth and latency between various supercomputers.
5.3 Automatic Tuning
The NLANR teams and the Web100 group are discussing automated tuning of
application performance. It is possible to have the operating system figure
out the optimal TCP window size and set it, without requiring you to manually
tune an application. NLANR is working on a version of FTP that automatically
tunes itself for each connection. Check the Tools
page and the NLANR distributed applications Projects page for release information.