A User's Guide to TCP Windows



What is the TCP Window Size?

The TCP window size is by far the most important parameter to adjust for achieving maximum bandwidth across high-performance networks. Properly setting the TCP window size can often more than double the achieved bandwidth.

Technically, the TCP window size is the maximum amount of data that can be in flight (sent but not yet acknowledged) in the network at any time for a single connection. (It is the upper limit of the TCP congestion window.)

Think of a water hose. To achieve maximum water flow, the hose should be full. As the hose increases in diameter and length, the volume of water to keep it full increases. In networks, diameter equates to bandwidth, length is measured as round-trip time, and the TCP window size is analogous to the volume of water necessary to keep the hose full. On fast networks with large round-trip times, the TCP window size must be increased to achieve maximum TCP bandwidth.
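One way to put a number on the hose analogy: a sender can have at most one window of unacknowledged data outstanding per round trip, so throughput cannot exceed the window size divided by the round-trip time. The following is a minimal sketch of that bound. The function name window_limited_bps is a hypothetical helper chosen for illustration, not part of any library.

```c
/* Upper bound on TCP throughput imposed by the window, in bits per second.
 * window_bytes: TCP window size in bytes
 * rtt_sec:      round-trip time in seconds (must be positive)
 * Hypothetical helper, named here only for illustration. */
double window_limited_bps(double window_bytes, double rtt_sec)
{
    /* At most one full window can be delivered per round trip. */
    return window_bytes * 8.0 / rtt_sec;
}
```

With illustrative figures of a 64 KByte window and a 50 ms round-trip time, window_limited_bps(65536, 0.05) is about 10.5 Mbit/sec, however fast the links are.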

Computing the TCP Window Size

In theory, the TCP window size should be set to the bandwidth delay product, which is the volume of data that can be in transit in the network between two machines. The bandwidth delay product is:

bottleneck bandwidth * round-trip time

To compute the bandwidth delay product for a pair of hosts, first estimate the speed of the slowest link between them. Often this is the 100 Mbit/sec Ethernet the machine is connected to, or the 45 Mbit/sec DS3 link from the campus to the wide-area network. Then use ping to find the round-trip time. For example, if the slowest link is a 45 Mbit/sec DS3 link, and the round-trip time is 30 milliseconds:

45 Mbit/sec * 30 ms
= 45e6 bits/sec * 30e-3 sec
= 1,350,000 bits
= 1,350,000 / 8 / 1024 KBytes
= approximately 165 KBytes
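The arithmetic above can be sketched as a pair of small C helpers. The names bdp_bytes and bdp_kbytes are hypothetical, introduced only for illustration. Bandwidth is assumed to be given in bits per second and round-trip time in seconds.

```c
/* Bandwidth delay product in bytes.
 * bandwidth_bps: bottleneck bandwidth in bits per second
 * rtt_sec:       round-trip time in seconds
 * Hypothetical helper, named here only for illustration. */
double bdp_bytes(double bandwidth_bps, double rtt_sec)
{
    return bandwidth_bps * rtt_sec / 8.0; /* 8 bits per byte */
}

/* The same value expressed in KBytes (1 KByte = 1024 bytes). */
double bdp_kbytes(double bandwidth_bps, double rtt_sec)
{
    return bdp_bytes(bandwidth_bps, rtt_sec) / 1024.0;
}
```

For the DS3 example, bdp_kbytes(45e6, 30e-3) gives about 164.8, which rounds to the 165 KBytes computed above.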

Setting the TCP Window Size

The TCP window size can be set on a per-connection basis, as detailed below. For setting the default TCP window size on a host, and for other factors important to high-performance networking, see PSC's page on Enabling High Performance Data Transfers on Hosts.

Most OSes and hosts have upper limits on the TCP window size. These may be as low as 64 KB, or as high as several MB. To use TCP window sizes larger than 64 KB, the TCP large window extensions (RFC 1323) must be enabled. See PSC's page above for OSes that implement them.
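The 64 KB limit comes from the 16-bit window field in the TCP header, which can advertise at most 65,535 bytes. RFC 1323 adds a window scale option that shifts that field left by up to 14 bits. The sketch below computes the smallest shift that could represent a given window. It only illustrates why the extension is needed: the function name window_scale_shift is hypothetical, and real TCP stacks choose the shift themselves rather than taking it from the application.

```c
/* Smallest RFC 1323 window scale shift count that lets the 16-bit
 * window field advertise at least window_bytes.
 * Returns -1 if the window cannot be represented even with the
 * maximum shift of 14.
 * Hypothetical helper, for illustration only. */
int window_scale_shift(unsigned long window_bytes)
{
    const unsigned long max_field = 65535UL; /* largest 16-bit window value */
    int shift;

    for (shift = 0; shift <= 14; shift++) {
        if ((max_field << shift) >= window_bytes)
            return shift;
    }
    return -1;
}
```

A 60 KByte window needs no scaling (shift 0), while a 130 KByte window (133,120 bytes) needs a shift of at least 2.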

Since TCP is a reliable transport, if any data is lost in transmission, TCP must be able to retransmit it. Thus TCP remembers all the sent data in a buffer until the other side acknowledges receiving it. The size of this buffer is the TCP window size.

The TCP window size is implemented by send and receive buffers on each end of the connection. To set these buffers, use the SO_SNDBUF and SO_RCVBUF socket options. Both ends of the connection must set these options. For example:

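int window = 128 * 1024; /* example: 128 KB */
/* sock is a TCP socket from socket(), not yet connected or listening */
int error = setsockopt( sock, SOL_SOCKET, SO_SNDBUF, &window, sizeof(window) );
if ( error == 0 )
    error = setsockopt( sock, SOL_SOCKET, SO_RCVBUF, &window, sizeof(window) );
/* error is -1 if either call failed */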

These options must be set before the listen() or connect() call for windows larger than 64 KB to be effective. Please see the sample code for a complete implementation with proper error checking and support for various operating systems. The sample code includes special handling for UNICOS and AIX.
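Because the OS may clamp or round the requested size, it can help to read the buffer sizes back after setting them. The following is a minimal sketch for POSIX sockets only, not the sample code referred to above. The function name set_tcp_window and its parameters are hypothetical, and the special cases for UNICOS and AIX are not covered.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Request send and receive buffers of window_bytes on sock.
 * Call this before listen() or connect().
 * On success returns 0 and stores the sizes the OS reports in
 * *snd_actual and *rcv_actual. Returns -1 if any socket call
 * fails (errno is set by the failing call).
 * Hypothetical helper, for illustration only. */
int set_tcp_window(int sock, int window_bytes, int *snd_actual, int *rcv_actual)
{
    socklen_t len;

    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                   &window_bytes, sizeof(window_bytes)) < 0)
        return -1;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                   &window_bytes, sizeof(window_bytes)) < 0)
        return -1;

    /* The OS may clamp or round the request, so read the values back. */
    len = sizeof(*snd_actual);
    if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF, snd_actual, &len) < 0)
        return -1;
    len = sizeof(*rcv_actual);
    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, rcv_actual, &len) < 0)
        return -1;

    return 0;
}
```

A caller would create the socket, call set_tcp_window(sock, 128 * 1024, &snd, &rcv), and compare the reported sizes with the request before calling connect() or listen(). The reported values may differ from the request, and how they differ depends on the operating system.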

Testing Bandwidth

Here is a simple example of testing the network bandwidth with several different TCP window sizes. First start the Iperf server on one machine (here, cyclops), then start the client on another machine (modi4).

Using the system default 60 KByte TCP window size:

cyclops> iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 60.0 KByte (default)
------------------------------------------------------------
[  4] local 172.31.178.168 port 5001 connected with 172.16.7.4 port 2357
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec   6.5 MBytes   5.2 Mbits/sec

modi4> iperf -c cyclops
------------------------------------------------------------
Client connecting to cyclops, TCP port 5001
TCP window size: 59.9 KByte (default)
------------------------------------------------------------
[  3] local 172.16.7.4 port 2357 connected with 172.31.178.168 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   6.5 MBytes   5.2 Mbits/sec

Setting the TCP window size to 130 KBytes (note the increase in bandwidth from 5.2 to 15.7 Mbits/sec):

cyclops> iperf -s -w 130k
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  130 KByte
------------------------------------------------------------
[  4] local 172.31.178.168 port 5001 connected with 172.16.7.4 port 2530
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec  19.7 MBytes  15.7 Mbits/sec

modi4> iperf -c cyclops -w 130k
------------------------------------------------------------
Client connecting to cyclops, TCP port 5001
TCP window size:  129 KByte (WARNING: requested  130 KByte)
------------------------------------------------------------
[  3] local 172.16.7.4 port 2530 connected with 172.31.178.168 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  19.7 MBytes  15.8 Mbits/sec

The Iperf documentation has many more examples of testing the bandwidth. You should also test bandwidth in your application, since it will behave differently than Iperf. For instance, FTP must read its data from disk, which slows it down substantially.

Adjusting the TCP Window Size

While the bandwidth delay product gives the theoretical value for the TCP window size, it is not always the best value in practice, because the OS's TCP implementation may have bugs or the network may have deficiencies. A reasonable procedure is to try values 10% above and below the calculated TCP window size. If one of those performs better, try values above and below that one, repeating until the maximum bandwidth is reached. Remember that there will be some variability in bandwidth due to competing network traffic. In some cases, OS and network problems may be so bad that deliberately setting the TCP window size low increases performance because it masks the other problems. Talk to your network engineers if that is the case.
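The search procedure described above can be sketched as a simple hill climb in 10% steps. This is an illustration under stated assumptions, not a definitive tuning tool: tune_window, measure_fn, and synthetic_measure are hypothetical names, and the caller is assumed to supply a function that measures bandwidth for a given window size (for example, by running a timed transfer). The sketch ignores measurement noise, so in practice each measurement might need to be averaged over several runs.

```c
/* Caller-supplied measurement: returns the achieved bandwidth (in any
 * consistent unit) for a given window size in bytes. Hypothetical type. */
typedef double (*measure_fn)(long window_bytes, void *ctx);

/* Hill-climb from start_bytes in 10% steps.
 * Stops when neither neighbor beats the current best, or after
 * max_rounds rounds. Returns the best window size found. */
long tune_window(long start_bytes, measure_fn measure, void *ctx, int max_rounds)
{
    long best = start_bytes;
    double best_bw = measure(best, ctx);
    int round;

    for (round = 0; round < max_rounds; round++) {
        long lower = best - best / 10;   /* 10% below current best */
        long upper = best + best / 10;   /* 10% above current best */
        double lower_bw = measure(lower, ctx);
        double upper_bw = measure(upper, ctx);

        if (lower_bw > best_bw && lower_bw >= upper_bw) {
            best = lower;
            best_bw = lower_bw;
        } else if (upper_bw > best_bw) {
            best = upper;
            best_bw = upper_bw;
        } else {
            break; /* neither neighbor is better */
        }
    }
    return best;
}

/* Stand-in measurement for demonstration only: the score peaks when the
 * window equals the target passed through ctx. Not a real network test. */
double synthetic_measure(long window_bytes, void *ctx)
{
    long target = *(long *)ctx;
    long diff = window_bytes > target ? window_bytes - target
                                      : target - window_bytes;
    return -(double)diff;
}
```

Starting from the calculated bandwidth delay product, tune_window(168750, synthetic_measure, &target, 20) walks toward whatever target the stand-in function is given and stops once a 10% step in either direction no longer helps.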

Over time, network topology and routing change, which changes the bandwidth delay product. For instance, the connection between cyclops and modi4 above changed from using the vBNS to using the Abilene network, causing an increase in delay of about 10 ms. Therefore, you should periodically retest with different TCP window sizes to confirm that you are still getting maximum performance.


Acknowledgments

This page grew out of an earlier page written by Von Welch at NCSA.


Last modified: Mon Jan 24 13:26:12 CST 2000
