This white paper describes the technology and implementation considerations
when working with the network teaming services offered by the Broadcom software
shipped with Dell's servers and storage products. The goal of Broadcom teaming
services is to provide fault tolerance and link aggregation across a team of
two or more adapters. The information in this document is provided to assist
IT professionals during the deployment and troubleshooting of server applications
that require network fault tolerance and load balancing.
Glossary
Table 1: Glossary
Item | Definition
BACS | Broadcom Advanced Control Suite
BASP | Broadcom Advanced Server Program (intermediate driver)
Smart Load Balancing™ and Failover | Switch-independent failover type of team in
which the primary team member handles all incoming and outgoing traffic while the
standby team member is idle until a failover event (for example, loss of link)
occurs. The intermediate driver (BASP) manages incoming/outgoing traffic.
Smart Load Balancing (SLB) | Switch-independent load balancing and failover type
of team, in which the intermediate driver manages outgoing/incoming traffic.
LACP | Link Aggregation Control Protocol
Generic Trunking (FEC/GEC)/802.3ad-Draft Static | Switch-dependent load balancing
and failover type of team in which the intermediate driver manages outgoing
traffic and the switch manages incoming traffic.
Link Aggregation (802.3ad) | Switch-dependent load balancing and failover type of
team with LACP in which the intermediate driver manages outgoing traffic and the
switch manages incoming traffic.
The concept of grouping multiple physical devices to provide fault tolerance
and load balancing is not new. Storage devices use RAID technology to group
individual hard drives. Switch ports can be grouped together using technologies
such as Cisco Gigabit EtherChannel, IEEE 802.3ad Link Aggregation, Bay Networks
Multilink Trunking, and Extreme Networks Load Sharing. Network interfaces on
Dell servers can be grouped together into a team of physical ports called a
virtual adapter.
Network Addressing
To understand how teaming works, it is important to understand how node communications
work in an Ethernet network. This document is based on the assumption that the
reader is familiar with the basics of IP and Ethernet network communications.
The following information provides a high-level overview of the concepts of
network addressing used in an Ethernet network.
Every Ethernet network interface in a host platform such as a server requires
a globally unique Layer 2 address and at least one globally unique Layer 3 address.
Layer 2 is the Data Link Layer, and Layer 3 is the Network Layer as defined in
the OSI model. The Layer 2 address is assigned to the hardware and is often referred
to as the MAC address or physical address. This address is pre-programmed at the
factory and stored in NVRAM on a network interface card or on the system motherboard
for an embedded LAN interface. Layer 3 addresses are referred to as protocol
or logical addresses and are assigned to the software stack. IP and IPX are examples
of Layer 3 protocols. In addition, Layer 4 (Transport Layer) uses port numbers for
each upper-level network protocol such as Telnet or FTP. These port numbers are
used to differentiate traffic flows across applications. Layer 4 protocols such
as TCP or UDP are the most commonly used in today's networks. The combination of the
IP address and the TCP port number is called a socket.
Ethernet devices communicate with other Ethernet devices using the MAC address,
not the IP address. However, most applications work with a host name that is
translated to an IP address by a naming service such as WINS or DNS. Therefore,
a method of identifying the MAC address assigned to the IP address is required.
The Address Resolution Protocol (ARP) provides this mechanism for IP networks. For
IPX, the MAC address is part of the network address and ARP is not required.
ARP is implemented using two frame types: the ARP Request and the ARP Reply. ARP
Requests are typically sent to a broadcast address, while the ARP Reply is typically
sent as unicast traffic. A unicast address corresponds to a single MAC address or
a single IP address. A frame sent to a broadcast address is received by all devices
on the network.
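The two ARP frame types described above can be sketched in Python with the standard struct module. This is a minimal illustration of the RFC 826 packet layout for Ethernet and IPv4, not code taken from the Broadcom software; the helper names (mac_bytes, ip_bytes, arp_request, arp_reply) and the example addresses are hypothetical, and the padding a NIC adds to reach the minimum Ethernet frame size is omitted.

```python
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
OP_REQUEST, OP_REPLY = 1, 2


def mac_bytes(mac: str) -> bytes:
    """Convert "00:10:18:aa:bb:01" to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def ip_bytes(ip: str) -> bytes:
    """Convert dotted-quad "10.0.0.1" to 4 raw bytes."""
    return bytes(int(part) for part in ip.split("."))


def arp_frame(op: int, eth_dst: bytes, sender_mac: bytes, sender_ip: bytes,
              target_mac: bytes, target_ip: bytes) -> bytes:
    """Ethernet II header followed by an ARP packet (padding omitted)."""
    eth = eth_dst + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    # Hardware type 1 (Ethernet), protocol 0x0800 (IPv4), 6-byte MAC, 4-byte IP.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, op)
    return eth + arp + sender_mac + sender_ip + target_mac + target_ip


def arp_request(sender_mac: bytes, sender_ip: bytes, target_ip: bytes) -> bytes:
    # Sent to the broadcast address; the target MAC is what is being asked for.
    return arp_frame(OP_REQUEST, BROADCAST_MAC, sender_mac, sender_ip,
                     b"\x00" * 6, target_ip)


def arp_reply(sender_mac: bytes, sender_ip: bytes,
              requester_mac: bytes, requester_ip: bytes) -> bytes:
    # Sent as unicast directly back to the host that asked.
    return arp_frame(OP_REPLY, requester_mac, sender_mac, sender_ip,
                     requester_mac, requester_ip)


# A client at 10.0.0.1 asks who owns 10.0.0.100; the server answers.
client_mac = mac_bytes("00:10:18:aa:bb:01")
server_mac = mac_bytes("00:10:18:00:00:01")
request = arp_request(client_mac, ip_bytes("10.0.0.1"), ip_bytes("10.0.0.100"))
reply = arp_reply(server_mac, ip_bytes("10.0.0.100"), client_mac, ip_bytes("10.0.0.1"))
```

The request carries the broadcast MAC as its Ethernet destination, while the reply is addressed only to the requesting client.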
Teaming and Network Addresses
A team of adapters functions as a single virtual network interface and appears
no different to other network devices than a non-teamed adapter.
A virtual network adapter advertises a single Layer 2 address and one or more
Layer 3 addresses. When the teaming driver initializes, it selects one MAC address
from one of the physical adapters that make up the team to be the Team MAC address.
This address is typically taken from the first adapter that gets initialized
by the driver. When the server hosting the team receives an ARP Request, it
will select one MAC address from among the physical adapters in the team to
use as the source MAC address in the ARP Reply. In Windows operating systems,
the IPCONFIG /all command shows the IP and MAC address of the virtual adapter
and not the individual physical adapters. The protocol IP address is assigned
to the virtual network interface and not to the individual physical adapters.
For switch-independent teaming modes, all physical adapters that make up a
virtual adapter must use the unique MAC address assigned to them when transmitting
data. That is, the frames that are sent by each of the physical adapters in
the team must use a unique MAC address to be IEEE compliant. It is important
to note that ARP cache entries are not learned from received frames, but only
from ARP Requests and ARP Replies.
Teaming types can be classified in three ways: by whether the switch port
configuration must also match the adapter teaming type; by the functionality of
the team, that is, whether it supports load balancing and failover or just
failover; and by whether the Link Aggregation Control Protocol is used. The
following table summarizes the teaming types and their classification.
Table 2: Available Teaming Types
Teaming Type
Switch-Dependent
(Switch must support specific type of team)
Link Aggregation Control Protocol Support Required on the Switch
Load Balancing
Failover
Smart Load Balancing and Failover (SLB) (with 2 to 8 load balance team
members)
•
•
SLB (Auto-Fallback Disable)
•
Link Aggregation (802.3ad)
•
•
•
•
Generic Trunking (FEC/GEC)/802.3ad-Draft Static
•
•
•
Smart Load Balancing (SLB)
Smart Load Balancing™ provides both load balancing and failover when
configured for Load Balancing, and only failover when configured for fault tolerance.
It works with any Ethernet switch and requires no trunking configuration on
the switch. The team advertises multiple MAC addresses and one or more IP addresses
(when using secondary IP addresses). The team MAC address is selected from the
list of load balancing members. When the server receives an ARP Request, the
software networking stack will always send an ARP Reply with the team MAC address.
To begin the load balancing process, the teaming driver will modify this ARP
Reply by changing the source MAC address to match one of the physical adapters.
Smart Load Balancing enables both transmit and receive load balancing based
on the Layer 3/Layer 4 IP address and TCP/UDP port number. In other words, the
load balancing is not done at a byte or frame level but on a TCP/UDP session
basis. This methodology is required to maintain in-order delivery of frames
that belong to the same socket conversation. Load balancing is supported on
2-8 ports. These ports can include any combination of add-in adapters and LAN-on-Motherboard
(LOM) devices. Transmit load balancing is achieved by creating a hashing table
using the source and destination IP addresses and TCP/UDP port numbers. The same
combination of source and destination IP addresses and TCP/UDP port numbers
will generally yield the same hash index and therefore point to the same port
in the team. When a port is selected to carry all the frames of a given socket,
the unique MAC address of the physical adapter is included in the frame, and
not the team MAC address. This is required to comply with the IEEE 802.3 standard.
If two adapters transmitted using the same MAC address, a duplicate MAC address
situation would occur that the switch could not handle.
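The per-flow transmit selection described above can be sketched as follows. CRC-32 from Python's zlib module is used purely as a stand-in hash; the actual hash function in BASP is not specified in this document, and select_port and the example MAC addresses are hypothetical. What the sketch does preserve is the property the text relies on: the same combination of source/destination IP addresses and ports always maps to the same team member, and that member's own MAC address is used as the frame's source.

```python
import ipaddress
import struct
import zlib


def flow_key(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack the Layer 3/Layer 4 fields that identify one socket conversation."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))


def select_port(team_macs: list, src_ip: str, dst_ip: str,
                src_port: int, dst_port: int) -> tuple:
    """Return (member index, member MAC) for a flow; CRC-32 is a stand-in hash."""
    index = zlib.crc32(flow_key(src_ip, dst_ip, src_port, dst_port)) % len(team_macs)
    return index, team_macs[index]


team = ["00:10:18:00:00:01", "00:10:18:00:00:02"]
# Every frame of this HTTP session leaves through the same member, with that
# member's unique MAC as the source address (never the team MAC).
print(select_port(team, "10.0.0.100", "10.0.0.1", 49152, 80))
```

Hashing per flow rather than per frame is what keeps frames of one socket conversation in order, since they never race each other across different ports.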
Receive load balancing is achieved through an intermediate driver by sending
Gratuitous ARPs on a client-by-client basis using the unicast address of each
client as the destination address of the ARP Request (also known as a Directed
ARP). This is considered client load balancing and not traffic load balancing.
When the intermediate driver detects a significant load imbalance between the
physical adapters in an SLB team, it will generate G-ARPs in an effort to redistribute
incoming frames. The intermediate driver (BASP) does not answer ARP Requests;
only the software protocol stack provides the required ARP Reply. It is important
to understand that receive load balancing is a function of the number of clients
that are connecting to the server via the team interface.
SLB Receive Load Balancing attempts to load balance incoming traffic for client
machines across physical ports in the team. It uses a modified Gratuitous ARP
to advertise a different MAC address for the team IP address in the sender physical
and protocol address fields. This G-ARP is unicast, with the MAC and IP address of a
client machine in the target physical and protocol address fields, respectively.
This causes the target client to update its ARP cache with a new MAC address mapping
for the team IP address. G-ARPs are not broadcast because this would cause all
clients to send their traffic to the same port. That would eliminate the benefits
achieved through client load balancing and could cause out-of-order frame
delivery. This receive load balancing scheme works as long as all clients
and the teamed server are on the same subnet or broadcast domain.
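The following Python sketch shows how such a directed G-ARP could be laid out: the Ethernet destination and the ARP target fields carry the client's own MAC and IP address, while the sender fields pair the team IP address with the physical MAC chosen for that client. It follows the description above (a unicast ARP Request) and the RFC 826 field layout; the round-robin client assignment, the example addresses, and every helper name are hypothetical stand-ins for BASP's actual balancing policy, which this document does not spell out.

```python
import ipaddress
import struct


def mac_bytes(mac: str) -> bytes:
    return bytes(int(part, 16) for part in mac.split(":"))


def directed_garp(team_ip: str, member_mac: str, client_ip: str, client_mac: str) -> bytes:
    """Unicast ARP Request that remaps team_ip -> member_mac in one client's cache."""
    member, client = mac_bytes(member_mac), mac_bytes(client_mac)
    eth = client + member + struct.pack("!H", 0x0806)   # unicast to this client only
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)     # opcode 1 = ARP Request
    arp += member + ipaddress.IPv4Address(team_ip).packed    # sender: member MAC, team IP
    arp += client + ipaddress.IPv4Address(client_ip).packed  # target: the client itself
    return eth + arp


def assign_clients(client_ips: list, member_macs: list) -> dict:
    """Hypothetical policy: spread clients across team members round-robin."""
    return {ip: member_macs[i % len(member_macs)] for i, ip in enumerate(client_ips)}


members = ["00:10:18:00:00:01", "00:10:18:00:00:02"]
clients = {"10.0.0.1": "00:a0:c9:00:00:01", "10.0.0.2": "00:a0:c9:00:00:02"}
plan = assign_clients(sorted(clients), members)
frames = [directed_garp("10.0.0.100", plan[ip], ip, clients[ip]) for ip in sorted(clients)]
```

Because each frame is addressed to a single client, only that client's ARP cache changes, which is exactly why the scheme balances per client and why broadcasting the same G-ARP would undo it.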
When the clients and the server are on different subnets, and incoming traffic
has to traverse a router, the received traffic destined for the server is not
load balanced. The physical adapter that the intermediate driver has selected
to carry the IP flow will carry all of the traffic. When the router needs to
send a frame to the team IP address, it will broadcast an ARP Request (if not
in the ARP cache). The server software stack will generate an ARP Reply with
the team MAC address, but the intermediate driver will modify the ARP Reply
and send it over a particular physical adapter, establishing the flow for that
session.
The reason is that ARP is not a routable protocol. It does not have an IP header
and therefore is not sent to the router or default gateway. ARP is only a local
subnet protocol. In addition, since the G-ARP is not a broadcast packet, the
router will not process it and will not update its own ARP cache.
The only way that the router would process an ARP that is intended for another
network device is if it has Proxy ARP enabled and the host has no default gateway.
This is very rare and not recommended for most applications.
Transmit traffic through a router will be load balanced, because transmit load
balancing is based on the source and destination IP address and TCP/UDP port
number. Since routers do not alter the source and destination IP address, the
load balancing algorithm works as intended.
Configuring routers for Hot Standby Router Protocol (HSRP) does not allow
receive load balancing to occur in the adapter team. In general, HSRP allows
two routers to act as one router, advertising a virtual IP address and a virtual
MAC address. One physical router is the active interface while the other is standby.
Although HSRP can also load share nodes (using different default gateways on
the host nodes) across multiple routers in HSRP groups, it always points to
the primary MAC address of the team.
Generic Trunking
Generic Trunking is a switch-assisted teaming mode and requires configuring
ports at both ends of the link: server interfaces and switch ports. This is
often referred to as Cisco Fast EtherChannel or Gigabit EtherChannel. In addition,
generic trunking supports similar implementations by other switch OEMs, such
as Extreme Networks Load Sharing and Bay Networks Multilink Trunking, as well as
IEEE 802.3ad Link Aggregation static mode. In this mode, the team advertises one
MAC Address and one IP Address
when the protocol stack responds to ARP Requests. In addition, each physical
adapter in the team uses the same team MAC address when transmitting frames.
This is possible since the switch at the other end of the link is aware of the
teaming mode and will handle the use of a single MAC address by every port in
the team. The forwarding table in the switch will reflect the trunk as a single
virtual port.
In this teaming mode, the intermediate driver controls load balancing and failover
for outgoing traffic only, while incoming traffic is controlled by the switch
firmware and hardware. As is the case for Smart Load Balancing, the BASP intermediate
driver uses the IP/TCP/UDP source and destination addresses to load balance
the transmit traffic from the server. For incoming traffic, most switches implement
XOR hashing of the source and destination MAC addresses.
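To see how a switch can still spread traffic toward a team that uses a single MAC address, consider a simplified version of the MAC-based XOR hash mentioned above. This sketch XORs only the last byte of each address; which address bits a given switch actually uses varies by vendor and model, so xor_link and the example addresses are hypothetical.

```python
def last_byte(mac: str) -> int:
    return int(mac.split(":")[-1], 16)


def xor_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Pick a trunk member from the XOR of the low-order MAC bits (simplified)."""
    return (last_byte(src_mac) ^ last_byte(dst_mac)) % num_links


team_mac = "00:10:18:00:00:10"  # shared by every port in a Generic Trunking team
for client in ("00:a0:c9:00:00:01", "00:a0:c9:00:00:02", "00:a0:c9:00:00:03"):
    # Frames from each client toward the team land on a client-specific link.
    print(client, "-> link", xor_link(client, team_mac, 2))
```

Under a hash of this kind, all traffic between one client and the team shares one link, so inbound balance depends on having several clients with different MAC addresses.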
Link Aggregation is similar to Generic Trunking except that it uses the Link
Aggregation Control Protocol to negotiate the ports that will make up the team.
LACP must be enabled at both ends of the link for the team to be operational.
If LACP is not available at both ends of the link, 802.3ad provides a manual
aggregation that only requires both ends of the link to be in a link up state.
Because manual aggregation provides for the activation of a member link without
performing the LACP message exchanges, it should not be considered as reliable
and robust as an LACP negotiated link. LACP automatically determines which member
links can be aggregated and then aggregates them. It provides for the controlled
addition and removal of physical links for the link aggregation so that no frames
are lost or duplicated. The removal of aggregate link members is provided by
the marker protocol that can be optionally enabled for Link Aggregation Control
Protocol (LACP) enabled aggregate links.
The Link Aggregation group advertises a single MAC address for all the ports
in the trunk. The MAC address of the Aggregator can be the MAC address of
one of the ports that make up the group. LACP and marker protocols use a multicast
destination address.
The Link Aggregation control function determines which links may be aggregated
and then binds the ports to an Aggregator function in the system and monitors
conditions to determine if a change in the aggregation group is required. Link
aggregation combines the individual capacity of multiple links to form a high
performance virtual link. The failure or replacement of a link in an LACP trunk
will not cause loss of connectivity. The traffic will simply be failed over
to the remaining links in the trunk.
Teaming is implemented via an NDIS intermediate driver in the Windows operating
system environment. This software component works with the miniport driver,
the NDIS layer, and the protocol stack to enable the teaming architecture (see
Figure 3). The miniport driver controls the host LAN controller directly to enable
functions such as sends, receives, and interrupt processing. The intermediate driver
fits between the miniport driver and the protocol layer, multiplexing several
miniport driver instances and creating a virtual adapter that looks like a single
adapter to the NDIS layer. NDIS provides a set of library functions to enable
communication between miniport drivers or intermediate drivers and the protocol stack.
The protocol stack implements IP, IPX, and ARP. A protocol address such as an
IP address is assigned to each miniport device instance, but when an intermediate
driver is installed, the protocol address is assigned to the virtual team adapter
and not to the individual miniport devices that make up the team.
The Broadcom-supplied teaming support is provided by three individual software
components that work together and are supported as a package. When one component
is upgraded, all the other components must be upgraded to the supported versions.
The following table describes the three software components and their associated
files for supported operating systems.
The various teaming modes described in this document place
certain restrictions on the networking equipment used to connect clients to
teamed servers. Each type of network interconnect technology has an effect on
teaming as described below.
A Repeater Hub allows a network administrator to extend an Ethernet network
beyond the limits of an individual segment. The repeater regenerates the input
signal received on one port onto all other connected ports, forming a single
collision domain. This means that when a station attached to a repeater sends
an Ethernet frame to another station, every station within the same collision
domain will also receive that message. If two stations begin transmitting at
the same time, a collision will occur, and each transmitting station will need
to retransmit its data after waiting a random amount of time.
The use of a repeater requires that each station participating within the collision
domain operate in half-duplex mode. Though half-duplex mode is supported for
Gigabit Ethernet devices in the IEEE 802.3 specification, it is not supported
by the majority of Gigabit Ethernet controller manufacturers and will not be
considered here.
Teaming across hubs is supported for troubleshooting purposes (such as connecting
a network analyzer) for SLB teams only.
Unlike a repeater hub, a switching hub (or more simply a switch) allows an
Ethernet network to be broken into multiple collision domains. The switch is
responsible for forwarding Ethernet packets between hosts based solely on Ethernet
MAC addresses. A physical network adapter that is attached to a switch may operate
in half-duplex or full-duplex mode.
To support Generic Trunking and 802.3ad Link Aggregation, a switch must specifically
support such functionality. If the switch does not support these protocols,
it may still be used for Smart Load Balancing.
Router
A router is designed to route network traffic based on Layer 3 or higher protocols,
although it will often also work as a Layer 2 device with switching capabilities.
Teaming ports connected directly to a router is not supported.
All teaming modes are supported for the IA-32 server operating systems as shown
in Table 4.
Table 4: Teaming Support by Operating System
Teaming Mode | Windows | Linux | NetWare
Smart Load Balancing and Failover | • | • | •
Generic Trunking | • | • | •
Link Aggregation | • | • | •
Utilities for Configuring Teaming by Operating System
Table 5 lists the tools used to configure teaming in the supported operating
system environments.
Table 5: Operating System Configuration Tools
Operating System | Configuration Utility
Windows 2000 | BACS2
Windows Server 2003 | BACS2
NetWare | Autoexec.ncf and Basp.lan
Linux | Baspcfg
The Broadcom Advanced Control Suite (BACS) (see Figure
1) is designed to run in one of the following 32-bit Windows operating systems:
Microsoft® Windows® 2000 and Windows Server 2003. BACS is used to configure
load balancing and fault tolerance teaming, and VLANs. In addition, it displays
the MAC address, driver version, and status information. The BACS also includes
a number of diagnostics tools such as hardware diagnostics, cable testing, and
a network topology test.
Figure 1: Broadcom Advanced Control Suite 2
When an adapter configuration is saved in NetWare, the NetWare install program
adds load and bind statements to the Autoexec.ncf file. By accessing this file,
you can verify the parameters configured for each adapter, add or delete parameters,
or modify parameters.
BASP Configuration (baspcfg) is a command line tool for Linux to configure
the BASP teams, add/remove adapters, and add/remove virtual devices. This tool
can be used in custom initialization scripts. Refer to your distribution-specific
documentation for more information on your distribution's startup procedures.
Supported Features by Team Type
Table 6 provides a feature comparison across the teaming
types supported by Dell. Use this table to determine the best type of team for
your application. The teaming software supports up to 8 ports in a single team
and up to 4 teams in a single system. The 4 teams can be any combination of
the supported teaming types, but each team must be on a separate network or
subnet.
Table 6: Comparison of Teaming Modes
Type of Team | Fault Tolerance | Load Balancing | Switch-Dependent Static Trunking | Switch-Independent Dynamic Link Aggregation (IEEE 802.3ad)
Function | SLB with Standby (a) | SLB | Generic Trunking | Link Aggregation
Number of ports per team (same broadcast domain) | 2–8 | 2–8 | 2–8 | 2–8
Number of teams | 4 | 4 | 4 | 4
Adapter fault tolerance | Yes | Yes | Yes | Yes
Switch link fault tolerance (same broadcast domain) | Yes | Yes | Switch-dependent | Switch-dependent
TX load balancing | No | Yes | Yes | Yes
RX load balancing | No | Yes | Yes (performed by the switch) | Yes (performed by the switch)
Requires compatible switch | No | No | Yes | Yes
Heartbeats to check connectivity | No | No | No | No
Mixed media (adapters with different media) | Yes | Yes | Yes (switch-dependent) |
Mixed speeds (adapters that do not support a common speed(s), but can operate at different speeds) | Yes | Yes | No | No
Mixed speeds (adapters that support a common speed(s), but can operate at different speeds) | Yes | Yes | No (must be the same speed) | Yes
Load balances TCP/IP | No | Yes | Yes | Yes
Mixed vendor teaming | Yes (b) | Yes (b) | Yes (b) | Yes (b)
Load balances non-IP | No | Yes (IPX outbound traffic only) | Yes | Yes
Same MAC address for all team members | No | No | Yes | No
Same IP address for all team members | Yes | Yes | Yes | Yes
Load balancing by IP address | No | Yes | Yes | Yes
Load balancing by MAC address | No | Yes (used for non-IP/IPX) | Yes | Yes
(a) SLB with one primary and one standby member.
(b) Requires at least one Broadcom adapter in the team.
The following flow chart provides the decision flow when planning for teaming.
The primary rationale for teaming is the need for additional network bandwidth
and fault tolerance. Teaming offers link aggregation and fault tolerance to
meet both of these requirements. Teaming types should be preferred in the
following order: IEEE 802.3ad as the first choice, Generic Trunking as the second
choice, and SLB teaming as the third choice when using unmanaged switches or
switches that do not support the first two options. If switch fault tolerance
is a requirement, however, then SLB is the only choice (see Figure 2).
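The preference order above can be condensed into a small decision function. This is a simplification of the decision flow in Figure 2 that considers only the factors named in the text; choose_team_type and its parameters are hypothetical.

```python
def choose_team_type(need_switch_fault_tolerance: bool,
                     switch_supports_lacp: bool,
                     switch_supports_static_trunking: bool) -> str:
    """Return the preferred teaming type following the order given in the text."""
    if need_switch_fault_tolerance:
        # Teaming across separate switches is only possible with SLB.
        return "SLB"
    if switch_supports_lacp:
        return "Link Aggregation (802.3ad)"
    if switch_supports_static_trunking:
        return "Generic Trunking"
    # Unmanaged switches, or switches without either trunking option.
    return "SLB"


print(choose_team_type(False, True, True))   # managed switch with LACP
print(choose_team_type(True, True, True))    # team spans two switches
```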
The Broadcom Advanced Server Program is implemented as an NDIS intermediate
driver (see Figure 3). It operates below protocol stacks such as TCP/IP and IPX and
appears as a virtual adapter. This virtual adapter inherits the MAC Address
of the first port initialized in the team. A Layer 3 address must also be configured
for the virtual adapter. The primary function of BASP is to balance inbound
(for SLB) and outbound traffic (for all teaming modes) among the physical adapters
installed on the system selected for teaming. The inbound and outbound algorithms
are independent and orthogonal to each other. The outbound traffic for a particular
session can be assigned to a given port while its corresponding inbound traffic
can be assigned to a different port.
Figure 3: Intermediate Driver
Outbound Traffic Flow
The Broadcom Intermediate Driver manages the outbound traffic flow for all
teaming modes. For outbound traffic, every packet is first classified into a
flow, and then distributed to the selected physical adapter for transmission.
The flow classification involves an efficient hash computation over known protocol
fields. The resulting hash value is used to index into an Outbound Flow Hash
Table. The selected Outbound Flow Hash Entry contains the index of the selected
physical adapter responsible for transmitting this flow. The source MAC address
of the packets is then changed to the MAC address of the selected physical
adapter. The modified packet is then passed to the selected physical adapter
for transmission.
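The outbound path described above (classify, look up the Outbound Flow Hash Entry, rewrite the source MAC, hand off to the adapter) can be modeled as follows. The table size, the CRC-32 hash, the round-robin initial assignment, and the class names are assumptions for illustration; only the sequence of steps comes from the text. Each entry also carries the statistics counters described in the next paragraph, updated on classification.

```python
import zlib
from dataclasses import dataclass


@dataclass
class OutboundFlowHashEntry:
    adapter_index: int      # physical adapter responsible for this flow
    packets: int = 0        # statistics counters, updated after classification
    byte_count: int = 0


class OutboundFlowHashTable:
    def __init__(self, adapter_macs: list, size: int = 256):
        self.adapter_macs = adapter_macs
        # Assumed initial policy: spread the hash buckets evenly over adapters.
        self.entries = [OutboundFlowHashEntry(i % len(adapter_macs)) for i in range(size)]

    def classify(self, flow_fields: bytes) -> OutboundFlowHashEntry:
        """Hash known protocol fields and index into the table."""
        return self.entries[zlib.crc32(flow_fields) % len(self.entries)]

    def transmit(self, flow_fields: bytes, frame: bytearray) -> tuple:
        entry = self.classify(flow_fields)
        entry.packets += 1
        entry.byte_count += len(frame)
        # The Ethernet source MAC occupies bytes 6..11 of the frame.
        frame[6:12] = self.adapter_macs[entry.adapter_index]
        return entry.adapter_index, frame


macs = [bytes.fromhex("001018000001"), bytes.fromhex("001018000002")]
table = OutboundFlowHashTable(macs)
adapter, out = table.transmit(b"10.0.0.100|10.0.0.1|49152|80", bytearray(60))
```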
The outbound TCP and UDP packets are classified using Layer 3 and Layer 4 header
information. This scheme improves the load distribution for popular Internet
protocol services that use well-known ports, such as HTTP and FTP. As a result,
BASP performs load balancing on a TCP session basis and not on a packet-by-packet
basis.
In the Outbound Flow Hash Entries, statistics counters are also updated after
classification. The load-balancing engine uses these counters to periodically
distribute the flows across teamed ports. The outbound code path has been designed
to achieve the best possible concurrency by allowing multiple concurrent accesses
to the Outbound Flow Hash Table.
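One way a load-balancing engine could use those counters is sketched below: periodically sort the hash entries by recent traffic, greedily assign each to the least-loaded adapter, and reset the counters for the next interval. This greedy heuristic is a swapped-in illustration, not BASP's documented policy; entries are plain dictionaries to keep the example self-contained.

```python
def rebalance(entries: list, num_adapters: int) -> list:
    """Greedy rebalance: heaviest entries first, each to the least-loaded adapter.

    Each entry is a dict with "adapter" and "bytes" keys; returns per-adapter load.
    """
    load = [0] * num_adapters
    for entry in sorted(entries, key=lambda e: e["bytes"], reverse=True):
        target = min(range(num_adapters), key=lambda a: load[a])
        entry["adapter"] = target
        load[target] += entry["bytes"]
        entry["bytes"] = 0  # start a fresh measurement interval
    return load


entries = [{"adapter": 0, "bytes": b} for b in (100, 90, 10, 5)]
load = rebalance(entries, 2)
print(load)  # [105, 100]
```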
For protocols other than TCP/IP, the first physical adapter will always be
selected for outbound packets. The exception is Address Resolution Protocol
(ARP), which is handled differently to achieve inbound load balancing.
The Broadcom Intermediate Driver manages the inbound traffic flow for the SLB
teaming mode. Unlike outbound load balancing, inbound load balancing can only
be applied to IP addresses that are located in the same subnet as the load-balancing
server. Inbound load balancing exploits a unique characteristic of Address Resolution
Protocol (RFC 826), in which each IP host uses its own ARP cache to encapsulate
the IP Datagram into an Ethernet frame. BASP carefully manipulates the ARP response
to direct each IP host to send the inbound IP packet to the desired physical
adapter. Therefore, inbound load balancing is a plan-ahead scheme based on statistical
history of the inbound flows. New connections from a client to the server will
always occur over the primary physical adapter (because the ARP Reply generated
by the operating system protocol stack will always associate the logical IP
address with the MAC address of the primary physical adapter).
Like the outbound case, there is an Inbound Flow Head Hash Table. Each entry
inside this table has a singly linked list and each link (Inbound Flow Entries)
represents an IP host located in the same subnet.
When an inbound IP Datagram arrives, the appropriate Inbound Flow Head Entry
is located by hashing the source IP address of the IP Datagram. Two statistics
counters stored in the selected entry are also updated. The load-balancing engine
uses these counters, in the same way as the outbound counters, to periodically
reassign flows to the physical adapters.
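The Inbound Flow Head Hash Table can be modeled as an array of head entries, each holding two counters and a singly linked list of per-host Inbound Flow Entries. The text says only that there are two statistics counters; treating them as packet and byte counts is an assumption, as are the CRC-32 hash and the class and method names. The sketch also reflects the cost model in the next paragraph: ordinary datagrams touch only the head entry, while ARP processing walks the list.

```python
import ipaddress
import zlib


class InboundFlowEntry:
    """One IP host on the local subnet: a link in a singly linked list."""
    def __init__(self, ip: str, adapter: int, next_entry=None):
        self.ip = ip
        self.adapter = adapter
        self.next = next_entry


class InboundFlowHeadEntry:
    def __init__(self):
        self.head = None       # first InboundFlowEntry in this bucket
        self.packets = 0       # assumed meaning of the two statistics counters
        self.byte_count = 0


class InboundFlowHeadHashTable:
    def __init__(self, size: int = 64):
        self.buckets = [InboundFlowHeadEntry() for _ in range(size)]

    def bucket(self, src_ip: str) -> InboundFlowHeadEntry:
        key = ipaddress.IPv4Address(src_ip).packed
        return self.buckets[zlib.crc32(key) % len(self.buckets)]

    def on_ip_datagram(self, src_ip: str, length: int) -> None:
        # Constant work per packet: hash the source IP and bump two counters.
        head = self.bucket(src_ip)
        head.packets += 1
        head.byte_count += length

    def on_arp(self, src_ip: str, primary_adapter: int = 0) -> InboundFlowEntry:
        # ARP handling walks the list, so its cost grows with the list length.
        head = self.bucket(src_ip)
        node = head.head
        while node is not None:
            if node.ip == src_ip:
                return node
            node = node.next
        # Unknown host: start it on the primary adapter, as new connections do.
        head.head = InboundFlowEntry(src_ip, primary_adapter, head.head)
        return head.head


table = InboundFlowHeadHashTable()
table.on_ip_datagram("10.0.0.1", 1500)
host = table.on_arp("10.0.0.1")
```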
On the inbound code path, the Inbound Flow Head Hash Table is also designed
to allow concurrent access. The linked lists of Inbound Flow Entries are only
referenced when processing ARP packets and during periodic load balancing.
There is no per-packet reference to the Inbound Flow Entries. Even though the
linked lists are not bounded, the overhead of processing each non-ARP packet is
always constant. The processing of ARP packets, both inbound and outbound,
however, depends on the number of links in the corresponding linked list.
On the inbound processing path, filtering is also employed to prevent broadcast
packets from looping back through the system from other physical adapters.
ARP and IP/TCP/UDP flows are load balanced. If the packet uses only the IP
protocol, such as ICMP or IGMP, then all data flowing to a particular IP address
will go out through the same physical adapter. If the packet uses TCP or UDP
for the L4 protocol, then the port number is added to the hashing algorithm,
so two separate L4 flows to the same IP address can go out through two separate
physical adapters.
For example, assume the client has an IP address of 10.0.0.1. All IGMP and ICMP
traffic will go out the same physical adapter because only the IP address is
used for the hash. The flow would look something like this:
IGMP ------> PhysAdapter1 ------> 10.0.0.1
ICMP ------> PhysAdapter1 ------> 10.0.0.1
If the server also sends a TCP and a UDP flow to the same 10.0.0.1 address,
they can be on the same physical adapter as IGMP and ICMP, or on completely
different physical adapters from ICMP and IGMP. The streams may look like this:
IGMP ------> PhysAdapter1 ------> 10.0.0.1
ICMP ------> PhysAdapter1 ------> 10.0.0.1
TCP  ------> PhysAdapter1 ------> 10.0.0.1
UDP  ------> PhysAdapter1 ------> 10.0.0.1
Or the streams may look like this:
IGMP ------> PhysAdapter1 ------> 10.0.0.1
ICMP ------> PhysAdapter1 ------> 10.0.0.1
TCP  ------> PhysAdapter2 ------> 10.0.0.1
UDP  ------> PhysAdapter3 ------> 10.0.0.1
The actual assignment between adapters may change over time, but any protocol
that is not TCP/UDP based goes over the same physical adapter because only the
IP address is used in the hash.
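The rule illustrated above (IP-only protocols hash on addresses alone, while TCP and UDP add port numbers) can be expressed as a hash-key builder. CRC-32 again stands in for the driver's real hash, and the function names are hypothetical; the point is that ICMP and IGMP traffic to 10.0.0.1 always produces the same key, whereas TCP and UDP flows to the same host can land on different adapters.

```python
import zlib

ICMP, IGMP, TCP, UDP = 1, 2, 6, 17  # IANA protocol numbers


def hash_key(proto: int, src_ip: str, dst_ip: str,
             src_port: int = 0, dst_port: int = 0) -> bytes:
    if proto in (TCP, UDP):
        # L4 flow: the port numbers are added to the hash input.
        return f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    # IP-only protocol (ICMP, IGMP, ...): the addresses are the whole key.
    return f"{src_ip}|{dst_ip}".encode()


def adapter_for(proto: int, src_ip: str, dst_ip: str, num_adapters: int,
                src_port: int = 0, dst_port: int = 0) -> int:
    return zlib.crc32(hash_key(proto, src_ip, dst_ip, src_port, dst_port)) % num_adapters


server, client = "10.0.0.100", "10.0.0.1"
print("ICMP ->", adapter_for(ICMP, server, client, 3))
print("IGMP ->", adapter_for(IGMP, server, client, 3))
print("TCP  ->", adapter_for(TCP, server, client, 3, 49152, 80))
print("UDP  ->", adapter_for(UDP, server, client, 3, 50000, 53))
```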
Modern network interface cards provide many hardware features that reduce CPU
utilization by offloading certain CPU-intensive operations (see Teaming
and Other Advanced Networking Features). In contrast, the BASP intermediate
driver is a purely software function that must examine every packet received
from the protocol stacks and react to its contents before sending it out through
a particular physical interface. Although the BASP driver can process each outgoing
packet in near-constant time, some applications that are already CPU bound
may suffer if operated over a teamed interface. Such an application may be better
suited to take advantage of the failover capabilities of the intermediate driver
rather than the load balancing features, or it may operate more efficiently
over a single physical adapter that provides a particular hardware feature such
as Large Send Offload.
Table 7 provides an example of the performance benefit that teaming offers
by listing the throughput and CPU metrics for an LACP team as a function of
the number of member ports. Chariot Benchmark throughput scales with the number
of ports in the team with a modest increase in CPU utilization. The benchmark
configuration consisted of 16 Windows 2000 clients with a TCP Window Size of
64 KB used to generate traffic. The test server was running Windows Server 2003
with Large Send Offload.
Table 7: LACP Teaming Performance
Mode | Number of Ports | Receive Only: CPU Utilization (%) | Receive Only: Throughput (Mbps) | Transmit Only: CPU Utilization (%) | Transmit Only: Throughput (Mbps) | Bidirectional: CPU Utilization (%) | Bidirectional: Throughput (Mbps)
No Team | 1 | 22 | 936 | 21 | 949 | 29 | 1800
LACP Team | 2 | 34 | 1419 | 30 | 1885 | 35 | 2297
LACP Team | 3 | 36 | 1428 | 38 | 2834 | 37 | 2375
LACP Team | 4 | 31 | 1681 | 43 | 3770 | 44 | 3066
NOTE: This is not a guarantee of performance.
Performance will vary based on a number of configuration factors and the type
of benchmark. It does, however, indicate that link aggregation provides a positive
performance improvement as the number of ports in a team is increased.
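The scaling in Table 7 can be quantified by dividing each throughput figure by the single-port result. The short calculation below uses only the table's own numbers; for transmit-only traffic it gives about 1.99x, 2.99x, and 3.97x for two, three, and four ports, while receive-only throughput reaches roughly 1.80x with four ports.

```python
# Throughput (Mbps) from Table 7, keyed by number of ports in the LACP team.
transmit_only = {1: 949, 2: 1885, 3: 2834, 4: 3770}
receive_only = {1: 936, 2: 1419, 3: 1428, 4: 1681}


def scaling(throughput_by_ports: dict) -> dict:
    """Map ports -> (speedup over one port, fraction of ideal linear scaling)."""
    base = throughput_by_ports[1]
    return {n: (mbps / base, mbps / base / n) for n, mbps in throughput_by_ports.items()}


for name, data in (("transmit", transmit_only), ("receive", receive_only)):
    for ports, (speedup, efficiency) in scaling(data).items():
        print(f"{name}: {ports} port(s) -> {speedup:.2f}x ({efficiency:.0%} of linear)")
```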
Large Send Offload enables an almost linear scalability of transmit throughput
as a function of the number of team members as shown in Graph 1.
The Broadcom Smart Load Balancing type of team allows 2 to 8 physical adapters
to operate as a single virtual adapter. The greatest benefit of the SLB type of
team is that it operates on any IEEE compliant switch and requires no special
configuration.
SLB provides for switch-independent, bidirectional, fault-tolerant teaming
and load balancing. Switch independence implies that there is no specific support
for this function required in the switch, allowing SLB to be compatible with
all switches. Under SLB, all adapters in the team have separate MAC addresses.
The load-balancing algorithm operates on Layer 3 addresses of the source and
destination nodes, which enables SLB to load balance both incoming and outgoing
traffic.
The BASP intermediate driver continually monitors the physical ports in a
team for link loss. In the event of link loss on any port, traffic is automatically
diverted to other ports in the team. The SLB teaming mode supports switch fault
tolerance by allowing teaming across different switches, provided the switches
are on the same physical network or broadcast domain.
Network Communications
The following are the key attributes of SLB:
Failover mechanism – Link loss detection.
Load Balancing Algorithm – Inbound and outbound traffic are balanced
through a Broadcom proprietary mechanism based on L4 flows.
Outbound Load Balancing using MAC Address – No.
Outbound Load Balancing using IP Address – Yes.
Multi-vendor Teaming – Supported (must include at least 1 Broadcom
Ethernet controller as a team member).
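Because the actual SLB algorithm is a Broadcom proprietary mechanism, the following Python sketch is only a hypothetical illustration of the attributes listed above: each L4 flow (source IP, destination IP, source port, destination port) is hashed to one team member, and members that have lost link are excluded. The CRC32 hash and the `TeamMember` and `select_member` names are assumptions made for this example.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TeamMember:
    name: str
    link_up: bool = True  # cleared by link loss detection


def select_member(team: List[TeamMember], flow: Tuple[str, str, int, int]) -> TeamMember:
    """Map an L4 flow to one team member that currently has link.

    Hypothetical stand-in for the proprietary BASP algorithm: CRC32 of the
    flow tuple, modulo the number of members with link.
    """
    active = [m for m in team if m.link_up]
    if not active:
        raise RuntimeError("no team member has link")
    key = "|".join(str(part) for part in flow).encode()
    return active[zlib.crc32(key) % len(active)]


if __name__ == "__main__":
    team = [TeamMember("Adapter A"), TeamMember("Adapter B")]
    flow = ("192.168.1.10", "192.168.1.20", 49152, 445)
    before = select_member(team, flow)
    before.link_up = False  # simulate link loss on the chosen member
    after = select_member(team, flow)
    print(f"{before.name} -> {after.name} after link loss")
```

Because the sketch takes the hash modulo the number of active members, a link loss can also move flows between the surviving members; a production driver would likely try to keep those flows in place, which this sketch does not attempt.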
Applications
The SLB algorithm is most appropriate in home and small business environments
where cost is a concern or with commodity switching equipment. SLB teaming works
with unmanaged Layer 2 switches and is a cost-effective way of getting redundancy
and link aggregation at the server. Smart Load Balancing also supports teaming
physical adapters with differing link capabilities. In addition, SLB is recommended
when switch fault tolerance with teaming is required.
Configuration Recommendations
SLB supports connecting the teamed ports to hubs and switches if they are on the
same broadcast domain. It does not support connecting to a router or layer 3 switches
because the ports must be on the same subnet.
Switch-Dependent
Generic Static Trunking
This mode supports a variety of environments where the adapter link partners
are statically configured to support a proprietary trunking mechanism. This
mode could be used to support Lucent’s Open Trunk, Cisco’s
Fast EtherChannel (FEC), and Cisco’s Gigabit EtherChannel
(GEC). In the static mode, as in generic link aggregation, the switch administrator
needs to assign the ports to the team, and this assignment cannot be altered
by the BASP, as there is no exchange of the Link Aggregation Control Protocol
(LACP) frame.
With this mode, all adapters in the team are configured to receive packets
for the same MAC address. Trunking operates on Layer 2 addresses and supports
load balancing and failover for both inbound and outbound traffic. The BASP
driver determines the load-balancing scheme for outbound packets, using the Layer 4
flow-based mechanism previously discussed, whereas the team’s link partner determines
the load-balancing scheme for inbound packets.
The attached switch must support the appropriate trunking scheme for this mode
of operation. Both the BASP and the switch continually monitor their ports for
link loss. In the event of link loss on any port, traffic is automatically diverted
to other ports in the team.
Network Communications
The following are the key attributes of Generic Static Trunking:
Failover mechanism – Link loss detection
Load Balancing Algorithm – Outbound traffic is balanced through a Broadcom
proprietary mechanism based on L4 flows. Inbound traffic is balanced according
to a switch-specific mechanism.
Outbound Load Balancing using MAC Address – No
Outbound Load Balancing using IP Address – Yes
Multi-vendor teaming – Supported (Must include at least 1 Broadcom
Ethernet controller as a team member)
Applications
Generic trunking works with switches that support Cisco Fast EtherChannel, Cisco
Gigabit EtherChannel, Extreme Networks Load Sharing and Bay Networks or IEEE 802.3ad
Link Aggregation static mode. Since load balancing is implemented on Layer 2 addresses,
all higher protocols such as IP, IPX, and NetBEUI are supported. Therefore, when the
switch supports generic trunking, this is the recommended teaming mode over SLB.
Configuration Recommendations
Static trunking supports connecting the teamed ports to switches if they are on
the same broadcast domain and support generic trunking. It does not support connecting
to a router or layer 3 switches since the ports must be on the same subnet.
Dynamic Trunking (IEEE 802.3ad Link Aggregation)
This mode supports link aggregation through static and dynamic configuration via
the Link Aggregation Control Protocol (LACP). With this mode, all adapters in
the team are configured to receive packets for the same MAC address. The MAC address
of the first adapter in the team is used and cannot be substituted for a different
MAC address. The BASP driver determines the load-balancing scheme for outbound
packets, using the Layer 4 flow-based mechanism previously discussed, whereas the team’s
link partner determines the load-balancing scheme for inbound packets. Because
the load balancing is implemented on Layer 2, all higher protocols such as IP,
IPX, and NetBEUI are supported. The attached switch must support the 802.3ad Link
Aggregation standard for this mode of operation. The switch manages the inbound
traffic to the adapter while the BASP manages the outbound traffic. Both the BASP
and the switch continually monitor their ports for link loss. In the event of
link loss on any port, traffic is automatically diverted to other ports in the
team.
Network Communications
The following are the key attributes of Dynamic Trunking:
Failover mechanism – Link loss detection
Load Balancing Algorithm – Outbound traffic is balanced through a
Broadcom proprietary mechanism based on L4 flows. Inbound traffic is balanced
according to a switch-specific mechanism.
Outbound Load Balancing using MAC Address – No
Outbound Load Balancing using IP Address – Yes
Multi-vendor teaming – Supported (Must include at least 1 Broadcom
Ethernet controller as a team member)
Applications
Dynamic trunking works with switches that support IEEE 802.3ad Link Aggregation
dynamic mode using LACP. Inbound load balancing is switch dependent. In general,
the switch traffic is load balanced based on L2 addresses. In this case, all network
protocols such as IP, IPX, and NetBEUI are load balanced. Therefore, this is the
recommended teaming mode when the switch supports LACP, except when switch fault
tolerance is required. SLB is the only teaming mode that supports switch fault
tolerance.
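As an illustration of switch-side inbound balancing on L2 addresses, the Python sketch below picks an aggregated link by XORing the last byte of the source and destination MAC addresses. This is an assumed, generic scheme for illustration only; real switches use vendor-specific hash inputs, and some can also hash on IP addresses or ports. The function names are hypothetical.

```python
def mac_to_bytes(mac: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' (or dash-separated) into 6 bytes."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac}")
    return raw


def link_for_frame(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Illustrative L2 hash: XOR the low-order MAC bytes, modulo the link count."""
    if num_links < 1:
        raise ValueError("num_links must be at least 1")
    return (mac_to_bytes(src_mac)[-1] ^ mac_to_bytes(dst_mac)[-1]) % num_links


if __name__ == "__main__":
    team_mac = "00:10:18:00:00:01"  # 802.3ad team members share one MAC address
    for client in ("00:10:18:aa:00:02", "00:10:18:aa:00:03", "00:10:18:aa:00:04"):
        print(client, "-> link", link_for_frame(client, team_mac, 2))
```

One consequence holds for any MAC-address-based hash of this kind: all frames between one pair of MAC addresses land on the same link, so a single conversation does not exceed the bandwidth of one team member.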
Configuration Recommendations
Dynamic trunking supports connecting the teamed ports to switches as long as they
are on the same broadcast domain and support IEEE 802.3ad LACP trunking. It does
not support connecting to a router or layer 3 switches since the ports must be
on the same subnet.
Driver Support by Operating System
As previously noted, the BASP is supported in the Windows 2000 Server, Windows
Server 2003, NetWare, and Linux operating system environments. In a NetWare
environment, NESL support is required because BASP relies on the adapter drivers
to generate NESL events during link changes and other failure events. For Linux
environments, Broadcom’s Network Interface Card Extension (NICE) support
is required. NICE is an extension provided by Broadcom to standard Linux drivers,
and supports monitoring of Address Resolution Protocol (ARP) requests, link
detection, and VLANs.
The following table summarizes the various teaming mode features for each operating
system.
Table 9 summarizes the various link speeds supported
by each teaming mode. Mixed speed refers to the capability of teaming adapters
that are running at different link speeds.
Before creating a team, adding or removing team members, or changing advanced
settings of a team member, make sure each team member has been configured similarly.
Settings to check include VLANs and QoS Packet Tagging, Jumbo Frames, and the
various offloads. Table 10 lists advanced adapter properties and teaming.
A team does not necessarily inherit adapter properties; rather, whether a property
applies to a team depends on the specific capability. For instance, teams will not
support LSO even if the underlying adapter can support LSO. Flow control, by
contrast, is a physical adapter property that is independent of BASP; it is
enabled on a particular adapter if the miniport driver for that adapter has flow
control enabled.
Checksum offload is a property of the Broadcom network adapters that allows
the TCP/IP/UDP checksums for send and receive traffic to be calculated by the
adapter hardware rather than by the host CPU. In high-traffic situations, this
can allow a system to handle more connections more efficiently than if the host
CPU were forced to calculate the checksums. This property is inherently a hardware
property and would not benefit from a software-only implementation. An adapter
that supports Checksum Offload advertises this capability to the operating system
so that the checksum does not need to be calculated in the protocol stack. Because
the intermediate driver is located directly between the protocol layer and the
miniport driver, however, the protocol layer is not able to offload any checksums.
Therefore, that hardware capability is unused on a physical adapter that is part of a team.
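For reference, the checksum that Checksum Offload moves into hardware is the 16-bit one's-complement sum defined in RFC 1071 (used for the IPv4 header, and by TCP and UDP over a pseudo-header plus the segment). The Python sketch below computes it in software, which is the per-packet work the host CPU performs when the offload is unavailable; the function name is hypothetical and the sample header is illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as specified in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


if __name__ == "__main__":
    # Sample 20-byte IPv4 header with its checksum field (bytes 10-11) set to zero.
    header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    print(hex(internet_checksum(header)))  # 0xb861
```

A receiver verifies a packet by running the same function over the data with the checksum field filled in; a result of zero means the checksum matches.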
The IEEE 802.1p standard includes a 3-bit field (supporting a maximum of 8 priority
levels), which allows for traffic prioritization. The BASP intermediate driver
does not support IEEE 802.1p QoS tagging.
Large Send Offload is a feature provided by Broadcom network adapters that
prevents an upper level protocol such as TCP from breaking a large data packet
into a series of smaller packets with headers appended to them. The protocol
stack need only generate a single header for a data packet as large as 64 KB,
and the adapter hardware breaks the data buffer into appropriately-sized Ethernet
frames with the correctly sequenced header (based on the single header originally
provided). Like the Checksum Offload feature listed above, this is a hardware
feature that is not implemented by the intermediate driver. As a result, the
protocol stack will not be able to take advantage of this feature when a physical
adapter is part of a team.
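The segmentation that LSO hardware performs can be sketched in Python. The example assumes a 1500-byte MTU with 20-byte IP and TCP headers and no options (an MSS of 1460 bytes), and it models only sequence numbers and payload sizes; header construction and checksums are omitted. `Segment` and `segment_send` are hypothetical names for this illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    seq: int     # TCP sequence number of the segment's first payload byte
    length: int  # payload bytes carried in this Ethernet frame


def segment_send(start_seq: int, payload_len: int, mss: int = 1460) -> List[Segment]:
    """Split one large send into MSS-sized segments with consecutive sequence numbers."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    offset = 0
    while offset < payload_len:
        length = min(mss, payload_len - offset)
        segments.append(Segment((start_seq + offset) % 2**32, length))  # seq wraps at 2^32
        offset += length
    return segments


if __name__ == "__main__":
    segs = segment_send(start_seq=1000, payload_len=64000)
    print(len(segs), "frames; last carries", segs[-1].length, "bytes")
```

For a 64,000-byte send this yields 44 frames, the last carrying 1220 bytes; with LSO the protocol stack builds one header instead of 44.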
The use of Jumbo Frames was originally proposed by Alteon Networks, Inc. in 1998
and increased the maximum Ethernet frame size to 9000 bytes. Though never formally
adopted by the IEEE 802.3 Working Group, support for Jumbo Frames has been
implemented in Broadcom server adapters. The BASP intermediate driver supports
Jumbo Frames, provided that all of the physical adapters in the team also support
Jumbo Frames.
In 1998, the IEEE approved the 802.3ac standard, which defines frame format extensions
to support Virtual Bridged Local Area Network tagging on Ethernet networks as
specified in the IEEE 802.1Q specification. The VLAN protocol permits insertion
of a tag into an Ethernet frame to identify the VLAN to which a frame belongs.
If present, the 4-byte VLAN tag is inserted into the Ethernet frame between the
source MAC address and the length/type field. The first 2 bytes of the VLAN tag
consist of the IEEE 802.1Q tag type, whereas the second 2 bytes include a user
priority field and the VLAN identifier (VID). Virtual LANs (VLANs) allow the user
to split the physical LAN into logical subparts. Each defined VLAN behaves as
its own separate network, with its traffic and broadcasts isolated from the others,
thus increasing bandwidth efficiency within each logical group. VLANs also enable
the administrator to enforce appropriate security and quality of service (QoS)
policies. The BASP supports the creation of 64 VLANs per team or adapter. The
operating system and system resources, however, limit the actual number of VLANs.
VLAN support is provided according to IEEE 802.1Q and is supported in a teaming
environment as well as on a single adapter. Note that VLANs are supported only
with homogeneous teaming and not in a multivendor teaming environment. The BASP
intermediate driver supports VLAN tagging. One or more VLANs may be bound to a
single instance of the intermediate driver.
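The 4-byte tag layout described above can be illustrated with a short Python sketch. It assumes an untagged frame without its trailing FCS (which would have to be recomputed after insertion) and packs the 0x8100 tag type followed by the 3-bit user priority (the IEEE 802.1p field), a 1-bit CFI flag, and the 12-bit VID. The function names are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # IEEE 802.1Q tag type that identifies a tagged frame


def vlan_tag(vid: int, priority: int = 0, cfi: int = 0) -> bytes:
    """Pack the tag: 16-bit tag type, then priority (3 bits), CFI (1 bit), VID (12 bits)."""
    if not 0 <= priority <= 7:
        raise ValueError("user priority is a 3-bit field (0-7)")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID is a 12-bit field (0-4095)")
    tci = (priority << 13) | ((cfi & 1) << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def insert_vlan_tag(frame: bytes, vid: int, priority: int = 0) -> bytes:
    """Insert the tag between the source MAC address and the length/type field."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: length/type.
    return frame[:12] + vlan_tag(vid, priority) + frame[12:]


if __name__ == "__main__":
    untagged = bytes(6) + bytes(6) + b"\x08\x00" + b"payload"
    print(insert_vlan_tag(untagged, vid=100, priority=5)[12:18].hex())  # 8100a0640800
```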
Wake on LAN is a feature that allows a system to be awakened from a sleep state
by the arrival of a specific packet over the Ethernet interface. Because a Virtual
Adapter is implemented as a software-only device, it lacks the hardware features
to implement Wake on LAN and cannot be enabled to wake the system from a sleeping
state via the Virtual Adapter. The physical adapters, however, support this
property, even when the adapter is part of a team.
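One widely used form of the wake-up packet is the Magic Packet: 6 bytes of 0xFF followed by the target adapter's MAC address repeated 16 times. Because the pattern names a specific MAC address and is matched by the adapter hardware, it targets a physical adapter rather than the team's Virtual Adapter. The Python sketch below builds the 102-byte payload; the function name is hypothetical, and how the payload is transported is left out.

```python
def magic_packet(mac: str) -> bytes:
    """Return a Magic Packet payload: 6 x 0xFF, then the 6-byte MAC 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac}")
    return b"\xff" * 6 + raw * 16


if __name__ == "__main__":
    payload = magic_packet("00:10:18:aa:bb:cc")  # MAC of a physical team member
    print(len(payload))  # 102
```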
The Preboot Execution Environment (PXE) allows a system to boot from an operating
system image over the network. By definition, PXE is invoked before an operating
system is loaded, so there is no opportunity for the BASP intermediate driver
to load and enable a team. As a result, teaming is not supported as a PXE client,
though a physical adapter that participates in a team when the operating system
is loaded may be used as a PXE client. Whereas a teamed adapter cannot be used
as a PXE client, it can be used as a PXE server, which provides operating system
images to PXE clients using a combination of Dynamic Host Configuration Protocol (DHCP)
and the Trivial File Transfer Protocol (TFTP). Both of these protocols operate
over IP and are supported by all teaming modes.
SLB teaming can be configured across switches. The switches, however, must
be interconnected. Generic Trunking and Link Aggregation do not work across
switches because each of these implementations requires that all physical adapters
in a team share the same Ethernet MAC address. It is important to note that
SLB can only detect the loss of link between the ports in the team and their
immediate link partner. SLB has no way of reacting to other hardware failures
in the switches and cannot detect loss of link on other ports.
The diagrams below describe the operation of an SLB team in a switch fault
tolerant configuration. We show the mapping of the ping request and ping replies
in an SLB team with two active members. All servers (Blue, Gray and Red) have
a continuous ping to each other. Figure
4 is a setup without the interconnect cable in place between
the two switches. Figure
5 has the interconnect cable in place, and Figure 6
is an example of a failover event with the Interconnect cable in place. These
scenarios describe the behavior of teaming across the two switches and the importance
of the interconnect link.
The diagrams show the secondary team member sending the ICMP echo requests
(yellow arrows) while the primary team member receives the respective ICMP echo
replies (blue arrows). This illustrates a key characteristic of the teaming
software. The load balancing algorithms do not synchronize how frames are load
balanced when sent or received. In other words, frames for a given conversation
can go out and be received on different interfaces in the team. This is true
for all types of teaming supported by Broadcom. Therefore, an interconnect link
must be provided between the switches that connect to ports in the same team.
In the configuration without the interconnect, an ICMP Request from Blue to
Gray goes out port 82:83 destined for Gray port 5E:CA, but the Top Switch has
no way to send it there because it cannot go along the 5E:C9 port on Gray. A
similar scenario occurs when Gray attempts to ping Blue. An ICMP Request goes
out on 5E:C9 destined for Blue 82:82, but cannot get there. Top Switch does
not have an entry for 82:82 in its CAM table because there is no interconnect
between the two switches. Pings, however, flow between Red and Blue and between
Red and Gray.
Furthermore, a failover event would cause additional loss of connectivity.
Consider a cable disconnect on the Top Switch port 4. In this case, Gray would
send the ICMP Request to Red 49:C9, but because the Bottom switch has no entry
for 49:C9 in its CAM Table, the frame is flooded to all its ports but cannot
find a way to get to 49:C9.
Figure 4: Teaming Across Switches Without an Interswitch Link
The addition of a link between the switches allows traffic from/to Blue and
Gray to reach each other without any problems. Note the additional entries in
the CAM table for both switches. The link interconnect is critical for the proper
operation of the team. As a result, it is highly advisable to have a link aggregation
trunk to interconnect the two switches to ensure high availability for the connection.
Figure 5: Teaming Across Switches With Interconnect
Figure 6 represents a failover event in which the cable is
unplugged on the Top Switch port 4. This is a successful failover with all stations
pinging each other without loss of connectivity.
In Ethernet networks, only one active path may exist between any two bridges
or switches. Multiple active paths between switches can cause loops in the network.
When loops occur, some switches recognize stations on both sides of the switch.
This situation causes the forwarding algorithm to malfunction, allowing duplicate
frames to be forwarded. Spanning tree algorithms provide path redundancy by defining a
tree that spans all of the switches in an extended network and then forces certain
redundant data paths into a standby (blocked) state. At regular intervals, the
switches in the network send and receive spanning tree packets that they use
to identify the path. If one network segment becomes unreachable, or if spanning
tree costs change, the spanning tree algorithm reconfigures the spanning tree
topology and re-establishes the link by activating the standby path. Spanning
tree operation is transparent to end stations, which do not detect whether they
are connected to a single LAN segment or a switched LAN of multiple segments.
Spanning Tree Protocol (STP) is a Layer 2 protocol designed to run on bridges
and switches. The specification for STP is defined in IEEE 802.1D. The main
purpose of STP is to ensure that you do not run into a loop situation when you
have redundant paths in your network. STP detects and disables network loops and
provides backup links between switches or bridges. It allows the device to interact
with other STP compliant devices in your network to ensure that only one path
exists between any two stations on the network.
After a stable network topology has been established, all bridges listen for
hello BPDUs (Bridge Protocol Data Units) transmitted from the root bridge. If
a bridge does not get a hello BPDU after a predefined interval (Max Age), the
bridge assumes that the link to the root bridge is down. This bridge then initiates
negotiations with other bridges to reconfigure the network to re-establish a
valid network topology. The process to create a new topology can take up to
50 seconds. During this time, end-to-end communications are interrupted.
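The 50-second figure follows from the default IEEE 802.1D timers, assuming they have not been tuned: a bridge waits out Max Age (20 seconds) before reacting, and the port then spends one Forward Delay (15 seconds) in each of the listening and learning states before forwarding. The arithmetic is written out below in Python; the constant and function names are hypothetical.

```python
# IEEE 802.1D default timer values, in seconds (assumed untuned).
MAX_AGE = 20        # time without hello BPDUs before the root path is considered lost
FORWARD_DELAY = 15  # time spent in each of the listening and learning states


def worst_case_reconvergence(max_age: int = MAX_AGE, forward_delay: int = FORWARD_DELAY) -> int:
    """Max Age expiry, then listening and learning before the port forwards."""
    return max_age + 2 * forward_delay


if __name__ == "__main__":
    print(worst_case_reconvergence(), "seconds")  # 50 seconds
```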
The use of Spanning Tree is not recommended for ports that are connected to
end stations, because by definition, an end station does not create a loop within
an Ethernet segment. Additionally, when a teamed adapter is connected to a port
with Spanning Tree enabled, users may experience unexpected connectivity problems.
For example, consider a teamed adapter that has lost link on one of its physical
adapters. If the physical adapter were to be reconnected (also known as fallback),
the intermediate driver would detect that the link has been reestablished and
would begin to pass traffic through the port. Traffic would be lost if the port
was temporarily blocked by the Spanning Tree Protocol.
A bridge/switch creates a forwarding table of MAC addresses and port numbers
by learning the source MAC address of each frame received on a particular port. The table
is used to forward frames to a specific port rather than flooding the frame
to all ports. The typical maximum aging time of entries in the table is 5 minutes.
Only when a host has been silent for 5 minutes would its entry be removed from
the table. It is sometimes beneficial to reduce the aging time. One example
is when a forwarding link goes to blocking and a different link goes from blocking
to forwarding. This change could take up to 50 seconds. At the end of the STP
re-calculation a new path would be available for communications between end
stations. However, because the forwarding table would still have entries based
on the old topology, communications may not be reestablished until after 5 minutes
when the affected ports’ entries are removed from the table. Traffic would then
be flooded to all ports and re-learned. In this case it is beneficial to reduce
the aging time. This is the purpose of a TCN BPDU. The TCN is sent from the
affected bridge/switch to the root bridge/switch. As soon as a bridge/switch
detects a topology change (a link going down or a port going to forwarding)
it sends a TCN to the root bridge via its root port. The root bridge then advertises
a BPDU with a Topology Change to the entire network. This causes every bridge
to reduce the MAC table aging time to 15 seconds for a specified amount of time.
This allows the switch to re-learn the MAC addresses as soon as STP re-converges.
Topology Change Notice BPDUs are sent when a port that was forwarding changes
to blocking or transitions to forwarding. A TCN BPDU does not initiate an STP
recalculation. It only affects the aging time of the forwarding table entries
in the switch. It will not change the topology of the network or create loops.
End nodes such as servers or clients trigger a topology change when they power
off and then power back on.
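The forwarding-table behavior described above can be modeled with a small learning bridge in Python. This is a simplified sketch under stated assumptions: entries age out only when they are looked up, and the shortened 15-second aging time is switched on and off by an explicit call rather than by receiving Topology Change BPDUs. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

NORMAL_AGING = 300.0  # seconds: the typical 5-minute aging time
TCN_AGING = 15.0      # seconds: shortened aging time during a topology change


@dataclass
class LearningBridge:
    aging_time: float = NORMAL_AGING
    # MAC address -> (port it was learned on, time it was last seen)
    table: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def learn(self, src_mac: str, port: int, now: float) -> None:
        """Record the port on which a frame from src_mac was received."""
        self.table[src_mac] = (port, now)

    def lookup(self, dst_mac: str, now: float) -> Optional[int]:
        """Return the learned port, or None if the frame must be flooded."""
        entry = self.table.get(dst_mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen > self.aging_time:
            del self.table[dst_mac]  # entry aged out; flood and relearn
            return None
        return port

    def set_topology_change(self, active: bool) -> None:
        """Use the short aging time while a topology change is being signaled."""
        self.aging_time = TCN_AGING if active else NORMAL_AGING


if __name__ == "__main__":
    bridge = LearningBridge()
    bridge.learn("00:10:18:00:00:01", port=4, now=0.0)
    print(bridge.lookup("00:10:18:00:00:01", now=60.0))  # 4: still within 5 minutes
    bridge.set_topology_change(True)
    print(bridge.lookup("00:10:18:00:00:01", now=60.0))  # None: older than 15 s, so flood
```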
To reduce the effect of TCNs on the network (for example, increasing flooding
on switch ports), end nodes that are powered on/off often should use the Port
Fast or Edge Port setting on the switch port they are attached to. Port Fast
or Edge Port is a command that is applied to specific ports and has the following
effects:
Ports coming from link down to link up will be put in the forwarding STP
mode instead of going from listening to learning and then to forwarding. STP
is still running on these ports.
The switch does not generate a Topology Change Notice when the port is going
up or down.
SLB teaming can be used with 10/100 hubs, but it is only recommended for troubleshooting
purposes, such as connecting a network analyzer in the event that switch port
mirroring is not an option.
Although the use of hubs in network topologies is functional in some situations,
it is important to consider the throughput ramifications when doing so. Network
hubs have a maximum of 100 Mbps half-duplex link speed, which severely degrades
performance in either a Gigabit or 100 Mbps switched-network configuration.
Hub bandwidth is shared among all connected devices; as a result, when more
devices are connected to the hub, the bandwidth available to any single device
is reduced, in inverse proportion to the number of devices connected to the hub.
It is not recommended to connect team members to hubs; only switches should
be used to connect to teamed ports. An SLB team, however, can be connected directly
to a hub for troubleshooting purposes. Other team types can result in a loss
of connectivity if specific failures occur and should not be used with hubs.
SLB teams are the only teaming type not dependent on switch configuration.
The server intermediate driver handles the load balancing and fault tolerance
mechanisms with no assistance from the switch. These elements of SLB make it
the only team type that maintains failover and fallback characteristics when
team ports are connected directly to a hub.
SLB teams configured as shown in Figure 7 maintain their fault tolerance
properties. Either server connection could potentially fail, and network functionality
is maintained. Clients could be connected directly to the hub, and fault tolerance
would still be maintained; server performance, however, would be degraded.
FEC/GEC and IEEE 802.3ad teams cannot be connected to any hub configuration. These
team types must be connected to a switch that has also been configured for this
team type.
SLB teaming is known not to work in a Microsoft Network Load Balancing (NLB)
unicast environment, because the SLB teaming algorithm is mutually exclusive with
the NLB unicast mechanism. There is no known reason, however, why SLB should not
work in an NLB multicast environment.
Dell PowerEdge cluster solutions integrate Microsoft Cluster Services (MSCS)
with PowerVault SCSI or Dell/EMC Fibre-Channel based storage, PowerEdge servers,
storage adapters, storage switches, and network adapters to provide high-availability
(HA) solutions. HA clustering supports all adapters qualified on a supported
PowerEdge server.
MSCS clusters support up to 2 nodes if you are using Windows 2000 Advanced
Server. If you are using Windows Server 2003, that support extends to 8 nodes.
In each cluster node, it is strongly recommended that customers install at least
2 network adapters (on-board adapters are acceptable). These interfaces serve
2 purposes. One adapter is used exclusively for intra-cluster heartbeat
communications. This is referred to as the private adapter and usually
resides on a separate private subnetwork. The other adapter is used for client
communications and is referred to as the public adapter.
Multiple adapters may be used for each of these purposes: private, intracluster
communications and public, external client communications. All Broadcom teaming
modes are supported with Microsoft Cluster Software for the public adapter only.
Private network adapter teaming is not supported. Microsoft indicates that the
use of teaming on the private interconnect of a server cluster is not supported
because of delays that could possibly occur in the transmission and receipt
of heartbeat packets between the nodes. For best results, when you want redundancy
for the private interconnect, disable teaming and use the available ports to
form a second private interconnect. This achieves the same end result and provides
dual, robust communication paths for the nodes to communicate over.
For teaming in a clustered environment, customers are recommended to use the
same brand of adapters.
Figure 8 shows a 2-node Fibre-Channel cluster with 3 network interfaces
per cluster node: 1 private and 2 public. On each node, the 2 public adapters
are teamed, and the private adapter is not. Teaming is supported across the
same switch or across 2 switches. Figure 9 shows the same 2-node Fibre-Channel
cluster with teaming across 2 switches.
Figure 8: Clustering With Teaming Across One Switch
NOTE: Microsoft Network Load Balancing is not supported with Microsoft
Cluster Software.
Gigabit Ethernet is typically used for the following three purposes in high-performance
computing cluster (HPCC) applications:
Inter-Process Communications (IPC): For applications that don't require
low-latency high-bandwidth interconnects (such as Myrinet, InfiniBand), Gigabit
Ethernet can be used for communication between the compute nodes.
I/O: Ethernet can be used for file sharing and serving the data to the compute
nodes. This can be done simply using an NFS server or using parallel file
systems such as PVFS.
Management & Administration: Ethernet is used for out-of-band (ERA)
and in-band (OMSA) management of the nodes in the cluster. It can also be
used for job scheduling and monitoring.
In our current HPC offerings, only one of the on-board adapters is used. If
Myrinet or IB is present, this adapter serves I/O and administration purposes;
otherwise, it is also responsible for IPC. In case of an adapter failure, the
administrator can use the Felix package to easily configure adapter 2. Adapter
teaming on the host side is neither tested nor supported in HPCC.
PXE is used extensively for the deployment of the cluster (installation and
recovery of compute nodes). Teaming is typically not used on the host
side and it is not a part of our standard offering. Link aggregation is commonly
used between switches, especially for large configurations. Jumbo Frames, although
not a part of our standard offering, may provide performance improvement for
some applications due to reduced CPU overhead.
In our Oracle Solution Stacks, we support adapter teaming in both the private
network (interconnect between RAC nodes) and public network with clients or
the Application layer above the Database layer.
Figure 9: Clustering With Teaming Across Two Switches
When you perform network backups in a nonteamed environment, overall throughput
on a backup server adapter can be easily impacted due to excessive traffic and
adapter overloading. Depending on the number of backup servers, data streams,
and tape drive speed, backup traffic can easily consume a high percentage of
the network link bandwidth, thus impacting production data and tape backup performance.
Network backups usually consist of a dedicated backup server running with tape
backup software such as NetBackup, Galaxy or Backup Exec. Attached to the backup
server is either a direct SCSI tape backup unit or a tape library connected
through a Fibre Channel storage area network (SAN). Systems that are backed
up over the network are typically called clients or remote servers and usually
have a tape backup software agent installed. Figure 10
shows a typical 1 Gbps nonteamed network environment with tape backup implementation.
Figure 10: Network Backup Without Teaming
Because there are 4 client servers, the backup server can simultaneously stream
4 backup jobs (one per client) to a multidrive autoloader. Because of the single
link between the switch and the backup server, however, a 4-stream backup will
easily saturate the adapter and link. If the adapter on the backup server operates
at 1 Gbps (125 MB/s), and each client is able to stream data at 20 MB/s during
tape backup, then the throughput between the backup server and switch will be
at 80 MB/s (20 MB/s x 4), which is equivalent to 64% of the network bandwidth.
Although this is well within the network bandwidth range, the 64% constitutes
a high percentage, especially if other applications share the same link.
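The utilization arithmetic above can be written out as a small Python helper. It uses decimal units (1 Gbps = 1000 Mbps = 125 MB/s), as the text does; the function name is hypothetical.

```python
def link_utilization(streams: int, mb_per_s_per_stream: float, link_gbps: float = 1.0) -> float:
    """Fraction of a link consumed by concurrent backup streams (decimal units)."""
    link_mb_per_s = link_gbps * 1000 / 8  # 1 Gbps -> 125 MB/s
    return streams * mb_per_s_per_stream / link_mb_per_s


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} stream(s) at 20 MB/s: {link_utilization(n, 20):.0%} of a 1 Gbps link")
```

For one through four streams this gives 16%, 32%, 48%, and 64%, matching the 64% figure for four streams.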
Using the non-teamed topology in Figure 10, 4 separate
tests were run to calculate the remote backup performance. In test 1 (One
Stream), the backup server streamed data from a single client (Client-Server
Red). In test 2 (Two Streams), the backup server simultaneously streamed
data from two separate clients (Red and Blue). In test 3 (Three Streams),
the backup server simultaneously streamed data from three separate clients.
In test 4 (Four Streams), the backup server simultaneously streamed data
from four separate clients. Performance throughput for each backup data stream
is shown in Graph 2.
Graph 2: Backup Performance With No Adapter Teaming
NOTE: Performance results will vary depending on tape drive technology
as well as data set compression.
Load Balancing and Failover
The performance results show that as the number of backup streams increases,
the overall throughput increases. However, each data stream may not be able
to maintain the same performance as a single backup stream of 25 MB/s. In other
words, even though a backup server can stream data from a single client at 25
MB/s, it is not expected that four simultaneously running backup jobs will stream
at 100 MB/s (25 MB/s x 4 streams). Although overall throughput increases as
the number of backup streams increases, each backup stream can be impacted by
tape software or network stack limitations.
For a tape backup server to reliably use adapter performance and network bandwidth
when backing up clients, a network infrastructure must implement teaming such
as load balancing and fault tolerance. Data centers will incorporate redundant
switches, link aggregation, and trunking as part of their fault tolerant solution.
Although teaming device drivers will manipulate the way data flows through teamed
interfaces and failover paths, this is transparent to tape backup applications
and does not interrupt any tape backup process when backing up remote systems
over the network. shows a network topology that demonstrates tape backup in
a Broadcom teamed environment and how smart load balancing can load balance
tape backup data across teamed adapters.
There are four paths that the client-server can use to send data to the backup
server, but only one of these paths will be designated during data transfer.
One possible path that Client-Server Red can use to send data to the backup
server is:
Example Path: Client-Server Red sends data through Adapter
A, Switch 1, Backup Server Adapter A.
The designated path is determined by two factors:
Client-Server ARP cache, which points to the backup server MAC address.
This is determined by the Broadcom intermediate driver inbound load balancing
algorithm.
The physical adapter interface on Client-Server Red that is used to transmit
the data. The Broadcom intermediate driver outbound load balancing algorithm
determines this (see Outbound Traffic Flow and Inbound Traffic Flow (SLB only)).
The teamed interface on the backup server transmits a gratuitous Address Resolution
Protocol (G-ARP) packet to Client-Server Red, which in turn causes the client server
ARP cache to be updated with the backup server MAC address. The load balancing
mechanism within the teamed interface determines the MAC address embedded in
the G-ARP. The selected MAC address is essentially the destination for data
transfer from the client server. On Client-Server Red, the SLB teaming algorithm
determines which of its two adapter interfaces is used to transmit data.
In this example, data from Client-Server Red is received on the backup server
Adapter A interface. To demonstrate the SLB mechanisms when additional load
is placed on the teamed interface, consider the scenario in which the backup server
initiates a second backup operation, so that it is backing up both Client-Server Red
and Client-Server Blue. The route that Client-Server Blue uses to send data to the
backup server depends on its ARP cache, which points to the backup server MAC address.
Because Adapter A of the backup server is already under load from its backup
operation with Client-Server Red, the backup server invokes its SLB algorithm
to inform Client-Server Blue (through a G-ARP) to update its ARP cache
to reflect the backup server Adapter B MAC address. When Client-Server Blue
needs to transmit data, it uses either one of its adapter interfaces, as
determined by its own SLB algorithm. What is important is that data from
Client-Server Blue is received by the backup server Adapter B interface, and
not by its Adapter A interface. This matters because, with both backup streams
running simultaneously, the backup server must load balance data streams
from different clients. With both backup streams running, each adapter interface
on the backup server processes an equal load, thus balancing data across
both adapter interfaces.
The same algorithm applies if a third and a fourth backup operation are initiated
from the backup server. The teamed interface on the backup server transmits
a unicast G-ARP to backup clients to inform them to update their ARP cache.
Each client then transmits backup data along a route to the target MAC address
on the backup server.
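The receive-side behavior described above can be modeled in a few lines. This is a minimal Python sketch, not the BASP algorithm: it assumes the team simply points each new client at the member currently serving the fewest clients. The class and method names are hypothetical, as are the MAC addresses (taken from the 00:00:5E:00:53:xx documentation range).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TeamMember:
    name: str
    mac: str
    clients: List[str] = field(default_factory=list)


class SlbTeam:
    """Toy model of SLB receive load balancing (assumed least-clients policy)."""

    def __init__(self, members: List[TeamMember]):
        self.members = members

    def assign_client(self, client: str) -> str:
        # Choose the member serving the fewest clients; on a tie, min()
        # returns the first member in the list.
        target = min(self.members, key=lambda m: len(m.clients))
        target.clients.append(client)
        # This is the MAC address the team would advertise to the client in
        # a unicast G-ARP, so the client's ARP cache points at this member.
        return target.mac

    def load(self) -> Dict[str, List[str]]:
        return {m.name: list(m.clients) for m in self.members}


if __name__ == "__main__":
    team = SlbTeam([
        TeamMember("Adapter A", "00:00:5e:00:53:0a"),
        TeamMember("Adapter B", "00:00:5e:00:53:0b"),
    ])
    # Red and Blue come from the text; Client 3 and Client 4 are hypothetical.
    for client in ["Client-Server Red", "Client-Server Blue", "Client 3", "Client 4"]:
        print(f"{client} -> {team.assign_client(client)}")
    print(team.load())
```

The sketch counts clients rather than bytes because the FAQ in this document states that receive load balancing is a function of the number of connected clients, not of traffic load. With four clients, each adapter ends up receiving two streams.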
Based on the network topology diagram in Figure 11,
backup performance was measured on the teamed backup server when performing
one or more backup streams. Graph 3 shows the tape backup
performance that can be expected on the backup server when conducting network
backups.
Graph 3: Backup Performance
The backup performance results are nearly the same as the performance measured
in the nonteamed environment. Because the network was not the bottleneck in
the nonteamed case, teaming was not expected to improve performance. In this
case, however, teaming is recommended to improve fault tolerance and availability.
When the backup server has one adapter, all data streams must go through that
adapter, and, as shown in the graphs, performance was 80 MB/s. In the teamed
environment, although the same performance of 80 MB/s was measured, data from
the clients was received across both adapter interfaces on the backup server.
With four backup streams, the teamed interface received the streams evenly
across both adapters: two backup streams were received on Adapter A and two on
Adapter B, for a total of 20 MB/s x 4 backup streams = 80 MB/s.
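The teamed total follows directly from the figures quoted above (20 MB/s per stream, two streams per adapter):

```latex
\underbrace{2 \times 20\ \text{MB/s}}_{\text{Adapter A}}
+ \underbrace{2 \times 20\ \text{MB/s}}_{\text{Adapter B}}
= 40\ \text{MB/s} + 40\ \text{MB/s} = 80\ \text{MB/s}
```

The aggregate matches the single-adapter result. Teaming changed how the load was distributed across interfaces, not the total throughput.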
Fault Tolerance
If a network link fails during tape backup operations, all traffic between
the backup server and client stops and backup jobs fail. If, however, the network
topology is configured for both Broadcom SLB and switch fault tolerance,
tape backup operations can continue without interruption during
the link failure. All failover processes within the network are transparent
to tape backup software applications. To understand how backup data streams
are directed during the network failover process, consider the topology in Figure 11. Client-Server Red is
transmitting data to the backup server through Path 1, but a link failure occurs
between the backup server and the switch. Because the data can no longer be
sent from Switch #1 to the Adapter A interface on the backup server, the data
is redirected from Switch #1 through Switch #2 to the Adapter B interface on
the backup server. This occurs without the knowledge of the backup application
because all fault-tolerant operations are handled by the adapter team interface
and the trunk settings on the switches. From the client server's perspective, it still
operates as if it is transmitting data through the original path.
Figure 11: Network Backup With SLB Teaming Across Two Switches
When running a protocol analyzer over a virtual adapter teamed interface, the
MAC address shown in the transmitted frames may not be correct. The analyzer
does not show the frames as constructed by BASP; it shows the MAC address of
the team rather than the MAC address of the interface transmitting the frame.
The following process is suggested for monitoring a team:
Mirror all uplink ports from the team at the switch.
If the team spans two switches, mirror the interlink trunk as well.
Sample all mirror ports independently.
On the analyzer, use an adapter and driver that do not filter QoS and
VLAN information.
When troubleshooting network connectivity or teaming functionality issues,
ensure that the following information is true for your configuration.
Although Dell supports mixed-speed SLB teaming, it is recommended that all
adapters in a team be the same speed (either all Gigabit or all Fast Ethernet).
If LiveLink is not enabled, disable Spanning Tree Protocol or enable an
STP mode that bypasses the initial phases (for example, Port Fast, Edge Port)
for the switch ports connected to a team.
All switches that the team is directly connected to must have the same hardware
revision, firmware revision, and software revision to be supported.
To be teamed, adapters should be members of the same VLAN. In the event
that multiple teams are configured, each team should be on a separate network.
Do not enter a multicast or broadcast address in the Locally Administered
Address field.
Do not use the Locally Administered Address on any physical adapter that
is a member of a team.
Verify that power management is disabled on all physical members of any
team.
Remove any static IP address from the individual physical team members
before the team is built.
A team that requires maximum throughput should use LACP or GEC/FEC. In these
cases, the intermediate driver is only responsible for outbound load balancing,
while the switch performs inbound load balancing.
Aggregated teams (802.3ad/LACP and GEC/FEC) must be connected to only
a single switch that supports IEEE 802.3ad, LACP, or GEC/FEC.
It is not recommended to connect any team to a hub, because a hub supports only
half duplex. Connect hubs to a team for troubleshooting purposes
only.
Verify the base (Miniport) and team (intermediate) drivers are from the
same release package. Dell does not test or support mixing base and teaming
drivers from different CD releases.
Test the connectivity to each physical adapter prior to teaming.
Test the failover and fallback behavior of the team before placing it into
a production environment.
When moving from a nonproduction network to a production network, it is
strongly recommended to test failover and fallback again.
Test the performance behavior of the team before placing it into a production
environment.
When moving from a nonproduction network to a production network, it is
strongly recommended to test performance again.
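Several of the items in this checklist can be verified mechanically before a team is built. The following Python sketch is illustrative only: the PhysicalAdapter record and its field names are hypothetical stand-ins for whatever inventory data you collect, while the 8-port limit and the rule that mixed speeds are supported only for SLB come from the Frequently Asked Questions in this document.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_PORTS_PER_TEAM = 8  # FAQ: up to 8 ports can be assigned to a team


@dataclass
class PhysicalAdapter:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    speed_mbps: int
    vlan: int
    power_management_enabled: bool
    static_ip: Optional[str] = None
    locally_administered_address: Optional[str] = None
    base_driver_release: str = ""
    team_driver_release: str = ""


def check_team(adapters: List[PhysicalAdapter], team_type: str) -> List[str]:
    """Return the checklist items that fail; an empty list means none failed.

    team_type is "SLB" or an aggregated type such as "LACP" or "GEC/FEC".
    """
    problems: List[str] = []
    if not 2 <= len(adapters) <= MAX_PORTS_PER_TEAM:
        problems.append(f"team must have 2 to {MAX_PORTS_PER_TEAM} ports")
    if len({a.speed_mbps for a in adapters}) > 1:
        if team_type == "SLB":
            problems.append("mixed speeds: supported for SLB but not recommended")
        else:
            problems.append("mixed speeds are only supported for SLB teams")
    if len({a.vlan for a in adapters}) > 1:
        problems.append("members are not all in the same VLAN")
    for a in adapters:
        if a.power_management_enabled:
            problems.append(f"{a.name}: disable power management")
        if a.static_ip:
            problems.append(f"{a.name}: remove static IP {a.static_ip} before teaming")
        if a.locally_administered_address:
            problems.append(f"{a.name}: clear the Locally Administered Address")
        if a.base_driver_release != a.team_driver_release:
            problems.append(f"{a.name}: base and teaming drivers are from different releases")
    return problems
```

Connectivity, failover, fallback, and performance testing still have to be done on the live network; the sketch only covers the static configuration items.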
Troubleshooting Guidelines
Before you call Dell support, make sure you have completed the following steps
for troubleshooting network connectivity problems when the server is using adapter
teaming.
Make sure the Link Light is ON for every adapter and all the cables are
attached.
Check that the base and intermediate drivers belong to the same
Dell release and are loaded correctly.
Check for a valid IP address using the ipconfig (Windows), ifconfig (Linux),
or CONFIG (NetWare) command.
Check that STP is disabled or Edge Port/Port Fast is enabled on the switch
ports connected to the team.
Check that the adapters and the switch are configured identically for
Link Speed and Duplex.
If possible, break the team and check for connectivity to each adapter
independently to confirm that the problem is directly associated with teaming.
Check that all switch ports connected to the team are on the same VLAN.
Check that the switch ports are configured properly for the Generic Trunking
(FEC/GEC)/802.3ad-Draft Static type of teaming and that the switch configuration
matches the adapter teaming type. If the system is configured for an SLB type of team,
make sure the corresponding switch ports are not configured for
Generic Trunking (FEC/GEC) or IEEE 802.3ad types of teams.
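The switch-side items in this list lend themselves to a mechanical check. This is a hypothetical Python helper that assumes you have already collected each switch port's trunk mode, VLAN, STP fast-start setting, and speed/duplex; the field names and mode labels ("none", "static", "lacp") are invented for illustration and do not correspond to any switch vendor's CLI. For simplicity it also ignores the LiveLink exception to the STP rule.

```python
from dataclasses import dataclass
from typing import List

# Trunk mode each adapter team type expects on its switch ports.
# The mode labels are hypothetical, not vendor CLI keywords.
EXPECTED_TRUNK_MODE = {
    "SLB": "none",                # switch-independent: no trunking on these ports
    "GenericTrunking": "static",  # FEC/GEC/802.3ad-Draft Static
    "LinkAggregation": "lacp",    # IEEE 802.3ad with LACP
}


@dataclass
class SwitchPort:
    switch: str
    port: str
    trunk_mode: str       # "none", "static", or "lacp"
    vlan: int
    stp_fast_start: bool  # STP disabled, or Port Fast/Edge Port enabled
    speed_duplex: str     # for example "1000/full"


def check_switch_ports(team_type: str, ports: List[SwitchPort],
                       adapter_speed_duplex: str) -> List[str]:
    """Return problems found on the switch ports connected to one team."""
    expected = EXPECTED_TRUNK_MODE[team_type]
    problems: List[str] = []
    for p in ports:
        label = f"{p.switch}/{p.port}"
        if p.trunk_mode != expected:
            problems.append(f"{label}: trunk mode {p.trunk_mode!r}, expected {expected!r}")
        if not p.stp_fast_start:
            problems.append(f"{label}: disable STP or enable Port Fast/Edge Port")
        if p.speed_duplex != adapter_speed_duplex:
            problems.append(f"{label}: speed/duplex {p.speed_duplex} does not match the adapters")
    if len({p.vlan for p in ports}) > 1:
        problems.append("team ports are not all on the same VLAN")
    # Link Aggregation and Generic Trunking cannot span switches.
    if team_type != "SLB" and len({p.switch for p in ports}) > 1:
        problems.append("aggregated teams must connect to a single switch")
    return problems
```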
Frequently Asked Questions
Question: Under what circumstances is traffic not load balanced?
Why is all traffic not load balanced evenly across the team members? Answer: The bulk of the traffic does not use IP/TCP/UDP, or the bulk
of the clients are in a different network. Receive load balancing is not
a function of traffic load, but a function of the number of clients that are
connected to the server.
Question: What network protocols are load balanced when in
a team? Answer: Broadcom’s teaming software only supports IP/TCP/UDP
traffic. All other traffic is forwarded to the primary adapter.
Question: Which protocols are load balanced with SLB and which
ones are not? Answer: Only IP/TCP/UDP protocols are load balanced in both
directions: send and receive. IPX is
load balanced on transmit traffic only.
Question: Can I team a port running at 100 Mbps with a port
running at 1000 Mbps? Answer: Mixing link speeds within a team is only supported
for Smart Load Balancing™ teams.
Question: Can I team a fiber adapter with a copper Gigabit
Ethernet adapter? Answer: Yes.
Question: What is the difference between adapter load balancing
and Microsoft’s Network Load Balancing (NLB)? Answer: Adapter load balancing is done at a network session
level, whereas NLB is done at the server application level.
Question: Can I connect the teamed adapters to a hub? Answer: Yes. Teamed ports can be connected to a hub for troubleshooting
purposes only. However, this is not recommended for normal operation, because
the performance improvement expected would be degraded due to hub limitations.
Connect the teamed ports to a switch instead.
Question: Can I connect the teamed adapters to ports in a
router? Answer: No. All ports in a team must be on the same network;
in a router, however, each port is a separate network by definition. All teaming
modes require that the link partner be a Layer 2 switch.
Question: Can I use teaming with Microsoft Cluster Services? Answer: Yes. Teaming is supported on the public network only,
not on the private network used for the heartbeat link.
Question: Can PXE work over a virtual adapter (team)? Answer: A PXE client operates in an environment before the
operating system is loaded; as a result, virtual adapters have not been enabled
yet. If the physical adapter supports PXE, then it can be used as a PXE client,
whether or not it is part of a virtual adapter when the operating system loads.
PXE servers may operate over a virtual adapter.
Question: Can WOL work over a virtual adapter (team)? Answer: Wake-on-LAN functionality is not supported on virtual
adapters; it is only supported on physical adapters.
Question: What is the maximum number of ports that can be
teamed together? Answer: Up to 8 ports can be assigned to a team.
Question: What is the maximum number of teams that can be
configured on the same server? Answer: Up to 4 teams can be configured on the same server.
Question: Why does my team lose connectivity for the first
30 to 50 seconds after the Primary adapter is restored (fallback)? Answer: Because Spanning Tree Protocol is bringing the port
from blocking to forwarding. You must enable Port Fast or Edge Port on the switch
ports connected to the team.
Question: Can I connect a team across multiple switches? Answer: Smart Load Balancing can be used with multiple switches
because each physical adapter in the system uses a unique Ethernet MAC address.
Link Aggregation and Generic Trunking cannot operate across switches because
they require all physical adapters to share the same Ethernet MAC address.
Question: How do I upgrade the intermediate driver (BASP)?
Answer: The intermediate driver cannot be upgraded through
the Local Area Connection Properties. It must be upgraded using the Setup installer.
Current versions of the installer require you to first uninstall the intermediate
driver.
Question: How can I determine the performance statistics on
a virtual adapter (team)? Answer: Currently, there are no statistics at the team level,
but only at the physical adapter level.
Question: Can I configure NLB and teaming concurrently? Answer: Yes, but only when running NLB in a multicast mode
(NLB is not supported with MS Cluster Services).
Question: Should both the backup server and client servers
that are backed up be teamed? Answer: Because the backup server is under the most data load,
it should always be teamed for link aggregation and failover. A fully redundant
network, however, requires that both the switches and the backup clients be
teamed for fault tolerance and link aggregation.
Question: During backup operations, does the adapter teaming
algorithm load balance data at a byte level or a session level? Answer: When using adapter teaming, data is only load balanced
at a session level and not a byte level to prevent out-of-order frames. Adapter
teaming load balancing does not work the same way as other storage load balancing
mechanisms such as EMC PowerPath.
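Session-level balancing means every frame of a given session leaves through the same adapter, so frames of one session cannot be reordered across links. The following is a minimal Python sketch of that idea, assuming a CRC32 hash of the IP/port tuple. This document does not describe Broadcom's actual selection function, so the hash, the Flow fields, and the example addresses and ports are all illustrative assumptions.

```python
import zlib
from typing import List, NamedTuple


class Flow(NamedTuple):
    """Identifies one session (hypothetical field set: the IP/port tuple)."""
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


def pick_adapter(flow: Flow, adapters: List[str]) -> str:
    # Hash the session identifiers rather than the individual frame, so every
    # frame of this session maps to the same adapter and arrives in order.
    # zlib.crc32 is deterministic across runs, unlike Python's str hash().
    key = "|".join(str(part) for part in flow).encode("ascii")
    return adapters[zlib.crc32(key) % len(adapters)]


if __name__ == "__main__":
    adapters = ["Adapter A", "Adapter B"]
    session = Flow("192.0.2.10", "192.0.2.20", "TCP", 50000, 10000)
    # Every frame of the session resolves to the same adapter (a one-item set).
    print({pick_adapter(session, adapters) for _ in range(5)})
```

A side effect of this design is that a single backup stream never uses more than one adapter's bandwidth; only multiple sessions spread across the team, which is consistent with the multistream results earlier in this document.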
Question: Is there any special configuration required in the
tape backup software or hardware to work with adapter teaming? Answer: No special configuration is required in the tape software
to work with teaming. Teaming is transparent to tape backup applications.
Question: How do I know what driver I am currently using?
Answer: In all operating systems, the most accurate method
for checking the driver revision is to physically locate the driver file and
check the properties.
Question: Can the SLB detect a switch failure in a Switch
Fault Tolerance configuration? Answer: No. SLB can only detect the loss of link between the
teamed port and its immediate link partner. SLB cannot detect link failures
on other ports.
Question: Where can I get the latest supported drivers? Answer: Go to Dell support at www.support.dell.com for driver
package updates or support documents.
Question: Why does my team lose connectivity for the first
30 to 50 seconds after the Primary Adapter is restored (fall-back after a failover)?
Answer: During a fallback event, link is restored, causing Spanning
Tree Protocol to configure the port for blocking until it determines that it
can move to the forwarding state. You must enable Port Fast or Edge Port on
the switch ports connected to the team to prevent the loss of communications
caused by STP.
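The 30 to 50 second window is consistent with the IEEE 802.1D default Spanning Tree timers (Forward Delay of 15 s, Max Age of 20 s); a switch configured with non-default timers will behave differently. A port that comes up passes through the listening and learning states, each lasting one Forward Delay, and if stale information must first age out, Max Age is added:

```latex
t_{\text{link up}} = 2 \times \text{Forward Delay} = 2 \times 15\ \text{s} = 30\ \text{s}
\qquad
t_{\text{worst case}} = \text{Max Age} + 2 \times \text{Forward Delay} = 20\ \text{s} + 30\ \text{s} = 50\ \text{s}
```

Port Fast or Edge Port moves the port directly to the forwarding state, which removes this delay.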
Question: Where do I monitor real time statistics for an
adapter team in a Windows server? Answer: Use the Broadcom Advanced Control Suite 2 (BACS2) to
monitor general, IEEE 802.3 and custom counters.
The known base and intermediate driver Windows System Event Log status messages
for the Broadcom NetXtreme 57XX Gigabit Ethernet controllers as of December 2004
are listed in Table 11 and Table 12. As a Broadcom adapter driver loads, Windows
places a status code in the System Event Log (viewable in Event Viewer). There
may be up to two classes of entries for these event codes, depending on whether
both drivers are loaded (one set for the base or miniport driver and one set for
the intermediate or teaming driver).
The base driver is identified by the driver name, as shown in the following
examples:
Windows 2000: B57W2k.sys
Windows 2003: B57WXP.sys
Table 11 lists the event log messages supported by
the base driver, explains the cause for the message, and provides the recommended
action.
Table 11: Base Driver Event Log Messages

| Message Number | Message | Cause | Corrective Action |
|---|---|---|---|
| 1 | Failed to allocate memory for the device block. Check system memory resource usage. | The driver cannot allocate memory from the operating system. | Close running applications to free memory. |
| 2 | Failed to allocate map registers. | The driver cannot allocate map registers from the operating system. | Unload other drivers that may allocate map registers. |
| 3 | Failed to access configuration information. Reinstall the network driver. | The driver cannot access PCI configuration space registers on the adapter. | For add-in adapters: reseat the adapter in the slot, move the adapter to another PCI slot, or replace the adapter. |
| 4 | The network link is down. Check to make sure the network cable is properly connected. | The adapter has lost its connection with its link partner. | Check that the network cable is connected, verify that the network cable is the right type, and verify that the link partner (for example, switch or hub) is working correctly. |
| 5 | The network link is up. | The adapter has established a link. | Informational message only. No action is required. |
| 6 | Network controller configured for 10Mb half-duplex link. | The adapter has been manually configured for the selected line speed and duplex settings. | Informational message only. No action is required. |
| 7 | Network controller configured for 10Mb full-duplex link. | The adapter has been manually configured for the selected line speed and duplex settings. | Informational message only. No action is required. |
| 8 | Network controller configured for 100Mb half-duplex link. | The adapter has been manually configured for the selected line speed and duplex settings. | Informational message only. No action is required. |
| 9 | Network controller configured for 100Mb full-duplex link. | The adapter has been manually configured for the selected line speed and duplex settings. | Informational message only. No action is required. |
| 10 | Network controller configured for 1Gb half-duplex link. | The adapter has been manually configured for the selected line speed and duplex settings. | Informational message only. No action is required. |
| 11 | Network controller configured for 1Gb full-duplex link. | The adapter has been manually configured for the selected line speed and duplex settings. | Informational message only. No action is required. |
| 12 | Medium not supported. | The operating system does not support the IEEE 802.3 medium. | Reboot the operating system, run a virus check, run a disk check (chkdsk), and reinstall the operating system. |
| 13 | Unable to register the interrupt service routine. | The device driver cannot install the interrupt handler. | Reboot the operating system; remove other device drivers that may be sharing the same IRQ. |
| 14 | Unable to map IO space. | The device driver cannot allocate memory-mapped I/O to access driver registers. | Remove other adapters from the system, reduce the amount of physical memory installed, and replace the adapter. |
| 15 | Driver initialized successfully. | The driver has successfully loaded. | Informational message only. No action is required. |
| 16 | NDIS is resetting the miniport driver. | The NDIS layer has detected a problem sending/receiving packets and is resetting the driver to resolve the problem. | Run Broadcom Advanced Control Suite 2 diagnostics; check that the network cable is good. |
| 17 | Unknown PHY detected. Using a default PHY initialization routine. | The driver could not read the PHY ID. | Replace the adapter. |
| 18 | This driver does not support this device. Upgrade to the latest driver. | The driver does not recognize the installed adapter. | Upgrade to a driver version that supports this adapter. |
| 19 | Driver initialization failed. | Unspecified failure during driver initialization. | Reinstall the driver, update to a newer driver, run Broadcom Advanced Control Suite 2 diagnostics, or replace the adapter. |
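When scanning many System Event Log entries, it helps to separate the informational codes in Table 11 from the ones that need action. This is a minimal Python sketch that assumes you have already extracted the base driver's message numbers from the log by some other means; the dictionary simply restates Table 11 in abbreviated form, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

# Table 11 message numbers that are informational only.
INFORMATIONAL = {5, 6, 7, 8, 9, 10, 11, 15}

# Abbreviated corrective actions from Table 11.
ACTIONS = {
    1: "Close running applications to free memory.",
    2: "Unload other drivers that may allocate map registers.",
    3: "Reseat the adapter, move it to another PCI slot, or replace it.",
    4: "Check the cable, its type, and the link partner.",
    12: "Reboot, run a virus check and chkdsk, and reinstall the OS.",
    13: "Reboot; remove drivers that may share the same IRQ.",
    14: "Remove other adapters, reduce physical memory, and replace the adapter.",
    16: "Run BACS2 diagnostics; check the network cable.",
    17: "Replace the adapter.",
    18: "Upgrade to a driver version that supports this adapter.",
    19: "Reinstall or update the driver, run BACS2 diagnostics, or replace the adapter.",
}


def triage(message_numbers: Iterable[int]) -> List[Tuple[int, str]]:
    """Return (message number, action) for each entry needing attention, in log order."""
    needs_action = []
    for number in message_numbers:
        if number in INFORMATIONAL:
            continue  # no action required
        needs_action.append(
            (number, ACTIONS.get(number, "Unknown message number; not in Table 11."))
        )
    return needs_action


if __name__ == "__main__":
    print(triage([15, 5, 4, 16]))
```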
The intermediate driver is identified by BLFM, regardless of the base driver
revision. The driver names are:
Windows 2000: baspw2k.sys
Windows 2003: baspxp32.sys
Table 12 lists the event log messages supported by the
intermediate driver, explains the cause for the message, and provides the recommended
action.
Table 12: Intermediate Driver Event Log Messages

| System Event Message Number | Message | Cause | Corrective Action |
|---|---|---|---|
| 1 | Unable to register with NDIS. | The driver cannot register with the NDIS interface. | Unload other NDIS drivers. |
| 2 | Unable to instantiate the management interface. | The driver cannot create a device instance. | Reboot the operating system. |
| 3 | Unable to create symbolic link for the management interface. | Another driver has created a conflicting device name. | Unload the conflicting device driver that uses the name Blf. |
| 4 | Broadcom Advanced Server Program Driver has started. | The driver has started. | Informational message only. No action is required. |
| 5 | Broadcom Advanced Server Program Driver has stopped. | The driver has stopped. | Informational message only. No action is required. |
| 6 | Could not allocate memory for internal data structures. | The driver cannot allocate memory from the operating system. | Close running applications to free memory. |
| 7 | Could not bind to adapter %2. | The driver could not open one of the team physical adapters. | Unload and reload the physical adapter driver, install an updated physical adapter driver, or replace the physical adapter. |
| 8 | Successfully bind to adapter %2. | The driver successfully opened the physical adapter. | Informational message only. No action is required. |
| 9 | Network adapter %2 is disconnected. | The physical adapter is not connected to the network (it has not established link). | Check that the network cable is connected, verify that the network cable is the right type, and verify that the link partner (switch or hub) is working correctly. |
| 10 | Network adapter %2 is connected. | The physical adapter is connected to the network (it has established link). | Informational message only. No action is required. |
| 11 | Broadcom Advanced Program Features Driver is not designed to run on this version of Operating System. | The driver does not support the operating system on which it is installed. | Consult the driver release notes and install the driver on a supported operating system or update the driver. |
| 12 | Hot-standby adapter %2 is selected as the primary adapter for a team without a load balancing adapter. | A standby adapter has been activated. | Replace the failed physical adapter. |
| 13 | Network adapter %2 does not support Advanced Failover. | The physical adapter does not support the Broadcom NIC Extension (NICE). | Replace the adapter with one that does support NICE. |
| 14 | Network adapter %2 is enabled via management interface. | The driver has successfully enabled a physical adapter through the management interface. | Informational message only. No action is required. |
| 15 | Network adapter %2 is disabled via management interface. | The driver has successfully disabled a physical adapter through the management interface. | Informational message only. No action is required. |
| 16 | Network adapter %2 is activated and is participating in network traffic. | A physical adapter has been added to or activated in a team. | Informational message only. No action is required. |
| 17 | Network adapter %2 is de-activated and is no longer participating in network traffic. | A physical adapter has been removed from or deactivated in a team. | Informational message only. No action is required. |
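Messages 9, 10, 16, and 17 in Table 12 together describe each team member's link and participation state, and message 12 signals that a standby adapter has taken over. The Python sketch below replays a sequence of intermediate driver events to reconstruct that state. It assumes the events have already been extracted from the log as (message number, adapter name) pairs, where the adapter name is the value substituted for %2; the function name and the example event sequence are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Table 12 message numbers that change a member's state.
LINK_DOWN, LINK_UP = 9, 10
ACTIVATED, DEACTIVATED = 16, 17
STANDBY_PROMOTED = 12


def track_members(
    events: Iterable[Tuple[int, str]],
) -> Tuple[Dict[str, Dict[str, Optional[bool]]], List[str]]:
    """Replay (message_number, adapter) pairs in log order.

    Returns per-adapter {"link": ..., "active": ...} (None means not yet seen)
    and a list of alerts that need attention.
    """
    state: Dict[str, Dict[str, Optional[bool]]] = {}
    alerts: List[str] = []
    for number, adapter in events:
        member = state.setdefault(adapter, {"link": None, "active": None})
        if number == LINK_DOWN:
            member["link"] = False
        elif number == LINK_UP:
            member["link"] = True
        elif number == ACTIVATED:
            member["active"] = True
        elif number == DEACTIVATED:
            member["active"] = False
        elif number == STANDBY_PROMOTED:
            alerts.append(f"{adapter}: hot-standby adapter promoted to primary; "
                          "replace the failed adapter")
    return state, alerts


if __name__ == "__main__":
    log = [(10, "NIC1"), (16, "NIC1"), (10, "NIC2"), (9, "NIC1"),
           (17, "NIC1"), (12, "NIC2"), (16, "NIC2")]
    members, alerts = track_members(log)
    print(members)
    print(alerts)
```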