Quality of Service (QoS) is a set of tools and techniques that network devices use to treat different packets differently. This matters because modern networks are converged networks, i.e. all types of traffic share the same network. Convergence has many benefits, but plentiful bandwidth isn’t one of them. Every device on a network must compete for bandwidth. This isn’t a problem when bandwidth is plentiful, but when the network reaches the limit of how much traffic it can transmit, all services see degraded performance. However, not all traffic is equally important. It’s probably not a big deal if websites take another second to load, but it’s a big problem if you can’t hear the other person on a phone call. QoS allows us to prioritize network resources for more important traffic.

QoS was initially created as a way to give network priority to VoIP traffic, as without it the sound quality of IP voice calls was commonly affected by network congestion. Before the proliferation of VoIP, telephones used entirely separate networks, so there was no competition for bandwidth between voice data and other network traffic.

QoS is used to manage the following characteristics of network traffic:

  1. Bandwidth
    • The overall capacity of the link, measured in bits per second (Kbps, Mbps, Gbps, etc.)
    • QoS tools allow you to reserve a certain amount of a link’s bandwidth for specific kinds of traffic
  2. Delay
    • One-way delay: the amount of time it takes traffic to go from source to destination
    • Two-way delay: the amount of time it takes traffic to go from source to destination and back
  3. Jitter
    • The variation in one-way delay between packets sent by the same application
    • IP phones have a ‘jitter buffer’ to provide a fixed delay to audio packets; VoIP is particularly strongly impacted by jitter
  4. Loss
    • The % of packets sent that do not reach their destination
    • Can be caused by faulty cables, failing equipment, electromagnetic interference, etc.
    • Also caused when a device’s packet queues get full and the device must discard packets


For VoIP calls, the following standards are generally recommended for acceptable audio quality:

  • One-way delay: 150 ms or less
  • Jitter: 30 ms or less
  • Loss: 1% or less

When these standards aren’t met there can be a noticeable reduction in the quality of the phone call.
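The recommended limits above can be written as a small check. This is only an illustrative sketch: the dictionary keys and the function name `voip_violations` are hypothetical, and the measurements are made-up sample values.

```python
# Recommended upper limits for acceptable VoIP quality, taken from the text above.
VOIP_LIMITS = {"one_way_delay_ms": 150.0, "jitter_ms": 30.0, "loss_percent": 1.0}


def voip_violations(measured):
    """Return the names of the metrics that exceed the recommended limits."""
    return [name for name, limit in VOIP_LIMITS.items() if measured[name] > limit]


# Made-up sample measurements: delay and loss are fine, jitter is too high.
sample = {"one_way_delay_ms": 120.0, "jitter_ms": 45.0, "loss_percent": 0.5}
print(voip_violations(sample))  # ['jitter_ms']
```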


When a device receives messages faster than it can forward them out of the appropriate interface, they are stored in a queue. By default, queues are first-in-first-out (FIFO); messages are sent in the order they are received. Of course, queues take up space in the device’s memory. If the rate of traffic doesn’t decrease, the device will eventually run out of space to add new messages to the queue. When the queue is full, new packets are dropped. This is called tail drop.
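As a rough illustration of the behaviour just described, here is a minimal sketch of a FIFO queue with tail drop. The class name `TailDropQueue` and the capacity of 3 packets are assumptions chosen for the example; real devices size queues in their own way.

```python
from collections import deque


class TailDropQueue:
    """FIFO queue with a fixed capacity; arrivals are dropped when it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet):
        if len(self.packets) >= self.capacity:
            self.dropped += 1  # tail drop: the newest arrival is discarded
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        # FIFO: the oldest packet leaves first
        return self.packets.popleft() if self.packets else None


q = TailDropQueue(capacity=3)
results = [q.enqueue(f"pkt{i}") for i in range(5)]
print(results, q.dropped)  # [True, True, True, False, False] 2
```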

Tail drop is harmful because it can lead to TCP global synchronization.


Recall the TCP Sliding Window

  • Hosts using TCP use the ‘sliding window’ to increase/decrease the rate at which they send traffic as needed
  • When a packet is dropped it will be re-transmitted
  • When a drop occurs, the sender reduces the rate it sends traffic
  • It will then gradually increase the rate again

When the queue fills up and tail drop occurs, all TCP hosts that have packets dropped slow their transmission rates at the same time. This could be a large number of hosts. They then all increase their transmission rates together, which rapidly leads to more congestion and creates a cycle:

Congestion storm → tail drop → reduced ‘sliding window’ → simultaneous increase in traffic → congestion storm → etc.

This cycle creates waves of network congestion and under-utilization.

One solution to prevent TCP global synchronization is Random Early Detection (RED). In this system, a device will start dropping packets from random TCP flows when the queue reaches some configured threshold. The hosts of those flows will still reduce, then gradually increase the rate they send traffic at, but not all at once, thus avoiding synchronization.

In standard RED all TCP traffic is treated the same. There is an improved version called Weighted Random Early Detection (WRED) which allows you to configure which packets are dropped depending on their class, protocol, etc.
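The idea can be sketched in a few lines. This is a simplified model under stated assumptions, not how any particular device implements it: the linear ramp between a minimum and maximum threshold follows the classic description of RED, the instantaneous queue depth is used where real implementations typically use an average, and the threshold values and class names in `WRED_PROFILES` are hypothetical.

```python
import random


def red_drop_probability(queue_depth, min_th, max_th, max_p):
    """Drop probability for a given queue depth (simplified RED curve)."""
    if queue_depth < min_th:
        return 0.0  # below the minimum threshold: never drop
    if queue_depth >= max_th:
        return 1.0  # at or above the maximum threshold: behaves like tail drop
    # Between the thresholds the probability rises linearly from 0 to max_p.
    return max_p * (queue_depth - min_th) / (max_th - min_th)


# Hypothetical WRED profiles: the low-priority class starts dropping earlier.
WRED_PROFILES = {
    "high": {"min_th": 30, "max_th": 40, "max_p": 0.1},
    "low": {"min_th": 20, "max_th": 40, "max_p": 0.1},
}


def should_drop(queue_depth, traffic_class, rng=random):
    """Randomly decide whether to drop an arriving packet of the given class."""
    p = red_drop_probability(queue_depth, **WRED_PROFILES[traffic_class])
    return rng.random() < p
```

With these sample profiles, a queue depth of 25 gives low-priority packets a small chance of being dropped while high-priority packets are still always accepted, which is the per-class behaviour WRED adds on top of RED.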


The whole point of QoS is to give preferential treatment to certain kinds of network traffic when a network is congested. In order to do that, you need to be able to identify which types of traffic to give priority to. That’s what classification is for. Classification organizes network traffic into traffic classes, which are given different levels of priority for network resources.

There are many methods for classifying traffic:


  • An ACL: Traffic permitted by the ACL is given certain treatment, traffic denied is not.
  • NBAR (Network Based Application Recognition): Performs deep packet inspection, using information not just from Layers 2, 3, & 4, but all the way to Layer 7 to identify traffic.
  • Using the Layer 2 & 3 header fields (This is what we’ll focus on, as the above are unlikely to be on the CCNA exam)

There are two fields common in a given frame/packet that can be used for classifying traffic, one in the Layer 2 frame header, and one in the Layer 3 IPv4 header.

  1. Layer 2, Ethernet Header: The PCP (Priority Code Point) of the 802.1Q tag
    • Only when a dot1q tag is present (trunk links, access links with voice VLAN)
  2. Layer 3, IPv4 Header: The DSCP (Differentiated Services Code Point) field of the IP header can also be used to identify high/low priority traffic.


PCP is also known as CoS (Class of Service). Its use is defined by IEEE 802.1p.

  • The 802.1Q tag’s PCP is 3 bits long, meaning there are 8 possible values.
  • Each of the 8 possible values corresponds to a different traffic class/type:
PCP Value | Traffic Type | Description
0 | Best effort | No guarantee of delivery or that it meets any QoS standard. Normal traffic, not high-priority. This is the default.
1 | Background |
2 | Excellent effort |
3 | Critical applications |
4 | Video |
5 | Voice | Note that IP phones will mark their call signaling traffic as PCP3, but the actual voice data as PCP5.
6 | Internetwork control |
7 | Network control |


As mentioned above, you can’t use PCP to classify traffic unless an 802.1Q tag is present in the Ethernet header. Most obviously, this means that trunk links with 802.1Q encoding will be able to use PCP/CoS, but access ports will also be able to if they have a voice VLAN configured. All other connections will not be able to use PCP/CoS.
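To make the field layout concrete, here is a hedged sketch that pulls the PCP out of a raw 802.1Q tag. It assumes the standard 4-byte tag layout (a 16-bit TPID of 0x8100 followed by 3 bits of PCP, 1 bit of DEI, and a 12-bit VLAN ID); the function name `parse_dot1q_tag` and the sample tag are made up for the example.

```python
import struct


def parse_dot1q_tag(tag_bytes):
    """Split a 4-byte 802.1Q tag into its PCP, DEI, and VLAN ID fields."""
    tpid, tci = struct.unpack("!HH", tag_bytes)
    if tpid != 0x8100:
        raise ValueError("not an 802.1Q tag")
    return {
        "pcp": tci >> 13,          # top 3 bits: Priority Code Point
        "dei": (tci >> 12) & 0x1,  # next bit: Drop Eligible Indicator
        "vid": tci & 0x0FFF,       # low 12 bits: VLAN ID
    }


# Sample tag: PCP 5 (voice), VLAN 20
tag = struct.pack("!HH", 0x8100, (5 << 13) | 20)
print(parse_dot1q_tag(tag))  # {'pcp': 5, 'dei': 0, 'vid': 20}
```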

The IP ToS Byte

The DSCP (Differentiated Services Code Point) is 6 bits long. It is immediately followed by the 2-bit Explicit Congestion Notification (ECN). Together they make the “Type of Service (ToS)” byte.


This byte used to be organized differently, with the first 3 bits indicating IP Precedence (IPP) and the last 5 being reserved for various purposes, but commonly left unused.
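The two ways of reading the byte can be shown with some bit shifting. This is a small sketch; the function name `split_tos_byte` is hypothetical and 0xB8 is just a sample value.

```python
def split_tos_byte(tos):
    """Return the DSCP, ECN, and legacy IP Precedence views of a ToS byte."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS must fit in one byte")
    return {
        "dscp": tos >> 2,   # modern layout: upper 6 bits
        "ecn": tos & 0b11,  # modern layout: lower 2 bits
        "ipp": tos >> 5,    # legacy layout: upper 3 bits
    }


print(split_tos_byte(0xB8))  # {'dscp': 46, 'ecn': 0, 'ipp': 5}
```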


The modern DSCP field is defined by RFC 2474 (1998) with further elaboration in subsequent RFCs.


As there are 6 bits in the DSCP (as opposed to 3 in PCP or IPP) there are many more possible values (64, to be exact), and it can be difficult to remember all of them. Thankfully, there are a handful of more common ones that you should focus your attention on.

Some of the more noteworthy standard markings are:

  • Default Forwarding (DF) - ‘best effort’ traffic
  • Expedited Forwarding (EF) - low loss/latency/jitter traffic (usually voice)
  • Assured Forwarding (AF) - A set of 12 standard values
  • Class Selector (CS) - A set of 8 standard values, provides backward compatibility with IPP
DF (Default Forwarding)
  • DF is used for best-effort traffic
  • DSCP marking for DF is 0 (0b000000)
EF (Expedited Forwarding)
  • EF is used for traffic that requires low loss/latency/jitter
  • DSCP marking for EF is 46 (0b101110)
AF (Assured Forwarding)
  • AF (Assured Forwarding) defines four traffic classes. All packets in a class have the same priority.
  • Within each class, there are three levels of drop precedence.
    • Higher drop precedence means the packet is more likely to be dropped during congestion
  • First 3 bits denote Class (X), next 2 denote Drop Precedence (Y). The last bit is always 0.
    • Often notated as AFXY
    • E.g. 0b100010 = 0b100|01|0 = AF41
    • Really, this is still a regular DSCP value, so you can also calculate that (just convert to decimal):
      • AF41 = 0b100010 = DSCP 34


The AF class cannot go above 4 and the drop precedence cannot go above 3, so the highest AF value is AF43 (0b100110, DSCP 38).


A quick way to convert between AF and DSCP is with the following formula: DSCP = 8X + 2Y
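The formula is easy to check in code. The sketch below converts in both directions; the function names `af_to_dscp` and `dscp_to_af` are made up for the example, and the valid ranges (class 1 to 4, drop precedence 1 to 3) come from the AF description above.

```python
def af_to_dscp(af_class, drop_precedence):
    """Convert AFxy to its decimal DSCP value using DSCP = 8X + 2Y."""
    if af_class not in range(1, 5) or drop_precedence not in range(1, 4):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop_precedence


def dscp_to_af(dscp):
    """Return the AF name for a DSCP value, or None if it is not an AF value."""
    af_class, rest = divmod(dscp, 8)       # first 3 bits
    drop, last_bit = divmod(rest, 2)       # next 2 bits, then the final bit
    if last_bit or af_class not in range(1, 5) or drop not in range(1, 4):
        return None
    return f"AF{af_class}{drop}"


print(af_to_dscp(4, 3))  # 38
print(dscp_to_af(26))    # AF31
```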

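Priority | Lowest Drop Precedence | Medium Drop Precedence | Highest Drop Precedence
Highest Priority | AF41 | AF42 | AF43
 | AF31 | AF32 | AF33
 | AF21 | AF22 | AF23
Lowest Priority | AF11 | AF12 | AF13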
CS (Class Selector)

A group of eight DSCP values defined for backwards compatibility with IPP.

  • The three bits added for DSCP are set to ‘0’; the original IPP bits are used to make 8 values.
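Because the three low-order bits are zero, each Class Selector value is simply the old IPP value shifted left by three bits, i.e. multiplied by 8. A short sketch (the function name `cs_to_dscp` is hypothetical):

```python
def cs_to_dscp(ipp):
    """Map an IP Precedence value (0-7) to the matching Class Selector DSCP."""
    if ipp not in range(8):
        raise ValueError("IP Precedence must be 0-7")
    return ipp << 3  # keep the 3 IPP bits, append three 0 bits


for ipp in range(8):
    dscp = cs_to_dscp(ipp)
    print(f"CS{ipp}: 0b{dscp:06b} = DSCP {dscp}")
```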



RFC 4594

This RFC was developed with assistance from Cisco to bring all of these values together and standardize their use. It offers many specific recommendations, but here are a few key ones:

  • Voice traffic: EF
  • Interactive video: AF4x
  • Streaming video: AF3x
  • High priority data: AF2x
  • Best effort: DF

Trust Boundaries

The trust boundary of a network defines where devices trust/don’t trust the QoS markings of received messages. If the markings are trusted, the message will be forwarded without change. If they aren’t, the device will change the markings according to the configured policy.

Engineers should be careful where they place their trust boundaries. For example, if the boundary is between a switch and a connected IP phone, then the QoS markings on the phone’s frames will not be trusted, and the frames may be forwarded marked as regular-priority traffic. This is not ideal.

However, if the trust boundary is too far away from administrator-configured equipment, a tech-savvy user might be able to manually mark their device’s traffic with a higher priority than it should have. A bad actor may do this either to alleviate network congestion for themselves (at the cost of other users) or to intentionally create a lot of high-priority network congestion.
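The trust decision itself is simple, as the following sketch suggests. It is a deliberately minimal model: the function name `mark_at_boundary` is hypothetical, and it assumes the configured policy for untrusted ports is to re-mark everything to a single default value, whereas real policies can re-mark per traffic class.

```python
def mark_at_boundary(dscp, trusted, default_dscp=0):
    """Return the DSCP value a message carries after crossing the boundary."""
    # Trusted port: forward the marking unchanged.
    # Untrusted port: overwrite it with the configured default (here DF, 0).
    return dscp if trusted else default_dscp


print(mark_at_boundary(46, trusted=True))   # 46 (marking from a trusted IP phone is kept)
print(mark_at_boundary(46, trusted=False))  # 0  (marking from an untrusted host is reset)
```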

Congestion Management

An essential part of QoS is the use of multiple queues. One form or another of traffic classification is used to match traffic and place it in the appropriate queue. The device is only able to forward one frame out of an interface at a time, so a scheduler is used to decide which queue traffic is forwarded from next. Prioritization (of the queues) allows the scheduler to give certain queues more priority than others.

There are several scheduling methods:

  • Weighted round-robin
    • Round-robin - packets are taken from each queue in order, cyclically
    • Weighted - more data is taken from high priority queues each time the scheduler reaches that queue
    • CBWFQ (Class-Based Weighted Fair Queuing) uses a weighted round-robin scheduler while guaranteeing each queue a certain percentage of the interface’s bandwidth during congestion.
    • Round-robin scheduling is not ideal for voice/video traffic. Even if the traffic receives a guaranteed minimum amount of bandwidth, round-robin can add delay and jitter because even high-priority queues have to wait their turn in the scheduler.
  • LLQ (Low Latency Queuing) - one or more queues are designated as strict priority queues
    • If there is traffic in the queue, the scheduler will always take the next packet from that queue until it is empty — effective for reducing delay and jitter of voice/video traffic
    • However, other queues can get crowded out entirely if there is always traffic in the strict priority queue(s)
    • Policing can control the amount of traffic allowed in the strict priority queue so that it can’t consume all the bandwidth.
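The two scheduling ideas above can be combined in a short sketch: one strict priority queue that is always served first, plus weighted round-robin across the remaining queues. The class name `LLQScheduler`, the queue names, and the weights are all assumptions for illustration. Weights here count packets per round, whereas real schedulers typically work in bytes or bandwidth percentages, and the policing of the priority queue is left out.

```python
from collections import deque


class LLQScheduler:
    """One strict-priority queue plus weighted round-robin over the rest."""

    def __init__(self, weights):
        # weights: queue name -> packets served per round (hypothetical units)
        self.priority = deque()
        self.weights = dict(weights)
        self.queues = {name: deque() for name in self.weights}

    def enqueue(self, queue_name, packet):
        if queue_name == "priority":
            self.priority.append(packet)
        else:
            self.queues[queue_name].append(packet)

    def _drain_priority(self):
        # Strict priority: keep sending until this queue is empty.
        while self.priority:
            yield self.priority.popleft()

    def drain(self):
        """Yield packets in the order they would be transmitted."""
        while self.priority or any(self.queues.values()):
            yield from self._drain_priority()
            for name, weight in self.weights.items():
                for _ in range(weight):  # weighted: up to `weight` packets per turn
                    if not self.queues[name]:
                        break
                    yield self.queues[name].popleft()
                    # Re-check the priority queue after every packet sent.
                    yield from self._drain_priority()


sched = LLQScheduler({"data": 2, "bulk": 1})
for name, pkt in [("data", "d1"), ("bulk", "b1"), ("priority", "v1"),
                  ("data", "d2"), ("data", "d3"), ("bulk", "b2"), ("priority", "v2")]:
    sched.enqueue(name, pkt)

order = list(sched.drain())
print(order)  # ['v1', 'v2', 'd1', 'd2', 'b1', 'd3', 'b2']
```

The voice packets leave first even though they arrived after other traffic, and the data queue gets two packets for every one from the bulk queue.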

Shaping & Policing

Traffic shaping and policing are both used to control the rate of traffic.

Shaping buffers traffic in a queue if the traffic rate goes over the configured rate.

Policing drops traffic if the traffic rate goes over the configured rate. (Optionally, policing can re-mark the traffic instead of dropping it.)

  • ‘Burst’ traffic over the configured rate is allowed for a short period of time, as some applications will send a large amount of data all at once, instead of a constant stream.

In both, classification can be used to allow for different rates for different kinds of traffic.
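The difference between the two can be sketched with a token bucket, a common way to model a configured rate plus an allowed burst. The source does not say which mechanism a given device uses, so treat the token bucket, the class and function names, and the sample numbers as assumptions. The shaper sketch also assumes every packet is no larger than the burst size.

```python
class TokenBucket:
    """Tokens (bytes) refill at the configured rate, up to the burst size."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def refill(self, now):
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last = now


def police(bucket, packets):
    """Policing: packets over the rate are dropped. packets = [(time_s, size_bytes)]."""
    actions = []
    for t, size in packets:
        bucket.refill(t)
        if size <= bucket.tokens:
            bucket.tokens -= size
            actions.append("forward")
        else:
            actions.append("drop")
    return actions


def shape(bucket, packets):
    """Shaping: packets over the rate wait in a queue. Returns departure times."""
    departures = []
    clock = 0.0
    for t, size in packets:
        clock = max(clock, t)
        bucket.refill(clock)
        if bucket.tokens < size:
            # Not enough tokens yet: delay the packet until they have accumulated.
            clock += (size - bucket.tokens) / bucket.rate_bytes_per_s
            bucket.refill(clock)
        bucket.tokens -= size
        departures.append(clock)
    return departures


# Three 1000-byte packets arrive at once; rate 8000 bps (1000 bytes/s), burst 1500 bytes.
arrivals = [(0.0, 1000), (0.0, 1000), (0.0, 1000)]
policed = police(TokenBucket(8000, 1500), arrivals)
shaped = shape(TokenBucket(8000, 1500), arrivals)
print(policed)  # ['forward', 'drop', 'drop']
print(shaped)   # [0.0, 0.5, 1.5]
```

With the same input, the policer forwards only what fits within the burst allowance and drops the rest, while the shaper delivers every packet but spreads them out over time.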