EVPN-VXLAN Explainer 4 - Route Type Three and Auto-Discovery

In this post we will look at another EVPN Route Type, RT-3, which goes by the rather opaque name of 'Inclusive Multicast Ethernet Tag', and at how it is used to ensure EVPN peers flood traffic only to those neighbours that need it.
Firstly, we'll look at EVPN packet forwarding to provide some context around why this Route Type is important, then we will dive into its details, with all the usual show commands, packet captures and a plethora of RFC name drops.

EVPN Packet Forwarding

Let's start off by running through EVPN packet forwarding. For that we need an example network.

Example Network

  • For this post, we'll use a slightly larger network, this time with three switches.
  • As shown in Figure 1 below, this consists of three Aruba 6300s, all configured for OSPF and EVPN.
  • To emulate customer workloads, I have a physical Linux server attached to each 6300. I have configured the interconnecting port as a trunk so that I can generate traffic in different VLANs.
  • Each node is configured with customer-facing VLAN 10, which is bound to VNI 1010.
  • However, only two of the three peers, 6300-1 and 6300-2, are configured with a second VLAN and VNI, 20 and 1020 respectively.

Figure 1: Three switch network

Packet Forwarding in VNI 1010

Let's focus on the EVPN L2VNI associated with VLAN 10. This is configured on each one of the VTEPs, thus all three 6300s are:

  • Locally learning on their VLAN 10 configured ports
  • Sharing these MACs via RT-2s
  • Performing remote learning based upon the UPDATEs received from their two peers.
    (Note: See my previous post if this doesn't make sense to you.)

If all three local clients have transmitted traffic and the VTEPs have sent their UPDATEs, 6300-1's BGP EVPN table looks like Figure 2 below.

Figure 2: 6300-1's EVPN table

Here's 6300-1's BGP EVPN table. As expected, we see three MACs, one local and two remote.

Also, here's the 6300-1 MAC-address table.

Known unicast in VNI 1010

With the network in this state, if client 172.16.10.1 sends a packet to 172.16.10.2:

  • 6300-1 receives the data, destined for 48:0f:cf:b9:59:a6 and tagged with VLAN 10, on port 1/1/10, the trunk port directly connected to 172.16.10.1.
  • 6300-1 has already learnt the 48:0f:cf:b9:59:a6 MAC address via an RT-2 update from neighbouring peer 192.168.0.2 in VNI 1010.
  • 6300-1 encapsulates the data for VXLAN, with an outer IPv4 destination address of 192.168.0.2.

BUM traffic in VNI 1010

Now, what if 172.16.10.1 sends a packet to a MAC address that 6300-1 does not know, in other words an unknown unicast? Or broadcasts an ARP request? Or sends multicast, to complete the fabulously named BUM triumvirate?

Without a MAC table entry for the destination of the BUM traffic, 6300-1 uses ingress replication: it encapsulates the packet in VXLAN and sends a copy to each of its peers that is configured with VNI 1010, namely 6300-2 and 6300-3.

I think this is a point worth repeating. 6300-1 does not just replicate the traffic to all of its EVPN peers; it only sends a copy of the encapsulated BUM traffic to those peers that participate in VNI 1010.
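To make the two forwarding cases concrete, here is a minimal Python sketch of the decision 6300-1 makes. It is a model, not how the switch is implemented: the table names, the function and the unknown destination MAC are my own illustrative assumptions, and the per-VNI flood list is simply taken as given.

```python
# Illustrative model of 6300-1's forwarding decision. The table layout and
# names are assumptions made for this sketch, not AOS-CX internals.
# Local switching (destination on a local port) is left out for brevity.

# Remote MACs learnt via RT-2, keyed by (VNI, MAC) -> remote VTEP IP.
remote_macs = {
    (1010, "48:0f:cf:b9:59:a6"): "192.168.0.2",
}

# Peers that participate in each VNI (taken as given in this sketch).
flood_list = {
    1010: ["192.168.0.2", "192.168.0.3"],
    1020: ["192.168.0.2"],
}


def vxlan_destinations(vni, dst_mac):
    """Return the outer destination IPs for a frame received in this VNI."""
    dst_mac = dst_mac.lower()
    # The lowest bit of the first octet marks broadcast and multicast MACs.
    is_group = (int(dst_mac.split(":")[0], 16) & 1) == 1
    if not is_group and (vni, dst_mac) in remote_macs:
        # Known unicast: a single copy to the VTEP that advertised the MAC.
        return [remote_macs[(vni, dst_mac)]]
    # BUM: ingress replication, one copy per peer that participates in the VNI.
    return list(flood_list.get(vni, []))


print(vxlan_destinations(1010, "48:0f:cf:b9:59:a6"))  # known unicast
print(vxlan_destinations(1010, "ff:ff:ff:ff:ff:ff"))  # broadcast
print(vxlan_destinations(1020, "00:00:5e:00:53:01"))  # unknown unicast
```

The known unicast goes to a single VTEP, while the broadcast and the unknown unicast are copied only to the peers listed for that VNI.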

But how does 6300-1 know that both 6300-2 and 6300-3 are configured with VNI 1010?

With static VXLAN, if you recall, we statically configure the VTEP peers under the VNIs:

interface vxlan 1
    source ip 192.168.0.1
    no shutdown
    vni 1020
        vlan 20
        vtep-peer 192.168.0.2
        vtep-peer 192.168.0.3

With EVPN there is no such configuration required. Instead, the EVPN peers indicate their VNIs using another Route Type, Route Type 3, which we will now look at in detail.

Auto-discovery with Route Type 3

Route Type 3, or RT-3, has quite a formidable name, 'Inclusive Multicast Ethernet Tag Route', which I must admit completely threw me off what this route type is actually doing when I first looked at EVPN.

EVPN peers use RT-3 to advertise not routes but the VNIs that they are interested in, in other words, the VNIs that they have configured locally.

Rather than having to statically configure each one of the peers that are in the VNI, EVPN speakers use RT-3 to say "hey, I'm configured with VNI 1010, if you are going to flood out traffic for that VNI, count me in!".

Thus, via EVPN, our devices can auto-discover the configured VNIs of remote peers.

Moreover, RT-3s are even more fundamental to the working of EVPN than RT-2s; even if a device hasn't learnt any local MAC addresses, if it is configured with VNIs, it will advertise RT-3s for them to its EVPN peers.

So why haven't we seen any RT-3s in the captures I've been sharing thus far? I've actually been tweaking my show commands to only show RT-2s, to make the captures that much cleaner.

In fact, here's 6300-1's BGP EVPN table again, this time without filtering out the RT-3s:

Here's a closer look at a single RT-3. This is the one learnt from 192.168.0.2:

Finally, here's a Wireshark packet capture of an RT-3:

RT-3 Packet Capture Notes

Looking at the packet capture of an RT-3 above, we can see there is a lot of information packed in there:

  • Like the other Route Types, an RT-3 is carried in a BGP UPDATE. One of the Path Attributes is a Multi-Protocol Reachability NLRI (MP_REACH_NLRI) with AFI 25 and SAFI 70, which indicates that this is EVPN.
  • Within the NLRI information, at the bottom, we can see confirmation of the route type name and number, 'Inclusive Multicast Route' & 3.
  • As with RT-2s, the RT-3 carries BGP extended communities, namely the route target and the data plane encapsulation type, VXLAN in this case.
  • In addition, the RT-3 carries another Path Attribute called 'PMSI_TUNNEL_ATTRIBUTE', or, to give it its full name, the Provider Multicast Service Interface Tunnel Attribute. Let's have a closer look at this attribute because it contains some crucial data.
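As a rough illustration of the NLRI portion of that capture, the sketch below decodes a single RT-3 entry using the field layout from RFC 7432 (route type, length, Route Distinguisher, Ethernet Tag ID, IP address length in bits, originating router's IP). The sample bytes are hand-built for this post's addressing, not lifted from the capture, and the Route Distinguisher value in particular is an assumption.

```python
import ipaddress
import struct

ROUTE_TYPE_IMET = 3  # Inclusive Multicast Ethernet Tag


def parse_rt3_nlri(data: bytes) -> dict:
    """Decode one EVPN NLRI entry that holds an RT-3 (layout per RFC 7432)."""
    route_type, length = data[0], data[1]
    if route_type != ROUTE_TYPE_IMET:
        raise ValueError(f"not an RT-3: route type {route_type}")
    body = data[2:2 + length]
    rd = body[0:8]                                    # Route Distinguisher
    (ethernet_tag,) = struct.unpack("!I", body[8:12])  # Ethernet Tag ID
    ip_len_bits = body[12]                            # 32 (IPv4) or 128 (IPv6)
    originator = ipaddress.ip_address(body[13:13 + ip_len_bits // 8])
    return {
        "route_type": route_type,
        "rd": rd.hex(),
        "ethernet_tag": ethernet_tag,
        "originator": str(originator),
    }


# Hand-built sample; the RD (type 1, 192.168.0.2:10) is made up.
sample = bytes.fromhex(
    "03" "11"            # route type 3, length 17
    "0001c0a80002000a"   # Route Distinguisher
    "00000000"           # Ethernet Tag ID 0
    "20"                 # IP address length: 32 bits
    "c0a80002"           # originating router's IP: 192.168.0.2
)
print(parse_rt3_nlri(sample))
```

Note that the NLRI itself only identifies who originated the route. The VNI travels in the Path Attributes that accompany it.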

The Provider Multicast Service Interface (PMSI) Tunnel Attribute

This attribute was defined in RFC 6514 - 'BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs', which is, as the name suggests, all about multicast in L3VPNs.

Essentially the RFC defines various encodings and variables that BGP can use to exchange information about multicast VPNs. There are numerous flavours of multicast, and the RFC defines the PMSI Tunnel attribute as a way for PEs to signal what type of tunnel they are using, whether it be RSVP-TE P2MP, PIM-SSM, PIM-BIDIR etc. Depending on the type of tunnel in use, the RFC also defines the parameters for that type that the PE needs to share.

So, what's this got to do with EVPN?

One of the tunnel types is 'Ingress Replication', the process that EVPN peers use to forward BUM traffic.

RFC 6514 states that Ingress Replication is identified by the number 6 in the list of 'Tunnel Types' and that, when this type is used, the 'Tunnel Identifier' is the unicast IP address of the endpoint for the tunnel.

It should be kept in mind that RFC 6514 was originally concerned with MPLS networks. If we look at the original PMSI diagram, we see an MPLS label between Tunnel Type and Tunnel Identifier:

+---------------------------------+
|  Flags (1 octet)                |
+---------------------------------+
|  Tunnel Type (1 octets)         |
+---------------------------------+
|  MPLS Label (3 octets)          |
+---------------------------------+
|  Tunnel Identifier (variable)   |
+---------------------------------+

This field was updated and made applicable to VXLAN encapsulated networks in RFC 8365 Section 5.1.3 by replacing the MPLS label with the VNI:

the MPLS label field in the P-Multicast Service Interface (PMSI) Tunnel attribute of the Inclusive Multicast Ethernet Tag (IMET) route are used to carry the VNI

If we check the packet capture we do indeed see the PMSI Tunnel attribute containing:

  • Tunnel Type: Ingress Replication
  • VNI
  • Tunnel ID
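Those three fields can be pulled out of the attribute value with a few lines of Python. This is only a sketch that follows the RFC 6514 layout shown above, with the assumption (per RFC 8365) that the VNI occupies the full three octets of the label field. The sample bytes are hand-built to match this post's addressing rather than copied from the capture.

```python
import ipaddress

TUNNEL_TYPE_INGRESS_REPLICATION = 6  # tunnel type number from RFC 6514


def parse_pmsi_tunnel(value: bytes) -> dict:
    """Decode the value of a PMSI Tunnel attribute as used with VXLAN."""
    result = {
        "flags": value[0],
        "tunnel_type": value[1],
        # Assumption: with VXLAN the 3-octet label field carries the VNI.
        "vni": int.from_bytes(value[2:5], "big"),
    }
    tunnel_id = value[5:]
    if result["tunnel_type"] == TUNNEL_TYPE_INGRESS_REPLICATION:
        # For ingress replication the Tunnel Identifier is a unicast IP.
        result["tunnel_endpoint"] = str(ipaddress.ip_address(tunnel_id))
    else:
        result["tunnel_id"] = tunnel_id.hex()
    return result


# Hand-built sample: flags 0, ingress replication, VNI 1010, 192.168.0.2.
sample_pmsi = bytes.fromhex("00" "06" "0003f2" "c0a80002")
print(parse_pmsi_tunnel(sample_pmsi))
```

The VNI and the tunnel endpoint together tell the receiver everything it needs: which VNI to flood for, and where to send the copies.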

Route Type 3 in action

If we look back at our example network, in Figure 1 above, we see that all three 6300s are configured with VNI 1010, but only 6300-1 & 6300-2 are configured with VNI 1020.

Thus, if BUM traffic hits 6300-1 in VLAN 10, which is bound to VNI 1010, 6300-1 will perform ingress replication and forward the traffic as VXLAN encapsulated packets to both 6300-2 and 6300-3.

However, if 6300-1 receives BUM traffic on VLAN 20, bound to VNI 1020, it will only forward to 6300-2, not 6300-3; and it is RT-3 updates that are at the heart of this process.
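The way those per-VNI flood lists fall out of the received RT-3s can be sketched in a few lines of Python. This is a simplified model of the idea and not the switch's actual logic: the (VTEP, VNI) tuples and the function name are assumptions of mine, and a real implementation selects routes by route target rather than by comparing VNI numbers directly.

```python
from collections import defaultdict

# RT-3s as 6300-1 would receive them in the example network, reduced to
# (originating VTEP, VNI) pairs for this sketch.
received_rt3 = [
    ("192.168.0.2", 1010),
    ("192.168.0.2", 1020),
    ("192.168.0.3", 1010),
]


def build_flood_lists(rt3_routes, local_vnis):
    """Group remote VTEPs by VNI, keeping only VNIs configured locally."""
    flood = defaultdict(set)
    for vtep, vni in rt3_routes:
        if vni in local_vnis:
            flood[vni].add(vtep)
    return {vni: sorted(peers) for vni, peers in flood.items()}


print(build_flood_lists(received_rt3, local_vnis={1010, 1020}))
```

Because 6300-3 never sends an RT-3 for VNI 1020, it never appears in that VNI's list, so no copy of VLAN 20 BUM traffic is sent its way.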

Here's the full BGP EVPN table from 6300-1 for VNI 1010 and 1020. Note that the EVPN peers send one RT-3 per VNI.

The auto-discovered VNI information is also reflected in the device EVPN table.

Please note that for this next capture, I shut 6300-3's customer-facing interface, so that it had no locally learnt MAC addresses.

This shows that, despite not having any MACs to advertise via RT-2s, an EVPN speaker will still advertise its RT-3s.

Therefore, 192.168.0.3 is listed as a peer VTEP for VNI 1010 despite not advertising any MAC addresses.

Closing words

Thus RT-3s, and the auto-discovery functionality they bring, are an important part of the EVPN architecture. RT-3s allow device configuration to remain focused on locally significant information, rather than having to key in each peer for each VNI.

Personally I found the function of RT-3s quite elusive when first learning EVPN, for which I think the naming of this Route Type is partly responsible. I felt that 'Inclusive Multicast Route' does not unambiguously represent the actual process of VTEPs declaring their VNIs to each other. (My thought process went something like: "Hey, I'm not using multicast here, so what's with all these RT-3 entries?")

Also, I believe the rather opaque meaning of this Route Type is compounded by the RFCs' treatment of it, but this is somewhat representative of EVPN's evolution as a whole. The PMSI Tunnel attribute starts life as part of multicast for MPLS VPNs (RFC 6514), is then referenced in another RFC for EVPN (RFC 7432), but still in its MPLS guise, before yet another RFC (RFC 8365) adds in detail about its application in VXLAN networks, with the VNI replacing the MPLS label.

I hope this blog post helps others pick through the details of this Route Type and thus avoid the RFC trawling that I've had to perform to get here.

That's it for this post, thanks for reading.
In the next entry in this series we will look at moving on from L2VNIs to L3VNIs.

🐦 @joeneville_