
A new journey with Arista

arista

As the title suggests, after many years working in large enterprises, service providers and, most recently, a partner, I am headed to a network vendor.  This is a big change for me, but a very good one.  I am tremendously excited for the opportunity.

I am joining Arista for multiple reasons.  I was able to bring Arista in at a prior position and watch the entire network change drastically.  We were able to automate many tasks that were previously manual.  It seems like every time I turn around, Arista has integrations with multiple different technologies, i.e. Docker, OpenStack, Go, etc.

I was fortunate enough to attend Tech Field Day last year, where Ken Duda, CTO of Arista Networks, delivered one of the most awesome presentations on the evolution of EOS and code quality.

Everyone I have met within Arista has been amazing.  This was an easy decision given the talent and passion of every individual I have met within the organization.  It fits my career goals of moving towards SDN, continuous integration/continuous delivery and all things automation.  I start in late August.

Network Continuous Integration using Jenkins, Jinja2 and Ansible

I have been into the DevOps and agile life lately.  I have read the following books, which have been game changing for me:

The DevOps 2.0 Toolkit
Ansible for DevOps
Learning Continuous Integration with Jenkins

The DevOps 2.0 book is fantastic, and @vfarcic constantly updates it, so it is worth the price of admission.  I have said this many times before: DevOps is simply using open source tooling to run through a workflow in an automated fashion.  Continuous integration means constantly testing your code as it is integrated toward production.

We as network engineers commonly push changes into production through the CLI or other manual tasks because we know what we are doing will “just work.”  We have done it a million times and it has never failed.  We do not take into account human error or issues within the process, i.e. firewall rules, a wrong IP address, etc.

This is the DevOps way, and it is the practice that should be followed for any new configuration work.

devopslife

The script/configuration is committed to Jenkins.  Jenkins schedules the task.  Ansible iterates over the inventory.  Checks and balances are done.  The script/config is then run on the switch.  This is another use case for Docker on a switch, as all of this could be run in a test container before trying the code on the actual switch.  Finally, notifications that the test build was successful are sent out either through Slack or through email.

So in this blog post we are going to do something really simple, so any network engineer can follow along and get their feet wet with continuous integration.  I also highly recommend the three books I posted above.

Our topology is simple:
- 2 Oracle VirtualBox VMs:
1.) Ubuntu 14.04 LTS
2.) Arista vEOS running 4.16
- Jenkins running on the Ubuntu VM
- Ansible running on the Ubuntu VM
- Python 2.7 running on the Ubuntu VM

I will start with a simple line of configuration.  NTP is probably the easiest one-liner of configuration we as network people normally use.  An NTP server can be added to a switch with a single line:

ntp server 10.10.10.10

We are going to make this slightly fancy with Ansible and Jinja2 templates, so that if we ever wanted to change our NTP server we could do it on a large number of switches in an Ansible inventory.  The J2 template is very simple, but let's take a look at our Ansible directory structure first.
tree
group_vars – holds all the variables, in this case the NTP server config
  veos.yaml – contains the NTP server variable
inventory – contains all the hosts in the inventory
ntpserver.yaml – the Ansible playbook
scripts – optional directory; holds a Python script we will get to later
templates – directory that holds the Jinja2 templates
  ntpserver.j2 – holds the configuration “ntp server x.x.x.x”

Let's first check out our Ansible playbook.

ntpp
This playbook simply targets the veos hosts, which are located in the inventory file.
The second step uses eos_template with templates/ntpserver.j2 and applies the NTP server to each host, like a giant for loop.
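For anyone who cannot make out the screenshot, the playbook is only a few lines.  Here is a rough sketch of what it could look like, assuming the eos_template module available in Ansible at the time (connection and authentication details are omitted; in my setup they come from the .eapi.conf file):

---
- hosts: veos
  gather_facts: no
  connection: local
  tasks:
    - name: render templates/ntpserver.j2 and apply it to each switch
      eos_template:
        src: ntpserver.j2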

veos.yaml
ntpserver
ntpserver.j2
ntpserverconfig.png
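If those screenshots are hard to read, the two interesting files are tiny.  A sketch of what they could contain (the variable name ntpserver is my own choice):

group_vars/veos.yaml:
---
ntpserver: 10.10.10.10

templates/ntpserver.j2:
ntp server {{ ntpserver }}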

This would simply add an NTP server to an Arista vEOS device, as long as it is in the .eapi.conf file and in the inventory file.

For those who are following along so far, the next step is running this through CI.  That is where Jenkins comes in.  Jenkins can execute the playbook and run through its checks and balances.  It is extremely simple here, as we are not doing much.  The workflow is, once again, as follows.
devopslife

Here is the job we will run.
jenkins
Here is an example of the job.
job

The first step is to run the playbook in “--check” mode.  This runs the playbook as a dry run and does not make any changes.  It is simply there to catch errors, for example a host in the inventory file that we cannot connect to.  If either switch did not connect, we would find out here.

The second step is to apply the configuration.  This pushes the configuration to the switch as it is rendered from the Jinja2 template.
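Both Jenkins build steps are just "execute shell" commands, roughly along these lines (inventory and playbook names taken from the tree above):

ansible-playbook -i inventory ntpserver.yaml --check
ansible-playbook -i inventory ntpserver.yaml

The first command is the dry run; the second applies the configuration for real.  If either command exits non-zero, Jenkins marks the build as failed and stops.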

The last step is a Python script shown here.
python
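The screenshot is small, so here is a rough sketch of the kind of check the script performs; it assumes pyeapi is installed and a host named veos1 exists in ~/.eapi.conf (the real script may differ):

#!/usr/bin/env python
import sys
import pyeapi

NTP_SERVER = '10.10.10.10'

# connect_to() reads the connection profile out of ~/.eapi.conf
node = pyeapi.connect_to('veos1')
config = node.get_config(as_string=True)

if 'ntp server %s' % NTP_SERVER in config:
    print 'NTP server %s found - build passes' % NTP_SERVER
    sys.exit(0)

print 'NTP server missing - failing the Jenkins build'
sys.exit(1)  # a non-zero exit code marks the Jenkins build as failed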

Alright, enough typing.  Let's go ahead and run the job and check the console in Jenkins CI.
jenkinsci
We were able to run through the tests and everything is successful.  Checking the switch, it shows the new NTP server of 10.10.10.10.

Let's purposely fail this setup and try to configure an NTP server value so bogus that the switch would never take it.

failure

This time it failed.  The first dry run works because it can connect.  The second execution does not complete, as it fails when trying to add the commands to the switch.
Typically at this point either an email or a chat notification is sent out to make the rest of the team aware.

This was a good exercise in walking through CI for network changes.  It is for sure the way to go for testing and checking network changes going into live production.

 

Tech Field Day: VMware NSX at Interop

run_NSX

I was fortunate enough to go to Interop in Las Vegas on behalf of Tech Field Day, and I cannot thank them enough for the opportunity.  This was the first Interop I was able to take part in, which was really eye-opening and will for sure expose me to many new trends in networking.

I have been using the NSX product for the better part of almost 3 years in a large environment.  Before NSX I also leveraged VMware vCNS, the predecessor to NSX.  I have written numerous blog posts and scripts related to NSX.  Alright, enough about me and on to the VMware NSX presentations.

Bruce Davie, CTO of VMware's networking and security business unit, presented on the current state of VMware NSX.  VMware's case for customers to evaluate or run NSX in their environments comes down to three large, compelling reasons.
the-vision-for-the-future-of-network-virtualization-with-vmware-nsx-9-638
Agility – Having an open and central API that can talk to all components within the NSX stack, for example NSX edge routers, the distributed firewall, etc.

Security – Allowing true micro-segmentation of VMs at the vNIC level and the ability to virtualize security components as virtual machines.

Application Continuity – Allowing NSX-backed virtual machines to live within the private or public cloud.

Where NSX is today customer wise.
Screenshot from 2016-05-18 09:44:25
Where NSX was 8 months ago at VMworld.
Screenshot from 2016-05-18 09:43:50

The numbers are quite impressive: doubling the customer count and tripling the number of customers going into production with NSX-based networks.  Looking at the different verticals is just as impressive; the list of customers shows different use cases across healthcare providers, financial institutions, large enterprises and retailers.

Talking about operations and visibility, Bruce was quick to acknowledge the criticism about the lack of operational visibility once a customer moves to an overlay network.  VMware took those criticisms to heart and made it a priority to invest in visibility.  Bruce gave a demo, originally from VMworld last year, of vROps, VMware's go-to operational tool for all of their products, with the network management pack, which monitors and alerts on both the physical and virtual networks.  Traceflow was also mentioned.  Traceflow injects data packets from one NSX virtual machine toward another to troubleshoot whether a particular service is blocked along the path within the virtual network.

VMware has integrated many third-party monitoring tools into the portfolio.  The two Bruce mentioned were Gigamon, which provides a virtual tap directly into the hypervisor and can also strip VXLAN headers to view raw data packets on the physical network, and Arkin, a super slick UI that gives performance statistics on both the physical and virtual networks as well as vSphere information.

NSX everywhere was quite possibly the most intriguing part of the day.  Bruce talked about the future of NSX: taking the security policies that live within the private cloud and moving those same policies to any hypervisor or bare-metal machine, whether in AWS, Azure, etc.  We also touched on running NSX with VDI and AirWatch, VMware's mobility product.

We talked about the possibilities of future hypervisors and platforms that NSX will integrate with.  There are plans to integrate NSX with Hyper-V in the future.  As of today, NSX Transformers is supported on bare-metal Linux, KVM and vSphere 6.x, and NSX-V will continue to operate on vSphere 6.x.

Moving Docker containers manually and automatically

This blog post covers part of my container obsession, just the basics.  I'll go over how to manually create a Docker container, move a container anywhere, and publish the container to Docker Hub.  Finally, I wrote a quick Python script that automates the creation of as many containers as a user wants.

This is all within my home test lab, which uses the following:
-Ubuntu 14.04 LTS
-Docker 1.11.0
-Python 2.7 (yah yah I know)
-Assuming the docker client and daemon are on the same host.

First things first: in typical Debian fashion, we want to go ahead and install Docker.  The funny thing about Ubuntu is that a package called docker already existed (an unrelated system tray application), so in the Ubuntu world the package is docker.io and we want to run the following commands:

#sudo apt-get update
#sudo apt-get install docker.io
Screenshot from 2016-05-16 13:02:53

With the 14.04 repository, 1.6 is the current Docker version, which would most likely work for this exercise.  I upgraded to 1.11 due to the macvlan/ipvlan work, so we will upgrade to the latest and greatest straight from Docker:

#sudo wget -qO- https://get.docker.com/ | sh
This command should grab the latest version off of the docker repo
Screenshot from 2016-05-16 13:05:56

Alright.  We are about to deploy our very first container within Ubuntu 14.04.  The process is really easy.  I love what Docker has done with containers, packaging everything up so it is simple enough to run from a one-liner.  So let's get into it.

#sudo docker run -dit ubuntu:14.04.1 /bin/bash
docker run is the command to start a container; the -dit flags mean detached, interactive, with a pseudo-TTY; ubuntu:14.04.1 is the container image; and /bin/bash is the command to run.
Screenshot from 2016-05-16 13:09:01
It is important to note that this Ubuntu image comes directly from Docker Hub.  From Docker Hub anyone can pull any public image down from anywhere running Docker.  After Docker realizes that the Ubuntu image is not local, it pulls the image down.  Notice there are 4 pulls: the image is made of 4 layers, and this is what makes Docker really significant.  What is even more awesome is that if I build something on top of the Ubuntu image, I only ever have to update the layer I added, so I never have to keep downloading the Ubuntu image over and over again.

So the container is running within this system.
#sudo docker ps -a
Screenshot from 2016-05-16 13:14:28
We can see the container's unique ID (which I will get into in a bit), the image it uses, the command it started with (bash), and how long it has been running.  Ports are something I am not going to get into, but basically we can expose TCP/UDP ports to each one of these containers.

We can easily kill the container with the following commands
#sudo docker stop CONTAINERID
#sudo docker rm CONTAINERID
The container needs to be stopped and then deleted.  Just as an FYI, the container can be brought back up at any point until it is removed.  The files live within /var/lib/docker/aufs/diff under the container ID, so they will need to be deleted as well.

So let's go ahead and create our very own Docker image, unique to, let's say, a project, and try to move it from the VM I am using off to another Docker host.

The first thing to do is create a file called Dockerfile in whichever directory you are working in; it has to be called exactly that.
#sudo vim Dockerfile
Screenshot from 2016-05-16 16:20:01
This file is what Docker builds the image from.  For example, the first line is a simple comment; FROM ubuntu says pull down the Ubuntu image; I am the maintainer; the next two commands tell the build to first update the repo and then install iperf; and the last one simply writes output to a testfile so that later we know it all worked.
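If the screenshot is hard to read, the Dockerfile is only a handful of lines.  A close approximation (the exact maintainer line and echo text are from memory, so treat them as placeholders):

# simple iperf test image
FROM ubuntu
MAINTAINER burnyd
RUN apt-get update
RUN apt-get install -y iperf
RUN echo "it all worked" > /testfile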

We are ready to create our docker image.
#docker build -t firstimage:0.1 .
(docker build takes the directory containing the Dockerfile as its argument, hence the trailing dot.)
#docker run -it firstimage:0.1 /bin/bash
Screenshot from 2016-05-16 16:27:45
We can see that iperf is installed from the apt-get we told Docker to run in the Dockerfile.

The first option to get this image off of this host is to move it manually.  Once the Docker image is built, it is possible to save it, move it to any Docker platform, and run it there!

Saving the docker image
#docker save -o firstimage.tar firstimage:0.1

The image is now saved; next we SCP it to another host.
#scp firstimage.tar user@machine:/pathtoscpfile
Screenshot from 2016-05-16 16:45:34
The part that always gets me is that this file is only 156MB!

Next, let's jump on the machine we moved the file to, load the image, and run it.

#docker load -i /pathtofolder/firstimage.tar
#docker run -it firstimage:0.1 /bin/bash

Screenshot from 2016-05-16 16:45:34
So in that tutorial we have successfully moved a container the manual way.

Next we will walk through how to push this to docker hub.  The first thing to do is sign up for docker hub.

So here is my docker hub repository
Screenshot from 2016-05-16 16:53:08
The repository we are going to move firstimage to is going to be called burnyd/iperf-test

We first want to create a Docker tag, so we need to find the unique image ID.
#docker images | grep firstimage
Screenshot from 2016-05-16 17:08:54
We have found the unique image ID.  Now we need to create the tag.
#docker tag 4ea31abc539d burnyd/iperf-test:1.0
If we check the images again we can see there is a new image created.

So let's go ahead and push this to Docker Hub.
#docker push burnyd/iperf-test:1.0
Screenshot from 2016-05-16 17:11:01
Now if we check our docker hub we should see this image.
Screenshot from 2016-05-16 17:12:19
It is that simple.  I should be able to pull this image and run it on any host running Docker.  Let's try a new host.

#docker pull burnyd/iperf-test:1.0
Screenshot from 2016-05-16 17:15:45.png
So now we should be able to simply run the image.
#docker run -it burnyd/iperf-test:1.0 /bin/bash
Screenshot from 2016-05-16 17:16:49
So there we have it.  We can grab this image from anywhere in the world!

The last section of this blog post is an automated script I created to run Docker containers.  Schedulers like Kubernetes and Swarm are really the way to go, but I simply created a Python script that prompts the user for how many containers to run and what command to run in them.  I posted it on GitHub here.

In this test we will create 20 containers, let them run, then delete them and tear it all down.  We could easily create 20 iperf client streams to a server this way, for example.

Screenshot from 2016-05-16 17:35:33

Okay now lets go ahead and tear it all down!
Screenshot from 2016-05-16 17:36:20
Here is the code for anyone interested, without going to GitHub.

Screenshot from 2016-05-16 17:37:30
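The screenshot is tough to read, so here is a rough sketch of what a script like this can look like.  It is not the exact code from GitHub; it simply shells out to the docker CLI instead of using a Docker API library:

#!/usr/bin/env python
# quick and dirty container spin-up / tear-down helper (Python 2.7)
import subprocess

def create_containers(count, image, command):
    ids = []
    for i in range(count):
        # docker run -d prints the new container ID on stdout
        cid = subprocess.check_output(
            ['docker', 'run', '-d', image, 'sh', '-c', command]).strip()
        print 'started container %d: %s' % (i + 1, cid)
        ids.append(cid)
    return ids

def destroy_containers(ids):
    for cid in ids:
        subprocess.call(['docker', 'stop', cid])
        subprocess.call(['docker', 'rm', cid])
        print 'removed container %s' % cid

if __name__ == '__main__':
    count = int(raw_input('How many containers? '))
    command = raw_input('Command to run in each container? ')
    running = create_containers(count, 'ubuntu:14.04.1', command)
    raw_input('Press enter to tear everything down...')
    destroy_containers(running)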

 

Arista ZTP basics

ZTP within Arista switches makes deploying infrastructure really easy.  ZTP takes away a lot of human error and allows for zero touch provisioning of switches, which falls right in line with a lot of the SDN/automation craze.  We have all been there when it comes to installing a switch and doing the simple CTRL+C and CTRL+V from Notepad; it never really works out too well.  From personal experience at my last position, we were able to hand switch installs off to facilities to simply plug them in after a script was run per environment.

In this blog post we will work with the following environment:

ztpenvironment

I have 4 VMs, EOS-1 to EOS-4, within VMware.  vEOS inside of VMware is an easy install.  Once all 4 vEOS VMs are loaded, an Ubuntu 14.04 LTS VM is needed.  That VM requires an install of the ISC DHCP server and the Arista ZTP server.  For the Arista ZTP server, simply follow the apt-get package instructions and it should install rather easily.  Once that is out of the way, make sure to go ahead and set the ZTP identifier method to MAC address for all the vEOS VMs.

Now that everything is installed, let's take a look at how ZTP actually works.

ztp

The switch comes online, in this case vEOS1 through vEOS4.  The switch then sends out a DHCP discover, on the management interface first and then on all interfaces.  Once the switch receives an IP from the DHCP server, it also receives a file location from the DHCP options; that is step two.  This file is generally called a "bootstrap" file and is simply a Python script that tells the switch where to grab its configuration from.  After the script has been downloaded and all of its configuration has been applied, the switch reboots.

Okay, that's great, but how does that work?

Here is a screen shot of my DHCP server on the Ubuntu 14.04 LTS VM

dhcpd
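If the screenshot does not come through clearly, the relevant piece of dhcpd.conf looks something like this (the subnet, range and router values here are assumptions; option bootfile-name is what hands the switch its bootstrap URL):

subnet 192.168.3.0 netmask 255.255.255.0 {
  range 192.168.3.200 192.168.3.228;
  option routers 192.168.3.1;
  # ZTP: tell the switch where to fetch its bootstrap script
  option bootfile-name "http://192.168.3.230:8080/bootstrap";
}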

The http://192.168.3.230:8080/bootstrap option is really where the magic happens.  If the Arista ZTP server guide was followed correctly on the Ubuntu 14.04 VM, you should have the ZTP server running on port 8080, so when a switch is brought online it receives its bootstrap file directly from the ZTP server.  Let's reload one of my switches and watch the ZTP server at the same time.

ztpswitch

From the switch's perspective, it booted up with zero configuration on it.  It sent out a DHCP request and received the IP address 192.168.3.227/24.  The DHCP server, through a DHCP option, then instructed the switch to get its bootstrap file from http://192.168.3.230:8080/bootstrap.  The switch downloads the bootstrap file, executes it, and reboots.

From the perspective of the Ubuntu 14.04 LTS server running the ZTP server:

ztpservers

The first line says that the node 005056a878fa is requesting the bootstrap file.  What I will explain next is how the ZTP server knows this is a distinct switch, i.e. ToR5 vs. ToR1.  Switches can be identified by either their system MAC address or their serial number; I just used the MAC address feature, so 005056a878fa is the switch's system MAC address.  I will explain the next few lines in a little bit.

So far we know somewhat how this process works, but where do we store the configuration for each device?  How do we make each switch unique?

In the following file: /etc/ztpserver/ztpserver.conf

serverconfig

The location setting is where the ZTP files live; the bootstrap script, for example, is located within /usr/share/ztpserver/bootstrap, and the same goes for image files, etc.  For the unique identifier I chose systemmac (the serial number can be used as well), and the server URL is where the ZTP server listens.
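For reference, a minimal ztpserver.conf along the lines of the screenshot could look like this (a sketch based on the values described above; anything not mentioned in the post is an assumption):

[default]
# where bootstrap, nodes, definitions and files live
data_root = /usr/share/ztpserver
# match switches on their system MAC (serialnumber also works)
identifier = systemmac
server_url = http://192.168.3.230:8080

[server]
interface = 0.0.0.0
port = 8080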

So under /usr/share/ztpserver there is a list of the node configurations for the VMs I have created for ZTP.  Let's go to the node that I just ZTP'd.

/usr/share/ztpserver/nodes/005056a878fa/ is where the node information lives.  Each time a new switch is added, all of its information goes under its own directory, named after its MAC address, within the nodes folder.  There are 3 files located within this directory.

-Startup-configuration – holds the configuration for that unique switch

-Definition – this is one of my favorite parts of ZTP; a definition can, for example, make sure that when the switch boots it is always on a certain version of code, or download a given bash, Python or Go script

-Pattern – allows the node to be matched dynamically via LLDP neighbors if need be

So here are some snippets of my startup-config, definition and pattern files.  My pattern file is the default, but all of these files are needed by ZTP.  Keep in mind I built the configs beforehand; they just ZTP.  In a few days or weeks I will get a ZTP setup going where the configurations are built by a simple Python script.

Startup-Config.

startup

Definitions

Screenshot from 2016-05-09 17:28:37

Pattern

Screenshot from 2016-05-09 17:28:54
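In case the definitions screenshot is hard to read, a definition along the lines of mine contains roughly the following (the image filename, version and destination path are assumptions; install_image and copy_file are standard ZTPServer actions):

---
name: veos tor definition
actions:
  - name: "validate the EOS image"
    action: install_image
    always_execute: true
    attributes:
      url: files/images/vEOS-lab-4.16.6M.swi
      version: 4.16.6M
  - name: "copy the dns script to the switch"
    action: copy_file
    attributes:
      src_url: files/scripts/dnsscript
      dst_url: /mnt/flash
      mode: 777
      overwrite: if-missing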

In my definition file the very first action is install_image, which simply makes sure the switch is running the intended image.  The second action takes the dnsscript from /usr/share/ztpserver/files/scripts/dnsscripts and sends it to the switch.  What this does is add a DNS A record for the switch's management address each time it ZTPs.  It's a simple bash script I put together, shown here.

bashscript

What the script does is an nsupdate against the DNS server I have here at home.  It adds a host record using the string returned by hostname, and MGTINT is the IP address pulled with a somewhat complex grep of ifconfig ma1, which is the management interface.  In the future I need to do the same thing for all interfaces with some sort of fancy for loop.
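For anyone who cannot read the screenshot, the script boils down to a few lines of bash like the following (the DNS server address and zone here are placeholders, and the grep is simplified):

#!/bin/bash
# register this switch's management address in DNS
HOST=$(hostname)
MGTINT=$(ifconfig ma1 | grep 'inet addr' | awk -F: '{print $2}' | awk '{print $1}')

nsupdate << EOF
server 192.168.3.5
update add ${HOST}.lab.local 86400 A ${MGTINT}
send
EOF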

Once that script has been copied to the switch, we call it at boot time from the configuration with an event-handler.

eventhandler.png
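The event-handler in the screenshot is along these lines (the handler name, delay and script path are assumptions):

event-handler dns-update
   trigger on-boot
   action bash /mnt/flash/dnsscript
   delay 60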

Lastly, I wanted to create a quick Python script that simply connects to all 3 vEOS switches and write erases / reloads each of them.

regenpythonscript

regenpy
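The screenshot boils down to something like the sketch below.  It assumes eAPI is enabled on each switch and that pyeapi profiles named veos1-veos3 exist in ~/.eapi.conf; the exact erase/reload commands may need adjusting for your EOS version:

#!/usr/bin/env python
# wipe and reload the lab switches so they go through ZTP again
import pyeapi

SWITCHES = ['veos1', 'veos2', 'veos3']

for name in SWITCHES:
    node = pyeapi.connect_to(name)
    # erase the startup-config and reload; the switch comes back with no config and ZTPs
    node.enable(['write erase now', 'reload now'])
    print 'wiped and reloaded %s' % name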

Right now it is a build-up/tear-down setup for any type of testing I want to do.  In the future I definitely want an automated script that builds the configuration and the nodes directory for me.  But for right now I can write erase and remove all nodes.  I also have more DNS entries to make for all of the interfaces.

 

 

 

Building a hypervisor leaf spine overlay network with BGP

This has been long overdue.  In this blog post I will explain why a leaf spine model is the best scaling model for an overlay network.  I was recently on the #packetpushers Design and Build show talking about BGP within the data center, and why BGP is the best tool we currently have for building a leaf spine infrastructure.  I am big into VMware's NSX, but this sort of topology applies to any overlay model using the same principles.  This will be a rather large post covering the following technologies:

1.) Leaf spine architecture
a.) Spine layer
b.) Leaf layer
c.) Physical links
d.) East-west bandwidth

2.) BGP
a.) BGP peerings
b.) ECMP
c.) AS numbering
d.) Community strings
e.) Dynamic peering
f.) AS-override

3.) NSX
a.) NSX edge router placement
b.) VTEP communication

Here is your typical Leaf Spine infrastructure.

LeafSpine

Spine Layer

spines

A common misconception here is that the spine switches have to be linked together.  This comes from prior ways of thinking with first hop redundancy protocols.  Each connection from leaf to spine is a point-to-point layer 3 routed link, and spine switches are never connected together; their sole purpose is to provide east-west connectivity for leaf switches.  Any traffic that egresses a leaf switch simply picks, via some ECMP method, which spine to land on to reach another leaf switch.  Spine switches are very similar to "P" routers in an MPLS design.  Each spine is also within the same BGP AS.

Leaf switches
leafs

Each leaf switch has its own purpose in this environment, starting from left to right.  The transit leafs provide connectivity for anything leaving the environment.  When traffic egresses the environment, we would typically send it to another data center, the internet, some sort of vendor connectivity, or a public/private cloud.

The services leaf in a design is generally where you put your external services.  This can be a mixture of bare-metal and virtual devices.  I would suggest putting load balancers, AD/LDAP and any type of IP storage in this rack.  Typically load balancers use source NAT so that return traffic ingresses back through the load balancer after leaving a VM.  In the future I will experiment more with hardware-based VTEPs.

The edge/management rack provides connectivity for our NSX or overlay networks.  This is where our NSX edge routers peer via BGP with the top-of-rack switches and provide connectivity for all of our compute subnets.

Compute racks: once we have our edge rack connected, this is where we put all of our compute, i.e. the clusters of ESXi hosts running our web, app and DB VMs.

The physical links within this infrastructure from leaf to spine have to be the same speed.  So if you built your environment with 40Gb/s links and 100Gb/s came out the week after as the new hotness, you are stuck at 40Gb/s.  BGP is a path vector protocol, or what I like to call a "glorified next hop collector": bandwidth is not taken into consideration, so a 40Gb link is treated the same as a 100Gb link.  Do not worry, I will explain why you can scale out more spines and it should not matter!

East-west traffic is the largest driver for a leaf spine infrastructure.  Let's take 2 VMs across two different leaf switches, for example.

physicallinks
I only included one leaf switch on each compute rack for simplicity.  As the drawing shows, for the VMs to reach each other the traffic lands on a leaf switch, and that leaf switch has 160Gb of bandwidth to reach the other leaf switch.  This seems like overkill at the moment, but once you start layering thousands of web, app and DB style applications per rack, it makes a lot of sense.  Getting back to the earlier point about physical links: if we find that we need more bandwidth, there is nothing stopping anyone from adding another spine and one more 40Gb link per leaf.  Most implementations I have seen use the Trident T2, which typically provides 48 10Gb ports and 4 40Gb uplinks, so 4 spines is the most I have seen at the moment.

BGP

Why BGP?
BGP has historically had a bad reputation when it comes to convergence, as the timers are slower and it was harder to use than your usual IGP, which a network person could turn on with a few lines of CLI... eww, CLI!

OSPF is not really an applicable choice in this design, as it is typically very difficult to filter with OSPF.  EIGRP is out of the question due to it being proprietary.

BGP has made vast improvements as a protocol and is enterprise ready.  Its timers can be made much quicker, and we can make peering dynamic now.

The peerings in a BGP leaf spine architecture are rather easy: iBGP between the leaf switches in a rack and eBGP on each leaf-to-spine connection.  ECMP is vital in this topology, as BGP by default DOES NOT leverage multiple links in an ECMP fashion, so it generally has to be turned on.
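As a rough illustration on EOS (the AS numbers and addresses here are made up for the sketch), a leaf's BGP stanza could look something like this:

router bgp 65004
   maximum-paths 4 ecmp 4
   neighbor 10.1.1.0 remote-as 65000
   neighbor 10.1.1.2 remote-as 65000
   neighbor 10.0.4.2 remote-as 65004
   network 10.10.4.0/24

The two 65000 neighbors are the eBGP sessions up to the spines, 10.0.4.2 is the iBGP session to the other leaf in the rack, and maximum-paths/ecmp is what actually turns multipath on.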

Community strings are vital.  In the past, network people used prefix-lists, access-lists and route-maps to control routes leaving a routing protocol.  They still have their uses today, but generally routes leaving each environment should carry a community string that matches their BGP AS.  For example, if compute rack 1 uses 65004 it should use a community string of 65004:100.  What works really well for advertising subnets dynamically is having the transit switches match on those communities and aggregate them for outbound advertisements to other data centers, so it stays dynamic.  Trying to use prefix-lists to control routes that potentially touch a large number of routers is less than ideal today; if filtering is necessary, the edge routers are about the only place I would apply prefix-lists.

Dynamic BGP

The first I heard about this was 2-3 years ago with MPLS routers.  Cisco has moved this technology into all of their latest releases, and I have also tested this with Arista switches.  The idea is that you have a subnet dedicated to BGP peering with virtual routers.  Let's say it is 10.10.27.0/24 and all of your NSX edge routers are located on that subnet within the same BGP AS; you can then dynamically bring up BGP peers on that network.  I like this, as there is no need to add neighbors or make a physical switch change.

dynamicneighbors

So any new virtual router within the 10.10.27.0/24 network that talks to the physical switch in this scenario will automatically peer with it.  Now here is the tricky part: if you are running multiple tenants within the same physical infrastructure and they need to talk to each other, as-override is needed.  In practice everything should just follow the advertised default route; I do this just in case the default route is lost.
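On an Arista switch, dynamic peering along these lines looks roughly like the following (the peer-group name is my own and the AS numbers are just for the sketch):

router bgp 65001
   neighbor NSX-EDGES peer-group
   neighbor NSX-EDGES maximum-routes 12000
   bgp listen range 10.10.27.0/24 peer-group NSX-EDGES remote-as 65010

Any NSX edge in 10.10.27.0/24 that opens a BGP session from AS 65010 is accepted automatically, so adding a new edge router never requires touching the physical switch.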

NSX

NSX, or any hypervisor-based overlay, is what really scales in this environment.  In our edge rack we place the NSX edge routers that peer with the physical network.  These routers advertise the address space where our VMs live.  Since the NSX 6.1.x days they have supported ECMP, and since the latest 6.2.x release NSX supports viewing the BGP AS path for a given route, which was not there prior.

Edgerouters

The compute racks are what make the east-west VM-to-VM connectivity possible.  Within NSX, each hypervisor terminates a VXLAN tunnel to the other hypervisors in order to overlay layer 2 segments.

layer2extended

I have written about this before.  The idea here is that the communication is between VXLAN vmk interfaces, or VTEPs.  The outer part of the VXLAN packet contains the source and destination VTEP/VXLAN vmk addresses, while the inner, encapsulated packet contains, for example, 10.0.0.5/24 talking with 10.0.0.6/24.

The broad idea here is that every network related to the VTEPs needs to be advertised into BGP, since traffic from edge/management to any of the compute clusters uses the VTEP networks.  All of the data plane traffic should be VXLAN and use those segments.  The end result should look like the following.

leafspineinternal

VCP-Nv announcement / Lab

One of the announcements from VMworld 2014 is a new VCP track for network virtualization.  The blueprint can be found online, and I also wrote a previous post on basic NSX concepts.  I have been through quite a few installs of it; however, like anything else, I would like to master the technology around it.  I work with everything except for the firewall and load balancers today.  I have created the following topology.

NSXphysical

Physically I have 2 Dell 1950 servers running ESXi 5.5 and 2 whitebox servers I normally use for all sorts of testing.  I really like the nested two-vCenter setup.  In my setup, the whitebox servers run vCenter, vCAC and the NSX Manager across 2 hosts, while the ESG, LDR, controllers and guests all run on the 1950s.  It makes sense to separate the control plane and data plane.

Logically this is how my lab is setup.

NSXlogical

Looking at the logical setup, I have 3 tenants: Tenant A, B and C.  Tenant A runs OSPF from the ESG to an SVI on VLAN 100.  Tenant B runs BGP, where OSPF is redistributed into BGP and BGP into OSPF.  Tenant C does not have an LDR, just an ESG, since I wanted to experiment with that concept.  But, like I said in my last post, this is exactly what we do with the network today; everything simply lives in the hypervisor.

Since I still receive emails today from people using my prior CCNP notes, I would like to kick off a similar series here where I put together notes and topics related to the VCP-NV.  The problem anymore is really finding the time.

Building virtual networks with VMware's NSX

Over the last 3 weeks I have had the time to start setting up NSX, along with some help from VMware.  I have been looking forward to something like this for a long time: the chance to do networking at deployment scale without having to touch physical networking gear.  I will look at this from a network engineer's perspective, not a system admin / virtualization administrator's.  I will quickly highlight some NSX terms that will be used.

-ESG: Edge Services Gateway.  This is the edge of the NSX network that allows NSX to reach out to the physical network, i.e. via BGP, IS-IS, OSPF or static routes.
-LDR: Logical Distributed Router.  Similar in spirit to a dvSwitch, this is a router that spans multiple hosts, inter- or intra-cluster, which allows for logical interfaces and a distributed default gateway.
-VXLAN: A VXLAN segment is similar to a VLAN in the layer 2 world.  VXLAN is where most of the magic happens and lets us virtualize our networks.
-VTEP: VXLAN Tunnel End Point.  A VTEP is an IP address that each individual ESXi host receives; VTEPs build tunnels between ESXi hosts in order to overlay networks.
-VXLAN bridge: Allows bare-metal devices to participate in the same subnet as NSX.
-Transport zone: A transport zone defines the span of the overlay so that the ESG and LDR can talk to each other, similar to running a VLAN between multiple routers or switches.
-NSX Manager: The manager speaks back and forth with vCenter.
-NSX Controllers: There are three NSX controllers that push routes down to each VTEP, telling each VTEP how to get to each server.

Alright, I am glad that is over.  I will now go over the design I decided to use.  Mine is a bit complex: I was lucky enough to use Nexus 7700s and Nexus 56128s in a leaf and spine setup.

NSXPhysical

So physically this is how my setup looks.  I am using 2 ESGs for redundancy, and each ESG peers with its respective 7K.  Between the edge routers and the LDRs I am running OSPF as the dynamic routing protocol.  This is extremely similar to how we do networking today; there is not much of a change, except for the way I am doing eBGP between the edge routers and the 7Ks.  I will explain that in a later blog post, but I am using OSPF as a recursive lookup.

This design also pushes layer 3 out to the edge, which is great because us network people prefer layer 3 over layer 2.

Logically, this is what my design looks like with the underlay taken out of the equation.

NSXvirtual

Logically everything is the same.  The idea here is that we are decoupling from the physical network and overlaying on top of it.  This is great, as I can spin up as many edge routers as I want; the ESGs and LDRs are simply VMs that reside in a cluster.

So how does everything work within NSX from a data flow perspective?

If the VMs I have pictured within 10.1.65.0/24 want to talk to each other, the flow is relatively simple.  Each VM's traffic is forwarded up to the LDR.  The LDR then checks with the NSX controller, via the VTEP, to see which VTEP it should traverse for east-west traffic.  For traffic that is on a different subnet a similar flow happens: traffic hits the LDR and is routed across its respective VXLAN.

Some known gotchas for anyone deploying NSX in the future.
Controllers and VTEPs have to have connectivity to each other.
The Manager has to have connectivity into vCenter and use an SSO account.
Never ever try to firewall VTEP traffic; it won't work out so well.
VTEP tunnels do not work with multipathing, i.e. if I have two VTEP vmks per ESXi host, only one is used for forwarding in the 6.0.4 release of NSX.

IOS XR RPL examples

Here are a few examples of creating IOS XR RPLs.  The idea is still largely the same as route-maps, with the difference of live editing, similar to the way a file would be edited in vi on Linux.  I really like XR; it is better than any OS Cisco has ever come out with.  I will start off with an example of local preference and community strings, then throw it all together to show how it would be set up in XR.

Modifying local preference
IOS

route-map LOCPREF permit 10
set local-preference 200

IOS XR

route-policy LOCPREF
set local-preference 200
end-policy

Adding no export to a community string

IOS

route-map NO_EXPORT
set community no-export
end

IOS XR

route-policy NO-EXPORT
set community (no-export)
end-policy

One takeaway with IOS XR is that if there is an eBGP peering with an upstream neighbor, a route-policy must be attached, and somewhere within that RPL there has to be a pass (or some other permitting action).  So if I had a peering like so:

router bgp 1
address-family ipv4 unicast
neighbor 2.2.2.2
remote-as 2
address-family ipv4 unicast

Without an RPL facing 2.2.2.2 I will receive zero routes from 2.2.2.2, so in most demonstrations and IOS XR best practices there will be a pass policy applied to eBGP peerings.  I like to build mine like so:

route-policy EBGP_PASS
pass
end-policy

So the config turns into the following.

router bgp 1
address-family ipv4 unicast
neighbor 2.2.2.2
remote-as 2
address-family ipv4 unicast
route-policy EBGP_PASS in
route-policy EBGP_PASS out

For some more IOS XR examples, let's say I want to tag 10.1.0.0/16 and 10.2.0.0/16 with the no-export community but let everything else go untagged, community-wise.  First we create what's called a prefix-set in XR:

prefix-set NO-EXPORT
10.1.0.0/16,
10.2.0.0/16
end-set

This is similar to a prefix-list; however, there is one really awesome thing about prefix-sets that is different in XR: you can edit them without potentially breaking anything.  Once edited, I can add anything else without having to remove the prefix-list as in traditional IOS, or add a sequence number somewhere along the path.

prefix-set-before

So let's continue on.  I want routes matching the NO-EXPORT prefix-set to be tagged with the community string but let everything else through untagged; here is how I would set that up.

prefix-set-after
route-policy NO-EXPORT
if destination in NO-EXPORT then
set community (no-export)
pass
else
pass
endif
end-policy

So let's take a look at this policy.  If the prefix matches prefix-set NO-EXPORT, then 10.1.0.0/16 and 10.2.0.0/16 get the (no-export) community and are passed.  Anything that does not match falls through to the else, is simply passed, and the policy ends.

There are some other community strings, but you get the gist of it.  You can also edit an RPL in the same manner you can edit a prefix-set.

Quick storage notes

I look over these notes generally when I am zoning a new server or trying to remember some functionality.  I have been heavily involved with a lot of storage SAN switching lately, not just in the FC world but in the FCoE world too, and I haven't experienced a meltdown just yet.

storage notes

Port types

For end devices:
N_port -> end host
NL_port -> end host in an arbitrated loop

Configured on the switches:
F_port -> switch port that connects to a node port
FL_port -> fabric loop port, where you would plug in the storage

E_port -> ISL port
TE_port -> trunking expansion port / extended ISL, passes VSAN tags
TF_port -> trunking F_port; similar to running dot1q down to a hypervisor, it passes VSAN tags without merging the fabric or pushing STP down to the server as in the Ethernet world
Addressing

WWNs – 8 bytes, similar to a MAC address
FCID – 3 bytes, similar to an IP address; the SAN switch assigns it

WWNN – an address assigned to the node; each server gets one
WWPN – the physical address of a port, like a MAC; each HBA gets one

FCID – this is what traffic is routed to, and it is made up of:
*Domain ID – each switch gets a domain ID
*Area ID – each switch has an area ID
*Port ID – the end connection

sh flogi database -> gives you all the fabric logins
MDSA# sh flogi database
——————————————————————————–
INTERFACE VSAN FCID PORT NAME NODE NAME
——————————————————————————–
fc1/1 1 0x33000d 10:00:00:00:c9:84:b1:c7 20:00:00:00:c9:84:b1:c7

Device aliases make things easier, as you can take a node name or port name and match it to a device alias when devices are zoned.

Fibre Channel logins

FLOGI – N_port sends to F_port to register with the fabric
PLOGI – port login to the target
PRLI – process login; the FCP application level
SD – SPAN destination port for Fibre Channel
NP – node port used for N_Port Virtualization (NPV)

FCNS – similar to ARP; resolves WWNs to FCIDs

#Fibre Channel name server
MDSA# sh fcns database

VSAN 1:
————————————————————————–
FCID TYPE PWWN (VENDOR) FC4-TYPE:FEATURE
————————————————————————–
0x33000d N 10:00:00:00:c9:84:b1:c7 (Emulex) scsi-fcp:init