UCS Boot Camp – Day 3 (The Real Deal)

Sorry about the fake post last night.  Several classmates were introduced to my blog today, and apparently they were so impressed with my posts from the first two days that they are now relying on them as their notes.  So I had to pull a practical joke on them. 😉 Hopefully I didn’t put too many of you off with it.  Today was also a big day for Twitter: I helped hook three of my classmates into the community and discovered that Brad Hedlund covers my territory for Cisco.

Welcome to Karan Bhagat, Bobby Mann and Dan Brinkmann.

Today we got to dig into the UCSM interface some more.  One thing we discussed pretty heavily is how confusing Service Profile assignment becomes when “slot” and “server” are used interchangeably.  Questions like “Do you apply a Service Profile to a server or a slot?” came up (answer: slot).  But then, when stepping through the creation process in UCSM, we came across this:

UCS - Day 3 - Service Profile

Figure 1: Associating a Service Profile

Note that your two options are to associate it to a slot or a server.  Confused?  We were too.  The end result: you apply a Service Profile to a slot, not a server.  If a server is removed from one slot and inserted into another, the profile does NOT move with it.  Our advice to Cisco was to clean up the interface by eliminating the word “server” and ensuring consistency across the different parts of the interface.

Why eliminate the word “server?”  Well, what is a server?  That can be a very subjective term.  In UCS terms, I view the server as the convergence of the blade hardware, the Service Profile and the Operating System.  Before virtualization and UCS (or HP Virtual Connect) this debate was a moot point because the three parts were inseparable.  Nowadays we may need to reconsider how we use some of these terms.

Associating a Service Profile with a blade is a two-step process:

  1. Assign Service Profile to a slot.
  2. When the blade is inserted (or, if one is already present, immediately after step one), it is powered on and booted by an internal UCS PXE server into PnuOS.  This utility OS gives UCS access to the system for writing BIOS values (UUID, MAC, WWPN, etc.).  The blade is then rebooted to the boot device defined in the Service Profile.
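To make the slot-vs-server distinction concrete, here is a minimal Python sketch of the two-step flow described above.  All class and function names are my own illustration for this post — this is not the UCSM object model or API:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative model only; these names are hypothetical, not UCSM's API.

@dataclass
class Profile:
    uuid: str
    macs: List[str]
    boot_device: str

@dataclass
class Blade:
    uuid: str = ""
    macs: List[str] = field(default_factory=list)
    booted_from: str = ""

@dataclass
class Slot:
    profile: Optional[Profile] = None
    blade: Optional[Blade] = None

def apply_profile(slot: Slot) -> None:
    """Step 2: PnuOS writes the profile's identity values into the blade,
    then the blade reboots to the profile's defined boot device."""
    slot.blade.uuid = slot.profile.uuid
    slot.blade.macs = list(slot.profile.macs)
    slot.blade.booted_from = slot.profile.boot_device

def associate(slot: Slot, profile: Profile) -> None:
    """Step 1: the profile is bound to the slot, not to a blade."""
    slot.profile = profile
    if slot.blade is not None:  # a blade already present triggers step 2
        apply_profile(slot)

def insert_blade(slot: Slot, blade: Blade) -> None:
    """A blade inserted into a slot picks up whatever profile the slot holds."""
    slot.blade = blade
    if slot.profile is not None:
        apply_profile(slot)
```

The key property this models is the one that surprised us: the identity lives with the slot, so a blade moved to a different slot boots with whatever profile that slot holds, not the one it last ran.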

When disassociating a Service Profile from a slot that has a running blade in it, you are given the option to gracefully shut down the OS.  This can be useful if you use the “Reassign Service Profile” option to automate the shutdown > disassociate > associate (to a different slot) > power on process.

We got our first exposure to the KVM functionality today.  It is broadly similar to the traditional (Java-based) Remote Console on HP’s iLO: you get the same cursor-speed mismatch, and neither power options nor remote media are built into the KVM window itself, though a menu option will open the remote media window from within the KVM window.  One thing I noticed is that it allows at least two simultaneous connections to the same KVM, which is very useful for collaborating.  Overall, the UCS KVM is a bit lacking compared to iLO’s new Integrated Remote Console, but most of the shortcomings could be considered bells and whistles.

Another quirk of the UCS GUI we noticed is that in the Equipment tab of the Navigation pane, the icon of each blade does not change based on the status of the hardware.  A simple change of color to indicate power state would save a ton of clicks.  The number of clicks can quickly become a problem given the huge amount of data provided by UCSM.

One side question answered today: the UCS chassis has no cooling zones like the HP c-Class does.  Apparently Cisco felt that having no zones is more efficient, but it is still studying the merits of both approaches to determine which truly is.

When creating a Service Profile to enable stateless computing, you must specify MAC addresses and WWNs in order to achieve the aforementioned separation of hardware, physical identity and OS.  When defining the MAC address, you can choose a pool of MAC addresses or define a specific one.  In either case the default Vendor ID is owned by Cisco, so it should only show up within UCS systems.  Of course, if you have multiple UCS domains, you will have to manage uniqueness across the multiple UCS Managers.  When creating a pool of MACs or WWNs, there are no pre-defined ranges like Virtual Connect provides.  However, it is very easy to define your own: simply enter a starting MAC and the number of addresses you want, and UCSM creates a pool of sequential MACs.
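The sequential expansion is simple enough to sketch.  The helper below is my own illustration, not a UCSM tool; I’m using 00:25:B5 as the starting prefix since, as far as I know, that is the Cisco-owned default range for UCS:

```python
def mac_pool(start_mac: str, size: int) -> list:
    """Expand a starting MAC address into `size` sequential addresses,
    mimicking the pool-creation behavior described above (my sketch,
    not UCSM code)."""
    start = int(start_mac.replace(":", ""), 16)
    pool = []
    for value in range(start, start + size):
        raw = f"{value:012X}"  # 12 hex digits, zero-padded
        pool.append(":".join(raw[i:i + 2] for i in range(0, 12, 2)))
    return pool
```

For example, `mac_pool("00:25:B5:00:00:00", 4)` yields `00:25:B5:00:00:00` through `00:25:B5:00:00:03`.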

There are actually two different types of pools that can be created: server pools, which contain blades, and identity pools, which are actually several different pools for MACs, WWNs, UUIDs, etc.  In a stateless computing paradigm, the Service Profile becomes the glue that hooks all these different pools together.  It is possible to attach more Service Profiles to a server pool than there are physical blades in the pool; you’ll simply end up with Service Profiles that are unassigned.

The server blade pool can be set up manually or in an automated fashion by defining the characteristics (CPU type, # of CPUs, amount of RAM, slot #) of the blades that should belong to the pool.  As blades are discovered by UCSM, they are added to every automatic pool whose criteria they meet.  This means that a blade can be part of multiple pools at once, but it can still only have one Service Profile associated with it at a time.
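The automatic qualification amounts to evaluating each discovered blade against every pool’s criteria.  A hypothetical sketch of that logic (field names and example criteria are mine, not UCSM’s):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical blade attributes; UCSM's real qualification criteria
# include CPU type, CPU count, RAM and slot number as noted above.

@dataclass(frozen=True)
class BladeSpec:
    cpu_type: str
    cpu_count: int
    ram_gb: int
    slot: int

def qualify(blade: BladeSpec,
            pools: Dict[str, Callable[[BladeSpec], bool]]) -> List[str]:
    """Return every pool whose criteria this blade meets -- a blade can
    land in several pools at once."""
    return [name for name, matches in pools.items() if matches(blade)]

# Example criteria, purely illustrative
pools = {
    "big-memory": lambda b: b.ram_gb >= 96,
    "dual-socket": lambda b: b.cpu_count == 2,
}
```

A blade with two CPUs and 96 GB of RAM would qualify for both pools here, yet it can still only carry one Service Profile at a time.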

This Server Farm approach will have a rather dramatic effect on data center management.  In many organizations the hardware is owned by the business unit whose budget was used to purchase it, which has led to an “I own that server, you can’t use it for anything else” mentality toward datacenter assets.  Virtualization has started to break down this approach due to the shared nature of the resources.  UCS will further erode this philosophy with the dynamic, use-what-is-needed-when-needed approach that a Server Farm enables.  We had a good discussion around this topic, specifically what it will mean to our customers and how they can best adjust (hint: VMware is already providing some of the necessary tools).

Over lunch, Chris Haynes (Cisco) stopped in to help us understand how Cisco architected the memory in the B250 to allow up to 384GB.  There was a lot of information to absorb, so I think I may leave this detail to a later post.  Suffice it to say, I was very impressed and now feel that Cisco can make it as a viable server company.

Another question clarified today was the effect of UCSM and CMC failovers.  Each UCSM (remember: one UCSM per Fabric Interconnect, configured in an Active/Passive cluster) has a SAM controller.  Each SAM controller has a unique ID and communicates with its peer to determine which is the Active node.  The UCSM Active/Passive roles are completely independent of the Active/Passive roles of the CMCs on the IOMs.  In other words, you can have the Active UCSM on Fabric Interconnect A but all the Active CMCs on the IOMs in Fabric B.  The cross-channel communication between the Fabric Interconnects allows the communication to flow properly.  I’m not sure what would happen should this cross-communication channel between the Fabric Interconnects fail, but it’s worth investigating.

The UCSM is an OEM version of BMC BladeLogic, but this version of the BMC product is only available within UCS.  James Hollinger, a BladeLogic for UCS specialist at BMC, showed us some of the capabilities of a full implementation of BladeLogic:

  • Manager of managers functionality across multiple UCS domains
  • Create profile templates within BladeLogic in order to create Service Profiles across multiple UCS domains
  • Very granular Role Based Security (i.e. user has permissions to stop one service, but no other permissions to the server itself)
  • Additional workflow, including OS and application tasks

We finished the day performing the stateless computing lab, where we created Service Profiles with defined WWNs for connectivity to boot volumes on the SAN.  The lab didn’t initially go smoothly: every lab team’s boot-from-SAN operating system consistently crashed during boot.  After pulling in additional Cisco resources, it was discovered that you cannot migrate a Service Profile that boots from SAN and was built on a Menlo card to a system with a Palo card (the same applies in reverse).  This is due to incompatible drivers between the two devices: the OS starts to load but cannot access its own boot disk since it doesn’t have the proper drivers.

After class, I made an excursion up to VMware’s headquarters where I received a quick tour of the campus by John Troyer and had a nice dinner with John and Mike Coleman.  Thanks guys for a very enjoyable evening.