I was busy with an Exchange 2013 design that included a DAG. I did my initial setup on my home desktop using VMware Workstation. The setup was as follows:

  • 1x AD Server – Windows Server 2012 R2
  • 2x Desktops – Windows 8.1
  • 1x PKI Server
  • 2x Exchange 2013 Servers

I had to take this and turn it into a design, taking into account that the client already had a cluster and that these DAG VMs would be added to it. Here are some of the Design Considerations I took into account:

  • Virtual Machine Configuration
    • Used VMXNET 3 Adapter
    • Used “Virtual Sockets” to allocate vCPUs, not “Cores per Socket” (see the reconfiguration sketch after this list)
    • Used Memory reservations
      • Note the impact on HA when using Reservations…design your HA configuration around this
    • Note NUMA configurations. If you have to give the VM 10 vCPUs and you only have 6 pCPUs per socket (2 sockets = 12 pCPUs), the VM will not be NUMA optimized (see the NUMA sizing check after this list)
      • For this I had a look at the CPU Contention on the Cluster and it was low
    • Created multiple vDisks as follows:
      • OS drive (Exchange was also installed here)
      • Database Drive – DB 1 from Server 1
      • Database Drive – DB 2 from Server 2 (basically the remote server’s DAG-replicated DB would be on this drive; the same layout applies on both servers)
  • ESXi Host Configuration
    • This is a tricky one, as the “best way” to configure the ESXi CPUs is to disable Hyper-Threading. Now if you have a Cluster with 10 hosts and only 2 VMs that need Hyper-Threading disabled…it makes no sense to disable it for the whole Cluster. Thus consider the “Resources -> Advanced CPU -> HT Sharing -> None” option in the VM configuration (a sketch of setting this via the API follows the list). Make sure the number of vCPUs is not more than the number of pCPUs on the processor (NUMA comes into play here)
  • Networking
    • I used an additional VLAN for the Replication network and had no routing for this VLAN
    • One has to evaluate the Replication traffic needed, taking into account the ESXi host network card speeds, the number of network cards in the server and any load balancing that might be needed on the pNICs. I wanted to ensure the replication traffic had enough bandwidth without impacting other VM traffic.
      • The best option here is to have a VDS (vSphere Distributed Switch) and ensure that load-based teaming (“Route based on physical NIC load”) is enabled (in my case the client DID NOT have a VDS…)
      • What I did not want was the Replication traffic “flat lining” the pNIC on the vSwitch. Thus I created a Port Group and enabled Traffic Shaping on it, limiting the traffic on this Replication Port Group to 750 Mbps (the servers had 1 Gb NICs). Thus there “should” always be bandwidth available on a pNIC for other VM traffic (see the traffic-shaping sketch after this list). If you have vCOPS in the environment you can always evaluate this setting and adjust as needed later.
      • I also had a look at the pNIC usage on the servers and it was in the low 50-100 Mb range at the time.
    • Don’t be fooled by the VMXNET 3 in-guest indication that the speed is 10 Gb…if you have 1 Gb NICs in the server, the guest will still state 10 Gb. The two have nothing to do with each other, so your speed between ESXi hosts will be 1 Gb and not 10 Gb.
  • Storage
    • All the documents one reads state the huge IO improvement in Exchange 2013, but you still need to make sure you will have enough IO. Also, in my case I already had a storage unit…so I had to make do with that.
    • I placed the Database vDisks on different RAID Groups (not just different LUNs…I ensured they were also on different RAID Groups)
    • The hosts already had multipathing enabled
  • Cluster Settings
    • DRS
      • Created a DRS rule to “Keep the VMs Apart” for the VMs that are part of the DAG plus the Witness Server. Thus there are three servers in the rule (see the anti-affinity sketch after this list).
    • HA
      • Disabled Guest Monitoring for the DAG VMs
      • I disabled HA for the DAG servers. I did not want the servers to auto-start in case of a host failure.
        • Since we have a DAG the DB would fail over to the other Exchange server.
        • If there were any issues on the “failed” VM when starting up, we did not want it to have any impact on the Exchange servers’ DBs.
        • We added the following process:
          • After a host failure, ensure that all DBs are mounted on the remaining DAG member
          • Ensure users are connected to the new DB on the remaining DAG server
          • Make sure backups are successful on the remaining DAG server
          • Power up the failed DAG server
          • Make sure replication is working
          • Activate the DB on its original server
  • DRP and Backup/Restore
    • Day-to-day backups were already in place
    • Daily backups were being replicated off-site
    • The point here is to ensure that this topic is not left out of the design
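
On the VM configuration side, the “virtual sockets, not cores per socket” and memory reservation settings above can also be applied programmatically. Here is a minimal sketch using pyVmomi (the vSphere Python SDK), assuming you already have a connected session and a vim.VirtualMachine object (vm); the function name and the sizes in the example are mine, not the client’s values.

    from pyVmomi import vim

    def reconfigure_dag_vm(vm, vcpus, memory_mb):
        """Allocate vCPUs as virtual sockets (1 core per socket) and reserve all memory."""
        spec = vim.vm.ConfigSpec()
        spec.numCPUs = vcpus
        spec.numCoresPerSocket = 1            # virtual sockets rather than cores per socket
        spec.memoryMB = memory_mb
        spec.memoryAllocation = vim.ResourceAllocationInfo(reservation=memory_mb)  # full memory reservation
        return vm.ReconfigVM_Task(spec=spec)  # returns a vCenter Task to wait on

    # Example: 6 vCPUs and 48 GB, fully reserved
    # task = reconfigure_dag_vm(vm, vcpus=6, memory_mb=49152)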
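
The NUMA point is just arithmetic, but it is worth writing down as a quick sanity check when sizing the VM against the host’s socket layout. This is only a sketch of the rule of thumb (keep the VM’s vCPUs and memory within one NUMA node); the host figures below are examples, not the client’s hardware.

    def fits_in_numa_node(vm_vcpus, vm_memory_gb, cores_per_socket, memory_per_socket_gb):
        """Rough check: does the VM fit inside a single NUMA node (one physical socket)?"""
        return vm_vcpus <= cores_per_socket and vm_memory_gb <= memory_per_socket_gb

    # Host with 2 sockets x 6 cores: a 10 vCPU VM spans NUMA nodes...
    print(fits_in_numa_node(10, 48, cores_per_socket=6, memory_per_socket_gb=64))  # False
    # ...while 6 vCPUs (or fewer) keeps it NUMA optimized.
    print(fits_in_numa_node(6, 48, cores_per_socket=6, memory_per_socket_gb=64))   # True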
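
For the per-VM Hyper-Threading option mentioned under the ESXi host configuration, the HT Sharing = None setting maps to the VM’s htSharing flag in the vSphere API of that era (vSphere 5.x). A hedged pyVmomi sketch, again assuming vm is an existing vim.VirtualMachine:

    from pyVmomi import vim

    def disable_ht_sharing(vm):
        """Set the per-VM Hyperthreaded Core Sharing mode to 'none' instead of disabling HT host-wide."""
        spec = vim.vm.ConfigSpec()
        spec.flags = vim.vm.FlagInfo(htSharing="none")  # equivalent of Resources -> Advanced CPU -> HT Sharing -> None
        return vm.ReconfigVM_Task(spec=spec)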
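
The 750 Mbps cap on the Replication Port Group can likewise be set through the API. Below is a sketch against a standard vSwitch (remember, no VDS in this environment), assuming a port group named “Replication” and an arbitrary burst size; note that standard vSwitch traffic shaping only controls outbound traffic, and the bandwidth values are in bits per second.

    from pyVmomi import vim

    def cap_replication_portgroup(host, pg_name="Replication", limit_mbps=750):
        """Enable traffic shaping on a standard vSwitch port group, capping average/peak bandwidth."""
        limit_bps = limit_mbps * 1000 * 1000
        for pg in host.config.network.portgroup:        # existing port groups on this host
            if pg.spec.name == pg_name:
                spec = pg.spec                          # reuse current VLAN/vSwitch/policy settings
                spec.policy.shapingPolicy = vim.host.NetworkPolicy.TrafficShapingPolicy(
                    enabled=True,
                    averageBandwidth=limit_bps,
                    peakBandwidth=limit_bps,
                    burstSize=100 * 1024 * 1024,        # assumed burst size (bytes); tune for your environment
                )
                host.configManager.networkSystem.UpdatePortGroup(pgName=pg_name, portgrp=spec)
                return True
        return False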
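
Finally, the DRS “keep apart” rule for the two DAG members and the Witness Server is a standard anti-affinity rule. A minimal pyVmomi sketch, assuming you already have the vim.ClusterComputeResource and the three vim.VirtualMachine objects; the rule name is made up:

    from pyVmomi import vim

    def create_keep_apart_rule(cluster, vms, rule_name="DAG-and-Witness-KeepApart"):
        """Add an anti-affinity (Separate Virtual Machines) rule for the given VMs to a DRS cluster."""
        rule = vim.cluster.AntiAffinityRuleSpec(name=rule_name, enabled=True, vm=vms)
        rule_op = vim.cluster.RuleSpec(operation="add", info=rule)
        spec = vim.cluster.ConfigSpecEx(rulesSpec=[rule_op])
        return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

    # create_keep_apart_rule(cluster, [dag_vm1, dag_vm2, witness_vm])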

I suppose there are many ways to skin a cat. This was the way I did it for this client, given the infrastructure that I had.

  • Can this be done differently – yes…as long as you have reasons for your decisions
  • Explain in your design what other options you looked at (like using in guest iSCSI perhaps)
  • State where you got the information from
  • A “Best Practice Guide” is only generic…it gives me guidelines for my design. I need to design, for the client, their best practice for implementing this solution, using external resources that still validate the design while staying within the client’s guidelines/framework/limits that I was given (in my case: existing storage, vSwitches, etc.)

Here are some of the documents that I used for my design:
