Archive for the ‘Site Recovery Manager’ Category

I’ve been working with VMware, storage and data protection for some time now, and over the years I’ve seen data protection products come and go.

A lot of these products, applications and appliances showed innovative, cutting-edge technology, but they often focused on one particular area or system type, which meant they missed the mark when it came to providing a complete solution.

What do I mean by “complete”? For example, a lot of products over the last few years focused entirely on virtual infrastructure. That is very important, but large customers found these products to be just another thing to manage on top of their existing enterprise backup infrastructure.

This post is not a technical review of Actifio PAS (Protection and Availability Storage); that will likely come later. I’ve just come out of a technical deep dive session with the guys who work out of the Australian office and thought I’d share.

It’s one thing to see a presentation or sit in on a technical deep dive session, but until you’ve used, touched and prodded something, you really can’t be sure. I’m really interested to hear from anyone who’s using it purely to protect virtual infrastructure, as well as people using it to protect both physical and virtual infrastructure.

Here’s a description from the Website: “Actifio’s Protection and Availability Storage (PAS) platform is the industry’s first solution optimized for managing copies of production data, resulting in the elimination of redundant silos of IT infrastructure and data management applications. By virtualizing the management and retention of data, Actifio transforms the chaos of multiple silos of infrastructure and point tools traditionally deployed for backup, disaster recovery, business continuity, compliance, analytics, and test and development into one, Service Level-driven, virtualized Protection and Availability storage device. Actifio PAS delivers a radically simple, application-centric, policy-driven solution that decouples the management of data from storage, network and server infrastructure, resulting in 10X reduction in costs.”

What’s next? From what I saw yesterday it seems to do it all: backup, snapshots, deduplication and replication. Now it’s time to dig deeper and find out if it really is the “complete” solution that I’m hoping it is. I have a follow-up call planned with the guys from Actifio next week and I will update this post if anything interesting comes out of it.

One of the things that really got my attention yesterday was the support for VMware Site Recovery Manager 5.0. Me likey likey.

Check out the YouTube clips below.

Just a quick post on a problem I had this week at a customer site running VMware Site Recovery Manager 5 with a mix of EMC Celerra and VNX arrays.

I actually had everything up and running for a couple of weeks. When I next logged in, I noticed that both arrays in the “Array Manager” section showed the status “Error”, and when I tried to refresh the list of replicated datastores I received the error “SRA command ‘discoverDevices’ didnt return a response”.

I logged a case with EMC and was supplied a new version of the enabler, which is not yet available on PowerLink: “EMC_VNX_Replicator_Enabler_for_VNX_SRA_v5.0.11.zip”.

Once this was installed, I performed a refresh under the devices tab and the errors vanished.

As noted above, everything was working perfectly for a couple of weeks before it broke, and it turns out that what broke it was the VDM (Virtual Data Mover) replication I had set up after the SRM install. The old 5.0.5 version of the enabler does not filter out VDM replication sessions, and I think it’s fair to say that this breaks the SRA.

I would recommend that anyone running the 5.0.5 enabler or below request the 5.0.11 version from EMC and upgrade.
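If you want to check a batch of SRM servers for the affected enabler, the comparison can be sketched as below. This is only a minimal sketch: the function names are my own, and how you obtain the installed version string is left out. The one thing worth noting is that dotted versions need to be compared as numbers, because a plain string comparison would rank “5.0.5” above “5.0.11”.

```python
# Hypothetical helper names; only the version numbers come from the post.

def parse_version(text):
    """Turn a dotted version string such as '5.0.11' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# The enabler version EMC supplied to fix the discoverDevices error.
FIXED_ENABLER = parse_version("5.0.11")


def needs_upgrade(installed):
    """Return True if the installed enabler is older than the fixed version."""
    return parse_version(installed) < FIXED_ENABLER


print(needs_upgrade("5.0.5"))   # True
print(needs_upgrade("5.0.11"))  # False
```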

I was working on a VMware SRM with EMC CLARIION implementation when I came across another gotcha and thought I’d post about it.

I had to change the IP addresses for the CLARIION management ports at the DR site, so I went into SRM and edited the array manager configuration so that the DR array information used the new management IP addresses for SPA and SPB.

After performing a rescan of the storage and getting a hideous error, I went looking in the logs and found the following error.

The actual error “Operation denied by Clariion array – You are not priveliged to perform the request” is rather misleading: it looks like the CLARIION is refusing to authenticate the user (which has been working for months).

After a bit of poking around on PowerLink I found an article which described the exact same error as a known issue when changing the IPs for the SP management ports.

The Fix

  1. Stop the VMware SRM service
  2. Rename the symapi_db.bin in C:\Program Files\VMware\VMware vCenter Site Recovery Manager\scripts\SAN\MirrorView SRA\
  3. Start the VMware SRM service

After doing this I performed a rescan and everything worked as expected.
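For anyone who has to repeat this on more than one SRM server, the three steps above can be sketched in Python. Treat this as a rough sketch under assumptions: the service name `vmware-dr` is my assumption for the SRM Windows service (check services.msc on your server), the function names are made up for illustration, and the file is renamed with a timestamp rather than deleted so it can be put back if needed.

```python
import subprocess
import time
from pathlib import Path

# Assumption: the Windows service name for SRM. Verify it on your own server.
SRM_SERVICE = "vmware-dr"


def net(action, service=SRM_SERVICE):
    """Run 'net stop <service>' or 'net start <service>' (Windows only)."""
    subprocess.run(["net", action, service], check=True)


def set_aside(db_path, stamp=None):
    """Rename the file to '<name>.<timestamp>.old' and return the new path."""
    db_path = Path(db_path)
    if not db_path.exists():
        raise FileNotFoundError(db_path)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    backup = db_path.with_name(f"{db_path.name}.{stamp}.old")
    db_path.rename(backup)
    return backup


def reset_sra_database(db_path,
                       stop=lambda: net("stop"),
                       start=lambda: net("start")):
    """Stop SRM, rename symapi_db.bin, then start SRM again."""
    stop()
    try:
        return set_aside(db_path)
    finally:
        # Start the service again even if the rename failed.
        start()


# Example call, using the path from step 2:
# reset_sra_database(r"C:\Program Files\VMware\VMware vCenter Site Recovery "
#                    r"Manager\scripts\SAN\MirrorView SRA\symapi_db.bin")
```

The stop and start actions are passed in as callables so the rename logic can be tried against a scratch copy of the file without touching the real service.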

Looking at the error above, it appears that the CLARIION is not authenticating the user “Admin”, but I’m guessing the symapi_db.bin file still references the old management IPs, which no longer exist, and because of this the SRA returns an authentication error.

I’m working on a nice little project at the moment which involves a couple of CX120s with vSphere 4.1 and VMware Site Recovery Manager 4.1.

After installing Site Recovery Manager, the CLARIION storage replication adapter and EMC Solutions Enabler, I went on to configure the storage arrays in SRM and received the following error:

“Error occurred XML Document Empty”

After a bit of searching in the forums I found someone who had hit the same error because they had not installed Solutions Enabler. This was enough to point me in the right direction, and knowing I definitely had it installed, I figured it must be a version issue. After uninstalling the x64 version and installing the x86 package I was up and running.

Initially I thought this might have been a compatibility issue so I went and checked out the documentation to make sure all my ducks were lined up. If you’re implementing a CLARIION with SRM make sure you check out the latest SRA release notes as it lists the prerequisites for a supported configuration.

You might be thinking that the title of the post is a bit misleading as it’s not really a CLARIION issue as such, but in some ways it is because the CELERRA SRA no longer requires the solutions enabler to be installed.

The image below I extracted from the latest CLARIION SRA release notes at the time of this post.

There has always been a lot of debate in the VMware community about which IP storage protocol performs best, and to be honest I’ve never had the time to do any real comparisons on the EMC Celerra. Recently, though, I stumbled across a great post by Jason Boche comparing the performance of NFS and iSCSI storage using the Celerra NS120; you can read it here.

What you’ll find reading the latter part of Jason’s post is that once a few tweaks were made for NFS, the results were actually very similar. So if there is no clear winner on the day, how do we decide which is the best storage protocol for your VMware environment?

The “Bigger Picture”

I can tell you that in the past I have always deployed Celerras using iSCSI, and I like to think this choice was made with the “Bigger Picture” in mind. If we go back in time and look at the VMware support matrices, you’ll notice that a lot of the add-ons such as VMware Consolidated Backup, Storage vMotion and Site Recovery Manager all supported iSCSI well before NFS was officially supported.

It was these considerations early on that led me down the iSCSI path, and then of course later on iSCSI was something I became comfortable with, so naturally it became my protocol of choice.

Another consideration with the Celerra was integration with EMC’s Replication Manager which could be used to provide application consistent snapshots of Exchange, SQL, Oracle and VMFS datastores when iSCSI was used.

That was before, how about now?

So a couple of years down the track, things have changed considerably: VMware Consolidated Backup, Storage vMotion, Site Recovery Manager 4.0 and Replication Manager 5.2.2 now all support VMware NFS datastores.

Ready to change to NFS yet?

Even with all these changes, I still was not ready to move away from iSCSI to NFS, because vSphere 4 brought major improvements to the VMware software iSCSI initiator. It now allows multiple VMkernel ports to be bound to the iSCSI initiator, giving the ESX host multiple paths to the storage array.

Shame on me

So earlier on in the post I talked about the “Bigger Picture”. Is the improvement to VMware’s software iSCSI initiator part of the bigger picture?

No, this is a small technical nice to have feature, but really I needed to take a step back and think about what NFS means to Celerra and what makes NFS appeal more so than iSCSI. (Keep reading to find out)

Where NFS trumps iSCSI on the Celerra

Replication Manager

When creating an iSCSI LUN you first create a file system; think of the file system as a container for the iSCSI LUN.

Without the need for snapshots, this file system only needs to be fractionally bigger than the iSCSI LUN to accommodate metadata, but as soon as you need to start taking snapshots of iSCSI LUNs, the requirements for additional file system overhead change completely.

Long story short: with a fully provisioned iSCSI LUN, the minimum file system space required to take a snapshot of the LUN (not taking changed data into account) is 2 x the published LUN size. There are of course ways to reduce this overhead, and if you want to read more about them you can read one of my older posts here.

Replication Manager 5.2.2, as mentioned earlier, now supports snapshots of NFS datastores. The good news here is that the Celerra uses a totally different method for snapshots of NFS file systems (a dedicated savevol rather than space in the file system itself), and allowing 20% overhead for snapshots is a realistic figure.
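To put some rough numbers on the difference, here is a small back-of-the-envelope sketch. The function names and the 500 GB example size are made up for illustration; the 2 x rule and the 20% allowance are the figures quoted above, and changed data is ignored in the iSCSI case just as it is in the text.

```python
def iscsi_fs_size_gb(lun_gb):
    """Minimum file system size to snapshot a fully provisioned iSCSI LUN.

    Uses the 2 x published LUN size rule and ignores changed data.
    """
    return 2 * lun_gb


def nfs_total_gb(datastore_gb, snap_percent=20):
    """NFS file system size plus a savevol allowance of snap_percent."""
    return datastore_gb * (100 + snap_percent) / 100


size_gb = 500  # hypothetical datastore size
print(iscsi_fs_size_gb(size_gb))  # 1000
print(nfs_total_gb(size_gb))      # 600.0
```

For the same 500 GB of presented storage, that is 1000 GB consumed with iSCSI versus roughly 600 GB with NFS under these assumptions.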

Site Recovery Manager

VMware Site Recovery Manager using Celerra Replicator also uses the Celerra snapshot functionality to replicate source iSCSI LUNs to a remote Celerra. The same overhead requirements noted in the Replication Manager section apply here too.

Site Recovery Manager 4 of course now supports NFS, so existing customers feeling the pain of the overhead needed to support iSCSI snapshots can at least now migrate everything to NFS datastores and claim back a ton of that valuable capacity.

EMC Celerra was, I believe, the first storage platform to provide automated failback. This is done using a vCenter plugin, and the new version now supports NFS.

Celerra NFS vSphere Plugin

EMC recently released a new vSphere plugin which integrates with the Celerra. You really have to check out this YouTube video to see how cool some of the integrated features are when using NFS.

Deduplication

EMC Celerra has supported deduplication for some time now, and almost every recent release of DART has brought considerable improvements with it. The one that’s caught my eye in the latest release, 5.6.48, is that deduplication is now optimized to work with VMDK files in an NFS datastore, reducing space consumption by up to 50 percent. The overhead of accessing the compressed VMDK is less than 10 percent. When using the Celerra NFS vSphere plugin, an administrator can select which virtual machines in the datastore to compress. If the additional overhead is too much, the administrator can simply uncompress the virtual machine on the fly.

Summary

The point of this post is not to advocate NFS over iSCSI. My intention here is really just to show how important it is to take that step back and look at the overall solution before you rush ahead and choose a protocol which may not end up being the best choice for your environment.

As a consultant who implements systems, reviewing the two protocols was a good reminder to myself not to get too stuck in my ways. Things change!

One of the best features of Site Recovery Manager is the ability to perform regular testing of the DR process without any impact on the production systems.

Just recently a mate in the UK called me up with an issue which occurred while he was performing a failover in SRM. I had seen the exact same issue before on a customer’s site, so after a quick change to one of the timeout values, everything was sorted.

The point I want to make here is that in both cases the Test option in SRM had worked perfectly but when an actual failover was done (using the RUN button) SRM failed the task leaving people scratching their heads.

Some may argue that the number of manual steps and outages required to perform an actual failover and failback (which of course does have an impact on production systems) outweighs the need to perform this kind of testing. Some may say it’s drastic.

Personally I don’t think so, kudos to the customers who actually schedule production outages to truly put the technology to the test.

Given the number of people talking in the forums about how Site Recovery Manager 1.x did not support automated failback, I was a little surprised to find this functionality also missing from 4.0.

Well, the good news is that if you have an EMC Celerra, the new failback plug-in now supports SRM 4.0 and allows a user to automatically fail back data to the primary site. The plug-in completes both the storage and the VMware failback tasks.

What’s new?

Well, other than supporting vCenter 4.0, the new plug-in also supports the failback of NFS datastores (the previous version supported iSCSI only).

What does the plug-in do?

The Plug-in is a supplemental software package for the VMware Site Recovery Manager (SRM). This plug-in allows users to automatically failback virtual machines and their associated datastores to the primary site after implementing and executing disaster recovery through SRM for Celerra storage systems running Celerra Replicator V2 and Celerra SnapSure.

The plug-in does the following:

  • Provides the ability to input login information (hostname/IP, username, and password) for two vCenters and two Celerras.
  • Cross references replication sessions with vCenter datastores and virtual machines (VMs).
  • Provides the ability to select one or more failed over Celerra replication sessions for failback.
  • Supports both iSCSI and NFS datastores.
  • Manipulates the vCenter at the primary site to rescan storage, unregister orphaned VMs, rename data stores, register failed back VMs, reconfigure VMs, customize VMs, remove orphaned .vswp files for VMs, and power on failed back VMs.
  • Manipulates the vCenter at the secondary site to power off the orphaned VM, unregister the VM, and rescan storage.
  • Identifies failed over sessions created by EMC Replication Manager and directs the user as to how these sessions can be failed back.

For anyone interested, here are the release notes and also the plug-in download.