Archive for April, 2009

A Must Read

Posted: April 30, 2009 in Recommended

Tonight I was cleaning up some books I had sitting in our spare room when I came across one which I’ve been meaning to post about for some time.

I’m the first to admit I don’t read enough books… busy with kids, busy doing geeky things, busy with work… I can always drum up tons of excuses why I don’t read enough, but every now and then I manage to work my way through one.

I’ll be honest and say that most technical books bore me and I often don’t get past the first or second chapter. Why? Well, I think half the time it’s because the books read like manuals, and you know how much us men love to read manuals… “No no, I’ll just set it up and refer to the manual if I can’t get it going,” I’ll say to my wife.

Well, one technical book I had no problem getting through was Enterprise Systems Backup and Recovery: A Corporate Insurance Policy by my colleague Preston De Guise. Coming from a Systems Administrator / Backup Administrator background I knew this was going to be something I would be interested in reading, but what I really liked about Preston’s book was the unique writing style. What am I talking about? Here’s an extract from the first chapter:

 

 

Hundreds of years ago, primitive villagers would stand at the mouth of a volcano and throw an unfortunate individual into its gaping maw as a sacrifice. In return for this sacrifice, they felt they could be assured of anything from a safe pregnancy for the chief’s wife, a bountiful harvest, a decisive victory in a war against another tribe (who presumably had no volcano to throw anyone into), and protection from bad things.

Too many companies treat a backup system like those villagers did the volcano. They sacrifice tapes to the backup system in the hope that it guarantees protection. However, when treated this way, backups offer about as much protection as the volcano that receives the sacrifice.

Sacrifices to volcanoes were seen as a guarantee of protection. Similarly, backups are often seen as a guarantee of protection, even when they’re not configured or treated properly. In particular, there is a misconception that if something which is called “backup software” is installed, then a backup system has been installed.

This book is all about learning how to build the backup system.

“Enterprise Systems Backup and Recovery: A Corporate Insurance Policy” moves beyond scripts and individual backup/recovery packages to explain not only how things work in a backup environment, but why they must be done. It is an indispensable tool not only for system architects, but also managers responsible for data protection within the environment.

 

I highly recommend this book for anyone who’s part of an IT organization. It doesn’t matter if you’re at Help Desk, Team Leader or Systems Administrator level, or the CEO of the company: I guarantee you will learn something from reading this book, and in most cases it will expose you to concepts, best practices and strategies you had previously not considered.

You can find more information about Preston and his book here

Well, after a few nights off I’m ready to start talking technical stuff again. I’ve recently had the need to start looking into how I’m going to generate some reporting on both the ESX hosts and the virtual machines configured in Virtual Center.

After looking around on the VMware forums I came across a recommendation by Texiwill for a small utility called RVTools by Rob de Veij.

Rob describes RVTools as:

“A small .NET 2.0 application which uses the VI SDK to display information about your virtual machines. Interacting with VirtualCenter 2.x or ESX 3.x, RVTools is able to list information about cpu, memory, disks, nics, cd-rom, floppy drives, snapshots, VMware tools, ESX hosts, datastores and health checks. With RVTools you can disconnect the cd-rom or floppy drives from the virtual machines, and RVTools is able to list the current version of the VMware Tools installed inside each virtual machine and update them to the latest version.”

I’ve installed RVTools and pointed it at our production Virtual Center server for evaluation, and I was very impressed. I hope to get some screenshots up in the next couple of days, but until then here is a screenshot from Rob’s site.

[Screenshot: rvtools_vtools1]

 

If you’re looking around for something similar in nature, I highly recommend you visit Rob’s site and download the latest package.

As mentioned in a previous post, I’ve been busy lately and I’ve been a bit all over the place. One night I was watching TV and came across this very funny commercial which I’m pleased to say is made in New Zealand. I’m not a fan of 99% of TV commercials, so it’s good to see quality ones like this that make you laugh out loud. (Well, I did.)

I’ve been plenty busy lately with a number of projects, so I haven’t been looking over the VMware forums. I did go back over most of the posts from the last couple of weeks, though, and found something that really interested me, so I wanted to post it in case others find it useful.

Have you noticed during an SRM failover that the prefix Snap_of_ gets added to the original LUN name? Does it bother you? Well, I think this may vary from person to person, but when I saw Mike Laverick’s post on this I just had to make the change.

Locate the vmware-dr.xml file in the C:\Program Files\Site Recovery Manager\Config directory, then change

<fixRecoveredDatastoreNames>false</fixRecoveredDatastoreNames> to <fixRecoveredDatastoreNames>true</fixRecoveredDatastoreNames>
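If you would rather script the change than edit the file by hand, here is a rough sketch in Python. The function names and the .bak backup are my own inventions for illustration, the path is simply the one quoted above, and I am assuming the file is UTF-8 text; check all of that against your own install before trusting it.

```python
import re
from pathlib import Path

# Assumed location: the path quoted in this post. Adjust for your install.
CONFIG = Path(r"C:\Program Files\Site Recovery Manager\Config\vmware-dr.xml")

# Matches the element only when it is currently set to false.
PATTERN = re.compile(
    r"(<fixRecoveredDatastoreNames>)\s*false\s*(</fixRecoveredDatastoreNames>)"
)


def enable_fix_recovered_names(xml_text):
    """Return (edited_text, count) with every false value switched to true."""
    return PATTERN.subn(r"\1true\2", xml_text)


def update_config(path=CONFIG):
    """Hypothetical helper: back up the file, then rewrite it if needed."""
    original = path.read_text(encoding="utf-8")  # encoding is an assumption
    updated, count = enable_fix_recovered_names(original)
    if count == 0:
        return False  # already true, or the element is not present
    # Keep a copy of the untouched file next to it before writing.
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(updated, encoding="utf-8")
    return True
```

Calling update_config() on the SRM server should return True when it flipped the setting and False when there was nothing to change. The edit is done with a plain text substitution rather than an XML parser so the rest of the file is left byte-for-byte as it was.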

If you would like to check out the thread on the VMware forums you can find it here.

Last week NetWorker 7.5.1 was released on PowerLink. After downloading the package I decided to look through the release notes. Now it’s safe to say anything to do with VMware and NetWorker interests me, so when I saw “Performing a single step recovery of the full virtual machine” I went straight to page 126. If you have a PowerLink account you can download the release notes here.

Below is a screenshot showing the new functionality.

[Screenshot: vcb recover]

 

If you’re already familiar with NetWorker VCB recoveries you’ll know that recovering a virtual machine from a VCB backup is a TWO step process: 1. Perform a saveset recovery of the FULLVM saveset, and then 2. Use VMware Converter to import the virtual machine back into Virtual Infrastructure.

Now with 7.5.1 and VMware Converter installed on the proxy server this is a ONE step process. You can see in the right-hand panel that you now have fields for the information you would typically enter into VMware Converter.

I think it’s really good that we are starting to see these kinds of improvements in NetWorker. It’s not an amazing feature, but I think it’s just a taste of what’s to come.

The following considerations apply when performing a single step restore of a full VMware virtual machine:

  • The VMware Consolidated Backup (VCB) proxy system must be running Microsoft Windows 2003 SP1.
  • Restore of the full virtual machine is only supported using save set recovery.
  • The user must have the required VMware privileges to register and to create virtual machines.
  • The VMware converter must be installed on the VCB proxy host machine. If the VMware converter is not installed, the save set of the full virtual machine (FullVM) can be recovered using a traditional NetWorker recovery.
  • The VMware virtual machine will restore to the same VMware ESX server or VMware Virtual Center (VC) taken at the time of backup.
  • Specifying another VMware ESX server or VMware VC server will cause the restore operation to fail.
  • A restore of the VMware virtual machine will fail if the VMware virtual machine already exists in the specified VMware ESX or VMware VC server.
  • The ESX server must be at version 3.x or later.

One of the nice features of a WordPress blog is that you can see what searches people are using to find your site, which is something I keep an eye on in case there are any good ideas for future posts.

On the other hand, you can also leave a comment on any of the posts, like Vijaysys did.

Hi

How is it possible to get the report of list of all savesets along with each client which we configure….

tnx

The NMC (NetWorker Management Console) makes it fairly easy to sort clients and savesets, but there is no easy way to export this out.

This is an example of why I’m always saying to people “NSRADMIN or MMINFO is your friend”.

So here I’m going to use trusty old nsradmin. I’m guessing what Vijaysys wants is a list of all the clients configured in NetWorker, showing the savesets configured against each client instance.

I’ll run nsradmin from the command line to achieve this.

Step 1. First create a text file on the desktop called input.txt, paste the following commands into it and save.

show name
show save set
show group
print type: nsr client
p

Step 2. Open a command prompt and CD to the desktop, then run the following command. The -i option tells nsradmin to read its commands from the file.

nsradmin -i input.txt > nsradmin_clients.txt (Windows)

Step 3. Open the file nsradmin_clients.txt using Notepad and you should see something along the same lines as my output shown below.

name: networkerserver;
group: Default;
save set: All;

name: vmmachine2;
group: FullVM;
save set: ALLVMFS;

name: networkerserver;
group: ;
save set: "NMCASA:/gst_on_networkerserver/lgto_gst";

name: vmmachine1;
group: FullVM;
save set: *FULL*;
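Since the original question was about getting a report, here is a small Python sketch that turns output like the above into CSV you can open in a spreadsheet. The function names are made up for this example, and I am assuming the output keeps the shape shown here: one "attribute: value;" entry per line (possibly wrapped onto the next line until the closing semicolon) with a blank line between clients.

```python
import csv
import io


def parse_nsradmin(text):
    """Turn blank-line-separated 'attribute: value;' blocks into dicts."""
    records, current, pending = [], {}, ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current client record.
            if current:
                records.append(current)
                current = {}
            pending = ""
            continue
        pending = f"{pending} {line}".strip()
        if not pending.endswith(";"):
            continue  # assume the value wraps onto the next line
        # Split on the first colon only, so values may contain colons.
        key, sep, value = pending[:-1].partition(":")
        pending = ""
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def to_csv(records, columns=("name", "group", "save set")):
    """Render the records as CSV text, one row per client instance."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for rec in records:
        writer.writerow([rec.get(col, "") for col in columns])
    return buf.getvalue()
```

To use it, read nsradmin_clients.txt into a string, pass it to parse_nsradmin, and write the result of to_csv to a .csv file. Values are kept exactly as nsradmin printed them, quotes included, so nothing is silently altered.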

If you’re using Unix/Linux then the following command should do the same job.

nsradmin -i input.txt > nsradmin_clients.txt

I hope this helps.

Today I was thinking about a problem I had this week with Site Recovery Manager and thought I’d post something, both to keep track of the errors and in case someone else has the same problem. Let me paint a picture.

Protected Site

Virtual Center Server with SRM installed

Celerra storage replication adapter

Three-node VMware HA/DRS cluster.

Celerra NS120 presenting 3 x iSCSI LUNs to production ESX hosts.

Recovery Site

Virtual Center Server with SRM installed

Celerra storage replication adapter

Single ESX host.

Celerra NS20 presenting 3 x read-only iSCSI LUNs to recovery site ESX host.

The Problem

As noted above I have 3 LUNs replicating from the NS120 to the NS20, and they are all part of the same protection group configured in SRM. The largest LUN contains the virtual machine OS files; the 2nd and 3rd LUNs hold the SQL logs and TempDB for one of the protected VMs.

When I kicked off the SRM test I noticed that only 2 of the 3 LUNs (logs and TempDB) were being snapped and presented to the ESX host at the recovery site, so of course the test failed hideously when trying to start the VMs. This raises an interesting question: shouldn’t the “Prepare Storage” recovery step warn that 1 of the expected LUNs configured in the “Array Manager” SRM section failed to present to the ESX host, rather than failing with “Failed to recover datastore:” at the point of trying to start the virtual machine?

I went and grabbed the logs from the VC/SRM server and started to look through them to see what I could find.

The three replicated LUNs are part of shadow group ‘shadow-group-3685’:

         primaryUrl = "sanfs://vmfs_uuid:49b58a75-48b0e5f8-e7c8-00151777f2cc/",
         peerInfo = (dr.san.Lun.PeerInfo) [
            (dr.san.Lun.PeerInfo) {
               dynamicType = <unset>,
               arrayKey = "CK2000822009760000",
               lunKey = "fs43_T1_LUN1_CKM00085000953_0000_fs41_T2_LUN1_CK200082200976_0000",

         primaryUrl = "sanfs://vmfs_uuid:49b82702-3e24f2c8-6e7f-00151777f2cc/",
         peerInfo = (dr.san.Lun.PeerInfo) [
            (dr.san.Lun.PeerInfo) {
               dynamicType = <unset>,
               arrayKey = "CK2000822009760000",
               lunKey = "fs45_T1_LUN2_CKM00085000953_0000_fs42_T2_LUN2_CK200082200976_0000",

         primaryUrl = "sanfs://vmfs_uuid:49b82718-7a028960-9092-00151777f2cc/",
         peerInfo = (dr.san.Lun.PeerInfo) [
            (dr.san.Lun.PeerInfo) {
               dynamicType = <unset>,
               arrayKey = "CK2000822009760000",
               lunKey = "fs47_T1_LUN3_CKM00085000953_0000_fs46_T2_LUN3_CK200082200976_0000",

 

 

SRA creates LUN snapshots.

 

[2009-03-30 12:36:49.031 'SecondarySanProvider' 868 verbose] Creating lun snapshots for group 'SRM Protected Systems'
[#1]   <ReplicaLunKeyList>
[#1]     <ReplicaLunKey>fs43_T1_LUN1_CKM00085000953_0000_fs41_T2_LUN1_CK200082200976_0000</ReplicaLunKey>
[#1]     <ReplicaLunKey>fs45_T1_LUN2_CKM00085000953_0000_fs42_T2_LUN2_CK200082200976_0000</ReplicaLunKey>
[#1]     <ReplicaLunKey>fs47_T1_LUN3_CKM00085000953_0000_fs46_T2_LUN3_CK200082200976_0000</ReplicaLunKey>
[#1]   </ReplicaLunKeyList>

 

 

Here we see that the test failover only presents 2 of the 3 LUNs, even though it exits with code 0.

[2009-03-30 12:37:13.500 'SecondarySanProvider' 868 info] testFailover exited with exit code 0
[2009-03-30 12:37:13.500 'SecondarySanProvider' 868 trivia] 'testFailover' returned <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
[#1] <Response>
[#1]     <ReturnCode>0</ReturnCode>
[#1]     <InitiatorGroupList>
[#1]         <InitiatorGroup id="0">
[#1]             <Initiator type="ISCSI" id="iqn.1998-01.com.vmware:esxdr-5a1f63a8"/>
[#1]         </InitiatorGroup>
[#1]         <InitiatorGroup id="iScsi-fc-all">
[#1]             <Initiator type="iscsi" id="iqn.1998-01.com.vmware:esxdr-5a1f63a8"/>
[#1]         </InitiatorGroup>
[#1]     </InitiatorGroupList>
[#1]     <ReplicaLunList>
[#1]         <ReplicaLun key="fs45_T1_LUN2_CKM00085000953_0000_fs42_T2_LUN2_CK200082200976_0000">
[#1]             <Number initiatorGroupId="iScsi-fc-all">128</Number>
[#1]         </ReplicaLun>
[#1]         <ReplicaLun key="fs47_T1_LUN3_CKM00085000953_0000_fs46_T2_LUN3_CK200082200976_0000">
[#1]             <Number initiatorGroupId="iScsi-fc-all">129</Number>
[#1]         </ReplicaLun>
[#1]     </ReplicaLunList>
[#1] </Response>

 

  Here we see SRM log an error about the failure (dr.san.fault.LunFailoverFailed)

 

[2009-03-30 12:39:59.484 'SecondarySanProvider' 868 warning] Failed to prepare shadow vm for recovery: Unexpected MethodFault (dr.san.fault.RecoveredDatastoreNotFound) {
[#1]    dynamicType = <unset>,
[#1]    datastore = (dr.vimext.SanProviderDatastoreLocator) {
[#1]       dynamicType = <unset>,
[#1]       primaryUrl = "sanfs://vmfs_uuid:49b58a75-48b0e5f8-e7c8-00151777f2cc/",
[#1]    },
[#1]    reason = (dr.san.fault.LunFailoverFailed) {
[#1]       dynamicType = <unset>,
[#1]       key = "fs43_T1_LUN1_CKM00085000953_0000_fs41_T2_LUN1_CK200082200976_0000",

 

 

This problem was actually caused by the initial replication task not completing successfully. After a ton of troubleshooting, the EMC Celerra support team suspected memory corruption on the DM (Data Mover), and once it was rebooted the replication task completed its initial “FULL COPY”. Subsequent SRM tests then completed successfully, with all 3 LUNs being presented to the DR ESX host.

As noted above, I think the “Prepare Storage” recovery step should have warned that one of the LUNs failed to snapshot, rather than failing while trying to power on the VMs.
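Until SRM gives that warning itself, the check is easy enough to do against the log. Below is a rough Python sketch that compares the LUN keys the SRA was asked to snapshot (the ReplicaLunKey entries) with the ones testFailover actually returned (the ReplicaLun entries). The function name is my own, the patterns are based only on the log excerpts in this post, and I am assuming the text you feed it covers a single test run.

```python
import re

# Keys SRM asked the SRA to snapshot, e.g. <ReplicaLunKey>...</ReplicaLunKey>
REQUESTED = re.compile(r"<ReplicaLunKey>([^<]+)</ReplicaLunKey>")

# Keys the SRA reported back, e.g. <ReplicaLun key="...">
# (real log files use straight double quotes).
PRESENTED = re.compile(r'<ReplicaLun\s+key="([^"]+)"')


def missing_luns(log_text):
    """Return requested LUN keys that never show up in the response."""
    requested = REQUESTED.findall(log_text)
    presented = set(PRESENTED.findall(log_text))
    return [key for key in requested if key not in presented]
```

Run against the excerpts above, this should single out the fs43_T1_LUN1 key, which is the same LUN that later turns up in the LunFailoverFailed fault. If your log holds several test runs, cut it down to one run first, otherwise keys from different runs will be mixed together.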

 

If you have any thoughts on this, let me know.