Recently I set up VMware’s SRM (Site Recovery Manager) in preparation for upcoming work at a customer’s site, using our existing 2-node ESX cluster and two EMC NS20s (Celerras).
I set up a file system of about 35GB, then created a 25GB test iSCSI LUN, presented it to my ESX servers and continued on my SRM journey.
After setting up an identically sized, read-only iSCSI LUN on the target Celerra, I configured replication using the wizard: next, next, next, next, all done… Man, that was too easy. I checked the status of my replication task and it showed a nice green tick with “OK”. Woo hoooo, I’m good to go!
I then configured SRM and, after a few teething problems, got everything up and running using a small Linux virtual machine I use to store passwords. This VM has a small footprint, using only 2GB of storage.
Now, having set this up with a test LUN and a test VM, it shouldn’t be that hard to apply it to my production environment, right?… WRONG!!!!
I went to set up replication of my production LUN and kept getting the error “Version Set out of space”, which confused me at the time. I had read, and seen in a few demo videos, that the file system on the target end (which holds the read-only LUN) needs to be at least 20% bigger than the iSCSI LUN, so I kept adding more and more space to the target file system. Unfortunately, this made no difference.
Finally I thought to myself: the only thing I haven’t tried is increasing the file system on the production side. *POOOOF, with a puff of smoke*, the replication task that had been failing was now successful.
Unlike file system replication (which uses a SavVol), iSCSI replication uses the additional space in the file system hosting the LUN to store a baseline snapshot, which is equal in size to the amount of data actually used in the iSCSI LUN as opposed to the LUN’s raw size, as well as the changed data that needs to be captured in snapshots and sent to the target/destination LUN.
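To make that space usage a bit more concrete, here is a minimal Python sketch that models the source file system as holding three things: the LUN’s raw allocation, a baseline snapshot sized to the data actually written in the LUN, and changed data waiting to be replicated. The class name `SourceFileSystem` and its methods are hypothetical, and the model assumes a thick (non virtually provisioned) LUN; it is a simplification for illustration, not EMC’s actual allocation logic.

```python
from dataclasses import dataclass


@dataclass
class SourceFileSystem:
    """Hypothetical model of a file system hosting one thick iSCSI LUN.

    Simplifying assumptions (not EMC's documented behaviour):
    - the LUN occupies its full raw size inside the file system;
    - the baseline snapshot needs as much space as the data written in the LUN;
    - changed data awaiting replication needs additional space on top of that.
    """

    fs_size_gb: float   # total file system size
    lun_size_gb: float  # raw (provisioned) iSCSI LUN size
    lun_used_gb: float  # data actually written inside the LUN

    def free_gb(self) -> float:
        # Space left over once the LUN itself has been carved out.
        return self.fs_size_gb - self.lun_size_gb

    def space_needed_gb(self, changed_gb: float = 0.0) -> float:
        # Baseline snapshot (used data, not raw LUN size) plus changed data.
        return self.lun_used_gb + changed_gb

    def can_replicate(self, changed_gb: float = 0.0) -> bool:
        # Replication only has room if the snapshot data fits in the free space.
        return self.space_needed_gb(changed_gb) <= self.free_gb()


if __name__ == "__main__":
    # The test setup from this post: 25GB LUN in a ~35GB file system, 2GB used.
    test_fs = SourceFileSystem(fs_size_gb=35, lun_size_gb=25, lun_used_gb=2)
    print(test_fs.free_gb(), test_fs.space_needed_gb(), test_fs.can_replicate())
```

The point the model captures is that the raw LUN size only determines how much free space is left, while the used data inside the LUN determines how much of that free space the baseline snapshot needs.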
I spoke to EMC about this and received an unpublished document on iSCSI replication sizing, which essentially says that for iSCSI LUNs that need to be replicated, the file system (for non virtually provisioned file systems) needs to be 2.5x bigger than the iSCSI LUN on both the source and target systems.
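The 2.5x rule of thumb is easy to turn into a quick sizing check. This is a small sketch with hypothetical helper names (`min_fs_size_gb`, `meets_sizing`); the default factor of 2.5 comes from the EMC guidance described above, and passing `factor=1.2` reproduces the “20% bigger” rule from the demo videos for comparison.

```python
def min_fs_size_gb(lun_size_gb: float, factor: float = 2.5) -> float:
    """Minimum file system size for a replicated iSCSI LUN.

    factor=2.5 is the rule of thumb from the EMC sizing guidance described
    in this post (non virtually provisioned file systems, applied to both
    the source and the target). It is a rule of thumb, not a guarantee.
    """
    if lun_size_gb <= 0:
        raise ValueError("LUN size must be positive")
    return lun_size_gb * factor


def meets_sizing(fs_size_gb: float, lun_size_gb: float, factor: float = 2.5) -> bool:
    # True when the existing file system is at least `factor` times the LUN size.
    return fs_size_gb >= min_fs_size_gb(lun_size_gb, factor)


if __name__ == "__main__":
    # Production LUN from this post: 300GB LUN in a 400GB file system.
    print(min_fs_size_gb(300), meets_sizing(400, 300))
```

For the 300GB production LUN this works out to 750GB per side, compared with 360GB under the 20% rule, which is why a 400GB file system can look comfortably sized and still fail.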
You may be thinking to yourself: how come your test LUN and test replication task worked when the file system was not 2.5 times bigger than your iSCSI LUN? The answer is that the baseline snapshot only took up 2GB (the size of the small virtual machine), as opposed to my production LUN, which was actually using 250GB of a 300GB LUN held inside a 400GB file system. The error “Version Set out of space” really meant “You don’t have enough space in your file system to create a baseline snapshot”.
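The arithmetic behind the two outcomes can be worked through under a simplifying assumption (not EMC’s documented algorithm): a thick LUN consumes its full raw size in the file system, and the baseline snapshot needs roughly as much of the remaining free space as the data written inside the LUN.

```latex
\begin{aligned}
\text{Production:}\quad & 400\,\text{GB} - 300\,\text{GB} = 100\,\text{GB free} < 250\,\text{GB baseline} &&\Rightarrow \text{Version Set out of space} \\
\text{Test:}\quad & 35\,\text{GB} - 25\,\text{GB} \approx 10\,\text{GB free} \ge 2\,\text{GB baseline} &&\Rightarrow \text{replication succeeds}
\end{aligned}
```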
I hope this helps anyone searching for the “Version Set out of space” error. For those of you who don’t have this error but ended up reading because it relates to Celerra, don’t let this post put you off: the VMware / Celerra / SRM mix works really, really well, and the more I play with the Celerras the more I like them.