esxtop advanced features

September 4, 2009 by erikzandboer

No rocket science here. esxtop has always been there. Yet a lot of people miss out on some of its great features. Hopefully this blogpost will get you interested in looking at esxtop (again?) in detail!

Yesterday I attended a very interesting breakout session about esxtop and its advanced features in vSphere. Old news, you might say, but there is SO much you can do with esxtop. For example, you can export data from esxtop and import it into Windows perfmon. And if you already knew that: did you know you can now actually see which physical NIC is being used by a certain VM?
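As a rough sketch of what you can do with exported data outside perfmon: esxtop's batch mode (for example `esxtop -b -d 5 -n 60 > esxtop.csv`, i.e. 60 samples five seconds apart) writes a perfmon-compatible CSV with one column per counter. The Python helper below is my own assumption, not a VMware tool. It picks every column whose header contains a given substring and reports that column's peak value. Exact counter names differ between esxtop versions, so which substring you match on is up to you.

```python
import csv
from typing import Iterable


def column_max(lines: Iterable[str], counter_substring: str) -> dict[str, float]:
    """Peak value of every column whose header contains counter_substring.

    `lines` is the text of an esxtop batch-mode CSV, for example a file opened
    with open(path, newline=""). The first row holds the counter names.
    """
    reader = csv.reader(lines)
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if counter_substring in name]
    peaks = {header[i]: float("-inf") for i in wanted}
    for row in reader:
        for i in wanted:
            # Skip short rows and empty cells instead of failing on them.
            if i < len(row) and row[i].strip():
                peaks[header[i]] = max(peaks[header[i]], float(row[i]))
    return peaks
```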

Other neat little features were shown as well. The best one: the “swcur” field is NOT about the current swapping activity of a VM, but about swapping that occurred in the past. It shows how much memory currently sits in swap, no matter when it was swapped out (yes, I too would have named it differently…). How many of you knew that one? Finally, there is a very interesting field in the storage screen (yes, for those who did not know: esxtop is not just about CPU, but also about memory, storage and, new in vSphere, interrupts). This field is called “DAVG” and it shows the actual latency ESX sees from your storage device (there is also KAVG for the latency spent in the VMkernel, and GAVG for the total latency the guest sees).
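The three storage latencies relate in a simple way: the latency the guest sees is the kernel part plus the device part, so GAVG = KAVG + DAVG. Below is a minimal Python sketch of how you might read them together. The 20 ms and 2 ms limits are illustrative rules of thumb I picked for this example, not values defined by VMware.

```python
from dataclasses import dataclass


@dataclass
class DeviceLatency:
    """Per-device latencies in milliseconds, as read from the esxtop disk screen."""
    davg_ms: float  # device latency: time spent below the ESX storage stack
    kavg_ms: float  # kernel latency: time spent inside the VMkernel

    @property
    def gavg_ms(self) -> float:
        # The total latency the guest sees is the sum of both parts.
        return self.davg_ms + self.kavg_ms


def diagnose(lat: DeviceLatency, davg_limit_ms: float = 20.0, kavg_limit_ms: float = 2.0) -> str:
    # The limits are illustrative rules of thumb, not VMware-defined thresholds.
    if lat.kavg_ms > kavg_limit_ms:
        return "kernel"  # commands are waiting inside the VMkernel
    if lat.davg_ms > davg_limit_ms:
        return "device"  # the array or fabric is slow to respond
    return "ok"
```

For example, a device with DAVG 25.0 ms and KAVG 0.5 ms gives a GAVG of 25.5 ms, and `diagnose` points at the device rather than the kernel.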

There were also a few examples of misbehaving VMs, which were very interesting to see: numbers that seemed impossible, yet were explained perfectly. I would vote this, the very last presentation at VMworld 2009, the best technical presentation I witnessed there!

I hope I got you (re)interested in esxtop. I am more of a graphical guy, so I like the performance monitor embedded in the VI client. But some things just aren’t there, so esxtop is definitely worth a(nother) look. If you’re using ESXi, make sure to download the vMA (vSphere Management Assistant) appliance, which includes resxtop (which looks a lot like esxtop on ESX).

VMworld 2009: ICE sculpting pictures online!

September 3, 2009 by erikzandboer

During the VMworld 2009 party, one of the very amusing attractions was the ice sculpting by the ICE team:

The ICE team at work

The ICE team building their guitar

You can see all the photographs here!

VMworld 2009: Foreigner Concert Photos online

September 3, 2009 by erikzandboer

Hi all,

I took some photos of the Foreigner concert at the VMworld 2009 party.

Foreigner overview

Foreigner!

You can view all photos here: Foreigner at VMworld 2009

Just for Fun – VMware just got greener

September 3, 2009 by erikzandboer

So what do you get when you mix VMware ESX with some dirt, and then add a little enthusiasm? Exactly, you get a paludarium.

The word paludarium comes from the word “palus”, basically meaning mud, and it is kind of a cross between an aquarium and a terrarium. I have been building my own little world inside this glass box for the past few months. Very moist, very green. Being a VMware fan, I just had to combine these two hobbies. Why? Well, that one is obvious: because you *can*!

VM controlled paludarium

So now my tiny little jungle is fully controlled by a virtual machine. Lighting, rain, fog, even thunder! Just when everyone thought it wasn’t possible, VMware just got greener!

See my paludarium site at http://paluweb.nl for some live stats!

Long distance VMotion a fact

September 3, 2009 by erikzandboer

Today it was announced that long distance VMotion is now officially supported by VMware up to a distance of 200 kilometers. Cisco, VMware and EMC teamed up to run tests proving the possibilities. Long distance VMotion is basically VMotioning between two remote datacenters, enabling follow the sun, follow the moon, or evacuating a datacenter in anticipation of a soon-to-be disaster (“the tornado is coming”).

Of course some limitations apply, such as a maximum round-trip latency of 5 ms and a minimum bandwidth of 622 Mbit/s, but still! Long distance VMotion is a fact, and I guess it will soon be accepted as an enterprise solution, just like normal VMotion has been.
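Here is a small Python sketch that checks a link against the two limits quoted above and estimates a lower bound for copying a VM's memory across that link once. The constant and function names are my own. The estimate deliberately ignores protocol overhead and memory pages that get re-dirtied during the copy, which VMotion has to send again in later passes.

```python
MAX_RTT_MS = 5.0          # maximum round-trip latency quoted for the supported setup
MIN_BANDWIDTH_MBIT = 622  # minimum bandwidth quoted for the supported setup


def link_supported(rtt_ms: float, bandwidth_mbit: float) -> bool:
    """True if the link stays within both quoted limits."""
    return rtt_ms <= MAX_RTT_MS and bandwidth_mbit >= MIN_BANDWIDTH_MBIT


def first_pass_seconds(memory_gb: float, bandwidth_mbit: float) -> float:
    """Lower bound on the time for one full copy of VM memory across the link.

    Ignores protocol overhead and re-dirtied pages, so real migrations take longer.
    """
    bits = memory_gb * 1024**3 * 8
    return bits / (bandwidth_mbit * 1_000_000)
```

At the minimum 622 Mbit/s, a 4 GB VM needs at least about 55 seconds for its first memory pass. For context, light in fiber covers roughly 200 km per millisecond, so a 200 km link already spends about 2 ms of the 5 ms round-trip budget on propagation alone.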

esXpress uses vStorage API for detecting changed blocks

September 3, 2009 by erikzandboer

Today at VMworld 2009 I joined a breakout session presented by PHD Virtual about the latest version of esXpress (3.6). Great stuff once again! Apart from the fact that esXpress is now fully functional on vSphere (still no ESXi support though), they also managed to use the vStorage API for “changed block reporting”. Basically, this means that when you are using vSphere and doing delta or deduped backups, you no longer need to read all the blocks of a VM and then decide whether each block has changed. PHD got esXpress to the point where it reads only the changed blocks directly, using this “cheat sheet” that VMware was so nice to make available through the vStorage API.

This means backup speeds will be way higher when you do delta or deduped backups.
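Here is the difference between the old and the new approach, sketched in Python. Without a change list, the backup has to read and compare every block. With a list of changed regions, it can seek straight to them. In this sketch the (offset, length) extent list is simply assumed to be given. Obtaining it from vSphere through the vStorage API is not shown, and the function names are hypothetical.

```python
import io
from typing import Iterable, Tuple

Extent = Tuple[int, int]  # (byte offset, length) of a region changed since the last backup


def full_scan_changed(disk: bytes, previous: bytes, block_size: int) -> list[int]:
    """Without a change list: read EVERY block and compare it with the previous backup."""
    changed = []
    for off in range(0, len(disk), block_size):
        if disk[off:off + block_size] != previous[off:off + block_size]:
            changed.append(off)
    return changed


def read_changed(disk: io.BufferedIOBase, extents: Iterable[Extent]) -> dict[int, bytes]:
    """With a change list: seek straight to the changed regions and read only those."""
    data = {}
    for offset, length in extents:
        disk.seek(offset)
        data[offset] = disk.read(length)
    return data
```

The first function reads the whole disk every time. The second function's I/O is proportional to the amount of changed data, which is where the speed-up for delta and deduped backups comes from.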

When you also use their dedup target, with the deduplication happening at the SOURCE, you get tremendous backup speeds, and as an added bonus you can use smaller WAN links when sending these backups offsite. Wonderful guys, you did it again!

VMware ThinApp becoming automagic!

September 3, 2009 by erikzandboer

Yesterday at VMworld 2009 I went to a breakout session on VMware ThinApp. To my surprise we saw a demo of a new ThinApp feature: you can now automagically rebuild your ThinApps! In the demo, five Windows XP VMs were used, and all “setup.exe” files resided on a share. When the repacker was kicked off, the VMs were snapshotted. Then ThinApp got kicked off inside each VM. After the ThinApps were regenerated, they were automagically copied off the VM, after which the snapshot was reverted. This process repeated itself on all available VMs until all ThinApps were rebuilt.
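The rebuild cycle from the demo can be sketched in Python: snapshot, build, copy off, revert, and repeat until the queue of installers is empty. Everything VM-related here is a hypothetical stand-in that only records the steps. There is no real ThinApp or vSphere API call in it. Running each VM as its own worker thread that pulls from a shared queue is my assumption of how "repeated on all available VMs" works.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class BuildVM:
    """Hypothetical stand-in for a clean Windows XP build VM; it only records the steps."""
    name: str
    log: list[str] = field(default_factory=list)

    def snapshot(self) -> None:
        self.log.append("snapshot")

    def capture(self, app: str) -> str:
        # Placeholder for running the app's setup.exe from the share under ThinApp capture
        self.log.append(f"capture {app}")
        return f"{app}.exe"  # hypothetical name of the resulting ThinApp package

    def revert(self) -> None:
        self.log.append("revert")


def rebuild_all(apps: list[str], vms: list[BuildVM]) -> list[str]:
    work = queue.Queue()
    for app in apps:
        work.put(app)
    packages: list[str] = []
    lock = threading.Lock()

    def worker(vm: BuildVM) -> None:
        # Each VM keeps pulling installers until the shared queue is empty.
        while True:
            try:
                app = work.get_nowait()
            except queue.Empty:
                return
            vm.snapshot()
            package = vm.capture(app)
            with lock:
                packages.append(package)  # "copied off the VM" before the revert
            vm.revert()  # back to a clean VM for the next app

    threads = [threading.Thread(target=worker, args=(vm,)) for vm in vms]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(packages)
```

The revert after every build is the key design point. Each package is captured on a VM that no previous installer has touched, so one application's leftovers cannot leak into the next package.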

Magical!

VCP4 certified

September 3, 2009 by erikzandboer

I am not (yet?) the kind of blogger who throws everything I see around me onto my blog just because it is “new”; I think blogging should be more about things you have tested or measured.

Yet today at VMworld 2009 I make an exception: I just got my VMware VCP4 certification. Yeah!

The new esXpress 3.5

August 6, 2009 by erikzandboer

For a long time now I have been a fan of PHD's esXpress. It is still the only VMware backup solution I know that scales, has no single points of failure and works reliably with VMware snapshots. The solution has always been “other than others”: at first it appears to be a really weird piece of software that creates its own appliances to perform its backups. Once you get to know it, esXpress’s way of working is great. So great, in fact, that VMware themselves are now adopting this very way of working with their Data Recovery feature in vSphere 4, maybe even stepping away from their beloved VCB (VMware Consolidated Backup).

In my opinion VCB has never been that great, apart from some special uses in special environments. esXpress fits all, from single ESX hosts to large clusters. In contrast to VMware’s Data Recovery, which is still buggy at the time of writing, esXpress has been on this train for years now and definitely knows the drill. esXpress 3.1 is not the holy grail though. Some features were just not easy to use, there was no global GUI to manage all nodes easily, and there was no data deduplication available (not that I am that big a fan of data dedup for backup, but hey, everybody does it!).

 

Enter esXpress 3.5

To make up for most of the shortcomings, esXpress version 3.5 has been introduced. The engine itself is still pretty much the same, and exactly there lies the power of esXpress: it still WORKS. It just works, it always works. Extra features have been added in such a smart and incredibly simple way that the product remains rock stable. No “waiting for the point 1 release” needed here!

I was at a client who suffered a SAN failure during a firmware upgrade. They were in the process of failing over to their recovery site when the administrator got an email from one of the production ESX hosts: esXpress had successfully completed its backups. What? All LUNs appeared unavailable at the production site. This host had not had its storage devices rescanned, so it just kept on ticking. I think things like this are major pluses for both VMware ESX and esXpress, showing their enterprise readiness.

 

Finally: A working global GUI

With the initial esXpress 3.5 release, PHD also released a GUI to manage all esXpress instances from one central portal. In the old 3.1 (and earlier) days, you ended up copying config files between hosts: working, but not very user friendly. You might think that adding a central GUI took a lot of deep digging in the esXpress code. But they surprised us once again: the GUI just holds the config files and, could it be simpler, the GUI appliance introduces a small NFS store. The NFS store is automagically mounted to the ESX servers, and presto! That is where the config files can be found. esXpress itself just has to check the share for a new config, something that already (partly) existed in the previous version.
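Here is a minimal Python sketch of that "check the share for a new config" step. It assumes the NFS share is mounted as a plain directory and that a newer modification time means a newer config. The paths and the function name are hypothetical. This is not esXpress's actual code.

```python
import os
import shutil


def sync_config(share_path: str, local_path: str) -> bool:
    """Copy the config from the mounted share if it is newer than the local copy.

    Returns True when a new config was picked up.
    """
    if not os.path.exists(share_path):
        return False  # share not mounted (yet): keep running on the local config
    if os.path.exists(local_path) and os.path.getmtime(local_path) >= os.path.getmtime(share_path):
        return False  # local copy is already up to date
    tmp = local_path + ".tmp"
    shutil.copy2(share_path, tmp)  # copy2 also carries over the modification time
    os.replace(tmp, local_path)    # swap in one step so a half-copied file is never read
    return True
```

Because `copy2` carries the share's timestamp over to the local copy, the check effectively compares the share's timestamp with itself. Clock differences between the host and the appliance therefore do not break it. A local edit to the file, however, would make the local copy look newer and block the next update.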

Even better: the GUI does a great job. I had some trouble with the first versions, some manual labor was needed to get it going (like manually needing to change the time zone and not being able to add a second DNS server). All these issues are fixed now, but even those early versions were already very effective. And things have become only better since then!

 

Because “everybody has it”: Deduplication

What would we do without deduplication nowadays? It is a major hype in storage and backup. If you don’t have it, you’re out of business, it seems. But who ever thinks about the risks and limitations involved (see: The Dedup Dilemma)?

The idea of deduplication is brilliant, but the implementation has to be right. I must admit, I am not a big fan of deduplication. It is still your vital data you are talking about! Admission no. 2: esXpress 3.5 managed to change my opinion on dedup a little.

The deduplication implementation of esXpress is in line with PHD's way of working: both effective and simple. A separate appliance is installed, which is in fact the same one as the GUI appliance: at first boot, you choose what the appliance will become. Smart! The dedup appliance (called PHDD, for PHD Data Dedup) can mount a datastore or an NFS store for storing its deduped data. It performs quite well, saving disk space as you back up more of the same (or similar) data. It is now much “cheaper” to keep more backups of your VMs.

Only few changes appear to have been made to esXpress itself to allow PHDD as a backup target, so once again, stability guaranteed. 

So now all your data lives inside the PHDD appliance. How do I get this data out the way I want it? PHD did something clever: they added a CIFS/Samba interface to the appliance, allowing you to browse, copy and back up your VMs as if they weren’t deduped at all! This last feature makes the mix of backup and dedup more acceptable, even effectively usable :)

 

When will the fun EVER stop? File level restore!

The best feature of the PHDD dedup target in my opinion, next to dedup itself, is the ability to perform file level restores. At last you can get that one single file out of a full VM without having to restore the whole thing. This option is so cool: you simply browse to the appliance, select your files, and save the collection you marked as a single zip file! It couldn’t be easier; another bull's-eye for PHD, even in their first release of this piece of software.

 

Scaling esXpress 3.5 with dedup

Not all is bright and shiny with dedup. I found it hard to scale the solution: if there is only one PHDD target, scaling ends somewhere, and a SPOF (single point of failure) is introduced. Not good (although PHD is working on a way to link the dedup appliance to a secondary one). Still, one may consider using two or more PHDD appliances in parallel. This will work, but dedup effectiveness will drop sharply, especially when you use DRS and all VM backups end up on all PHDD targets over time. This happens with the often-used design where each ESX server is assigned its own backup target, with failover to the others. You can make it somewhat more effective by specifying a backup target for each VM (in the local config), a best practice that also applies when using multiple FTP targets, by the way. This ensures that backups of a particular VM always end up on the same backup target, making things clearer and dedup more effective. It is still far from ideal, though: every PHDD target has its own library of data, meaning that identical blocks still get stored on EACH PHDD target instead of just one.
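Here is a toy Python model of why several PHDD targets dedup worse than one: each target only dedups against its own library. The VM names, block labels and target-assignment functions are hypothetical. The resulting numbers are counts from this toy data, not measurements.

```python
from typing import Callable


def stored_blocks(backups: list[tuple[str, list[str]]],
                  pick_target: Callable[[int, str], int],
                  n_targets: int) -> int:
    """Blocks kept across all targets when each target dedups only against its own library."""
    libraries = [set() for _ in range(n_targets)]
    for run, (vm, blocks) in enumerate(backups):
        libraries[pick_target(run, vm)].update(blocks)
    return sum(len(library) for library in libraries)


# Hypothetical scenario: two VMs deployed from the same template, backed up on two nights.
VMS = {"vm1": ["os-1", "os-2", "app-1"], "vm2": ["os-1", "os-2", "app-2"]}
ORDER = {"vm1": 0, "vm2": 1}
BACKUPS = [(vm, blocks) for night in range(2) for vm, blocks in VMS.items()]
RUNS_PER_NIGHT = len(VMS)


def one_target(run: int, vm: str) -> int:
    return 0  # everything goes to a single PHDD target


def pinned_per_vm(run: int, vm: str) -> int:
    return ORDER[vm]  # each VM always goes to the same target


def per_host_after_drs(run: int, vm: str) -> int:
    night = run // RUNS_PER_NIGHT
    return (ORDER[vm] + night) % 2  # DRS moved the VMs, so they land on the other host's target
```

With this data, one target stores 4 blocks, per-VM pinning stores 6, and per-host targets after DRS has moved the VMs store 8. In the last case every target ends up holding its own copy of every block.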

The limitations mentioned above are not a limitation of esXpress, though, but rather of dedup itself. PHD chose to use online dedup (basically you dedup while you write), which uses CPU power during backups and restores. CPU power might even become the limiting factor for your backup speed. Luckily, CPU power is usually available in abundance nowadays. I will dive deeper into performance and scaling of deduped installations in my next blogpost, which will hopefully prove that dedup really performs (like the setup using multiple FTP targets simultaneously, described in my blogpost Scaling VMware hot-backups using esXpress).

 

Conclusion

In terms of speed and reliability, the new esXpress 3.5 is on par with its predecessor, version 3.1. It is still the only backup solution I know that has no single point of failure, scales (REALLY scales) up to whatever size you want without any issues and, best of all, once it works it KEEPS working, without the VM snapshotting problems some other backup solutions do have.

On top of all the good things that were already there, a global GUI has been added which manages all esXpress installs at the same time, and there is a data deduplication appliance featuring a very well working single file restore option. I would have liked to see a file restore option for non-dedup targets as well. From what I’ve seen, online deduping costs a lot of CPU power, and backup speeds go down because of this. Once the database is built, though, things do get better (there is less data to back up because more and more blocks are already backed up in the dedup appliance). Still, the calculations have to be done.

In a smaller environment, the dedup appliance is no match for a set of non-deduping FTP targets. This is a drawback every dedup system suffers from… it is just the way the “thingy” works. Still, I see a solid future for esXpress's PHDD dedup targets where speed is not of the utmost importance.

Make no mistake about backup speeds: IF esXpress and its backup targets are designed and configured properly, it is by far the fastest full-VM backup solution I’ve seen. It does not mess with taking backups through the service console network; it creates virtual appliances at runtime that perform the backups, and many in parallel. If you want to see real backup speed from esXpress, do not test it on a single VM like some people tend to do when comparing. If you do, speeds are about on par with other 3rd party vendors. But when scaled up to 8 or more backups in parallel to several backup targets with matched bandwidth, esXpress starts to shine and leaves the competition far behind.

The Dedup Dilemma

June 9, 2009 by erikzandboer

Everybody does it – and if you don’t, you can’t play along. What am I talking about? Data deduplication. It’s the best thing since sliced bread I hear people say. Sure it saves you a lot of disk space. But is it really all that brilliant in all scenarios?

 

The theory behind Data Deduplication

The idea is truly brilliant: you store blocks of data in a storage solution, and you create a hash that identifies the data inside each block (for all practical purposes) uniquely. Every time you need to back up a block, you check (using the hash) whether you already have the block in storage. If you do, just write a pointer to the existing data. Only if you do not have the block yet, copy it and add it to the dedup database. The advantage is clear: the more identical data you store, the more disk space you save. Especially in VMware environments, where many near-identical VMs are deployed from templates, this results in very large disk space savings.
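The mechanism described above fits in a few lines of Python. This is a minimal in-memory sketch that assumes fixed-size blocks and SHA-256 as the hash. Real dedup products differ in block size, hash choice and on-disk layout.

```python
import hashlib


class DedupStore:
    """Minimal block-level dedup store: each unique block is kept once, keyed by its SHA-256."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}  # hash -> block data (the dedup "database")

    def write(self, data: bytes) -> list[str]:
        """Store data and return its recipe: the list of block hashes ("pointers")."""
        recipe = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:  # only new blocks consume space
                self.blocks[digest] = block
            recipe.append(digest)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild the original data by following the pointers."""
        return b"".join(self.blocks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())
```

If you write two VMs cloned from the same template, they share the template blocks, and only their differing blocks take extra space. The flip side is visible too: if the single stored copy of a shared block were damaged, `read` would return wrong data for every VM whose recipe points at it.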

 
The actual dilemma

A nice thing about deduplication, next to the large amounts of storage (and associated costs) you save, is that when you deduplicate at the source, you only send new blocks across the line, which can dramatically reduce the bandwidth you need between remote offices and central backup locations. Deduplication at the source also means the CPU load is generally spread across your remote servers instead of being concentrated in the storage solution.
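Dedup at the source can be sketched as a two-step exchange. The source hashes its blocks locally, asks the target which hashes it is missing, and sends only those blocks. The class and function names below are hypothetical, and the "network" is just a method call. The per-backup list of hashes needed to rebuild the data (the recipe) is left out for brevity.

```python
import hashlib


def chunk(data: bytes, block_size: int) -> list[bytes]:
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class BackupTarget:
    """Hypothetical central dedup target that already holds some blocks."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        return {d for d in digests if d not in self.blocks}

    def receive(self, blocks: dict[str, bytes]) -> None:
        self.blocks.update(blocks)


def backup_from_source(data: bytes, target: BackupTarget, block_size: int = 4096) -> int:
    """Dedup at the source; returns how many block bytes actually crossed the WAN."""
    # Hashing happens here, on the source, so the CPU cost stays at the remote site.
    blocks = {hashlib.sha256(b).hexdigest(): b for b in chunk(data, block_size)}
    needed = target.missing(list(blocks))  # only the hashes cross the line in this round trip
    payload = {d: blocks[d] for d in needed}
    target.receive(payload)
    return sum(len(b) for b in payload.values())
```

For example, a first backup of `b"AAAABBBB"` with 4-byte blocks sends 8 bytes. A later `b"AAAACCCC"` sends only the 4 new bytes, and repeating that same backup sends nothing.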

Since every upside has a downside, data deduplication certainly has its downsides. For example, if I had 100 VMs, all deployed from the same template, there surely are blocks that occur in each and every one of them. If that particular block gets corrupted… indeed! You lose data in ALL of those VMs. Continuing to scare you: if the hash algorithm you use is insufficient, two different data blocks might be identified as equal, resulting in corrupted data. Make no mistake: the only way to be 100% sure two blocks are equal is a hash as big as the block itself (rendering the solution kind of useless). All dedup vendors use shorter hashes (I wonder why ;) ) and live with the risk (which is VERY small in practice, but never zero). The third major drawback is the speed at which the storage device is able to deliver your data (un-deduplicated) back to you, which especially hurts on backup targets that have to perform massive restore operations. Final drawback: you need your ENTIRE database in order to perform any restore (at least, you cannot know in advance which blocks are going to be required to restore a particular set of data).
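To put the "VERY small, but never zero" risk in numbers, the standard birthday bound estimates the chance that any two of n distinct blocks share a b-bit hash. It assumes the hash behaves like a random function. The worked figures (1 PiB of unique data in 4 KiB blocks, and a 160-bit hash such as SHA-1) are illustrative choices, not numbers from any vendor.

```latex
p_{\text{collision}} \approx 1 - e^{-n(n-1)/2^{b+1}} \approx \frac{n^2}{2^{b+1}}

n = \frac{2^{50}\ \text{bytes}}{2^{12}\ \text{bytes/block}} = 2^{38}, \qquad
b = 160 \;\Rightarrow\; p \approx \frac{2^{76}}{2^{161}} = 2^{-85} \approx 2.6 \times 10^{-26}
```

The pigeonhole principle explains why the risk can never be exactly zero. A 4 KiB block has 2^32768 possible values, but a 160-bit hash has only 2^160 digests, so distinct blocks that share a digest must exist.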

 
So – should I use it?

The reasons stated above always kept me a skeptic when it came to data deduplication, especially for backup purposes. Because at the end of the day, you want your backups to be functional, and not to require the ENTIRE dataset in order to perform a restore. Speed can also be a factor, especially when you rely on restores from the dedup solution in a disaster recovery scenario.

Still, there are definitely uses for deduplication. Most vendors have successfully solved most issues, for example by making un-deduplicated data directly accessible from the storage solution (enabling separate backups to tape etc.). I have been looking at the new version of esXpress with its PHDD dedup targets, and I must say it is a very elegant solution (which I will blog about shortly) :)