<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>InsaneGeeks World &#187; EMC</title>
	<atom:link href="http://blog.insanegeeks.com/archives/category/emc/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.insanegeeks.com</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Tue, 18 May 2010 19:02:46 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>DMX Virtual Provisioning, like everything else, has good + bad parts</title>
		<link>http://blog.insanegeeks.com/archives/39</link>
		<comments>http://blog.insanegeeks.com/archives/39#comments</comments>
		<pubDate>Mon, 22 Mar 2010 05:16:56 +0000</pubDate>
		<dc:creator>InsaneGeek</dc:creator>
				<category><![CDATA[DMX]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Virtual Provisioning]]></category>
		<category><![CDATA[WLA]]></category>

		<guid isPermaLink="false">http://blog.insanegeeks.com/?p=39</guid>
		<description><![CDATA[I really like DMX VP (aka thin provisioning). It&#8217;s great for automatic wide striping, overcommit, etc., but at the same time there are a number of things you need to be aware of that will annoy you. VP is great in that I don&#8217;t really need to think (or care) as much anymore; those iops are wonderfully spread ]]></description>
			<content:encoded><![CDATA[<p>I really like DMX VP (aka thin provisioning). It&#8217;s great for automatic wide striping, overcommit, etc., but at the same time there are a number of things you need to be aware of that will annoy you. VP is great in that I don&#8217;t really need to think (or care) as much anymore; those iops are wonderfully spread across the backend&#8230; until I do have to care, and then ouch.</p>
<p>On the DMX you lose a reasonable chunk of the things you might be used to; Optimizer, one of my old friends, is surely among them. I&#8217;ve been struggling a bit with performance, not because VP is slow but because tracking things back to a source is such a massive pain in the ass. WLA doesn&#8217;t seem to know much about VP, so you now have an added layer of abstraction to dig through. So if I&#8217;m beating 48 of my SATA drives in a pool into submission at 100 wonderfully evenly spread iops per spindle&#8230; finding the guy who is doing it becomes a click nightmare. I&#8217;ve got close to 1500 tdevs attached to that pool and I&#8217;ve got to go look at every single one of them (or at least select all of them). They aren&#8217;t necessarily nice contiguous numbers, and some tdevs in the range might belong to another pool, so you&#8217;ve got to exclude them. Additionally, there is no way to tell which tdev is nicely spread across all the drives in the pool and which one is doing random iops concentrated on a subset of the pool&#8230; because someone wrote the data out when the pool was at 48 spindles rather than the 100+ it has now. So you see a subset of disks get hot in a pool; who&#8217;s doing it? You have to map the 1500 tdevs to the physicals and look for a peak/valley at the same time. I&#8217;m pretty sure I&#8217;ve got a &#8220;fix&#8221; for the imbalance in the new microcode (unfortunately not for the &#8220;who&#8221; part)&#8230;<span id="more-39"></span> but I&#8217;ll have to wait to make the change and let it cook for a while to see if my supposition holds. The new DMX code that came out in Feb now has the same option as the VMAX to drain datadevs, but unlike the VMAX, the new DMX code doesn&#8217;t (to my knowledge) rebalance the pool. But if I can drain a datadev and it redistributes the data across the whole pool, I am sure I can create a script to walk through the pool one datadev at a time, removing it and then adding it back in before going on to the next.
This should redistribute things reasonably well&#8230; if you wanted to get adventurous you could add a group of new datadevs (10-30 or more) at the beginning and then remove them at the end, so that there isn&#8217;t one datadev that is really unbalanced compared to the rest.</p>
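<p>To make the ordering of that walk concrete, here is a minimal Python sketch of the idea. It is untested against a real array and rests on the same assumption as the paragraph above: that draining a datadev spreads its data across the rest of the pool. The device names and the <code>drain</code>/<code>add</code> callables are hypothetical placeholders; no real Solutions Enabler commands are shown, so you would have to wire those in yourself.</p>

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, str]  # (action, datadev name); names here are hypothetical


def rebalance_plan(datadevs: Sequence[str],
                   temp_datadevs: Sequence[str] = ()) -> List[Step]:
    """Build the ordered steps for cycling every datadev in a pool."""
    plan: List[Step] = []
    # Optional "adventurous" variant: widen the pool first with temporary devices.
    for dev in temp_datadevs:
        plan.append(("add", dev))
    # Walk the pool one datadev at a time: drain/remove it, then add it back.
    for dev in datadevs:
        plan.append(("drain", dev))
        plan.append(("add", dev))
    # Take the temporary devices back out at the very end.
    for dev in temp_datadevs:
        plan.append(("drain", dev))
    return plan


def run_plan(plan: Sequence[Step],
             drain: Callable[[str], None],
             add: Callable[[str], None]) -> None:
    """Execute a plan; drain() and add() are supplied by the caller (assumed)."""
    for action, dev in plan:
        if action == "drain":
            drain(dev)  # expected to block until the device is fully drained
        else:
            add(dev)


if __name__ == "__main__":
    # Dry run: print the steps instead of touching anything.
    steps = rebalance_plan(["0A10", "0A11"], temp_datadevs=["0B00"])
    run_plan(steps, drain=lambda d: print("drain", d), add=lambda d: print("add", d))
```

<p>Keeping the plan separate from the execution means you can print and eyeball the whole sequence before letting it loose on a production pool.</p>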
<p>The best way I&#8217;ve found to get at performance info is to use two separate methods together: use symstat to get realtime data, then combine that with the WLA trailing perf data. I do some symdev &amp; symcfg magic foo to get just the tdevs bound to a particular pool, throw them into a sym device group, and then when I know of something going on I run symstat against it. Unlike WLA, symstat then combines all the meta member data points into the meta head. I can normally pick out the offender relatively easily&#8230; but unfortunately not everything happens while you are watching, so then it&#8217;s back to the click, click, click of selecting tdevs in WLA, unless you&#8217;re lucky enough to have an existing automated report around and you haven&#8217;t changed the tdevs bound to the pool (which for me hardly ever happens).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.insanegeeks.com/archives/39/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VMware + Java + overcommitment = sad&#8230; + SSD = ?</title>
		<link>http://blog.insanegeeks.com/archives/34</link>
		<comments>http://blog.insanegeeks.com/archives/34#comments</comments>
		<pubDate>Mon, 22 Mar 2010 01:29:21 +0000</pubDate>
		<dc:creator>InsaneGeek</dc:creator>
				<category><![CDATA[DMX]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[SSD]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[VMWare]]></category>
		<category><![CDATA[Virtualization]]></category>

		<guid isPermaLink="false">http://blog.insanegeeks.com/?p=34</guid>
		<description><![CDATA[We are doing an overhaul of our core customer systems&#8230; unfortunately the developers have no idea what they are actually going to need. They have us configuring bunches and bunches of VM guests with 8GB of RAM each to run Java. Our VMware guys dutifully looked at the VMware memory commit rates on ]]></description>
			<content:encoded><![CDATA[<p>We are doing an overhaul of our core customer systems&#8230; unfortunately the developers have no idea what they are actually going to need. They have us configuring bunches and bunches of VM guests with 8GB of RAM each to run Java. Our VMware guys dutifully looked at the VMware memory commit rates on the existing guests and a sample of the new ones, said &#8220;oh yeah, we&#8217;ve got lots of space&#8221; and let them do it. They deployed a whole bunch of them over a week and turned them all over to devel to use. While that was obviously bad, it wasn&#8217;t because they were being complete idiots: we had already purchased enough to double our VMware farm but had not received the licenses yet. So they did a quick look around, the VMware hosts said they had active memory free, devel wanted their new environment up before the new hardware was ready to try and get ahead of things, and the perfect storm happened.<span id="more-34"></span></p>
<p>I&#8217;ve mainly been rummaging in the storage closet lately, trying to find the best way to do a large Oracle XMLDB installation&#8230; like 20TB in the next few months, and growing past that very quickly thereafter. So while I&#8217;m the only VCP on premises, VMware has not been my cup of tea lately&#8230; So I get a call right before Christmas that our new devel environment is having network problems: it seems to be available, then not, and once people start looking at it the problem goes away. So stepping into the unknown&#8230; I start digging, poking and prodding (tcpdumps, network traces, switch diags, etc.) and after turning over the right stone I find out we had overcommitted memory on lots of our hosts to the tune of 60-80GB. However, VirtualCenter is reporting that the hosts are only consuming 16-20GB of their 32GB&#8230; So we have a wonderful example of guests allocating memory to their apps but not really using it. Java is pretty much universally thought of as offender #1 here. Java has wonderful/horrible memory management (depending upon who you talk to), in that you give it a chunk of RAM and it manages it. VMware has a balloon driver which, under load, grabs a chunk of RAM inside the guest, making the guest OS push unused RAM to disk, and hands that memory back to the host. The problem with Java is that it is managing its own memory, so the OS has no idea what to swap out. The OS tries to be intelligent by pushing out parts that have been untouched for a while, but then garbage collection kicks in, which has to touch all parts of the Java memory, and you are now swapping back in. That is the bad part, as your memory speed is now == disk speed.</p>
<p>In the end we set the memory reservation on all the Java guests equal to their configured RAM, which prevented VMware from swapping out their memory but meant that an 8GB guest would require 8GB of host RAM no matter whether it was using 1MB or 8GB. This was great for development, as their problems went away, but the business is now trying to figure out how to deal with the new, unexpected IT costs of supporting their systems. Even worse, development is behind, so while they say yes, they can skinny up their RAM requirements, they aren&#8217;t ready to do it as they are still squishing bugs and don&#8217;t want to introduce more variance.</p>
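<p>The arithmetic behind that cost is simple enough to sketch in a few lines of Python. The guest count below is a hypothetical example, picked only so the result lands in the 60-80GB range mentioned above, and the <code>overhead_gb</code> parameter is my own placeholder for hypervisor and per-VM overhead, not a measured value.</p>

```python
def overcommit_gb(guest_ram_gb, host_ram_gb):
    """Configured guest RAM beyond what the host physically has (0 if it all fits)."""
    return max(0, sum(guest_ram_gb) - host_ram_gb)


def guests_that_fit(host_ram_gb, guest_ram_gb, overhead_gb=0):
    """How many fully reserved guests of one size a single host can back with real RAM."""
    return (host_ram_gb - overhead_gb) // guest_ram_gb


# Hypothetical host: twelve 8GB Java guests on 32GB of physical RAM.
# 96GB configured minus 32GB physical leaves 64GB overcommitted.
print(overcommit_gb([8] * 12, 32))  # prints 64

# With reservation == configured RAM, the same host can only hold four of them.
print(guests_that_fit(32, 8))  # prints 4
```

<p>In other words, under these made-up numbers, full reservations turn one overcommitted host into three hosts&#8217; worth of RAM, which is exactly the bill the business is now looking at.</p>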
<p>This morning I had an &#8220;aha&#8221; moment: VMware has a hardly touched option for where ESX should swap a guest to, and I&#8217;ve got a few GB of unused SSD in my DMX4. What if we were to allocate a chunk of it to ESX swap space? While it will still be much slower than actual RAM, the rotational latency should be removed, hopefully making it not nearly as bad as it was (it was swapping to RAID-6 SATA pumping 100+ iops per drive before, so it was really bad).</p>
<p>I don&#8217;t have enough SSD to configure a swap partition in hundreds of guests, so this would only help when ESX itself pages. I&#8217;m thinking that I&#8217;ll probably need to disable swap inside the guest OS, as the balloon driver would kick in and push memory to the guest&#8217;s existing drives before ESX swaps it out to the SSD swap location&#8230; but I don&#8217;t know exactly how bad things have to get for the other guests before ESX will swap rather than balloon guests.</p>
<p>I&#8217;ll see if people at work are interested in trying this or not&#8230; and will let you know any details that come out of it. This should only be thought of as a short-term solution, as having 60+GB of memory overcommitted is not going to be good any way you think about it.</p>
<p>I&#8217;ve found one article this evening, but it doesn&#8217;t really go into the whens and wheres, more just the performance benchmark (which did validate my general supposition): <a href="http://communities.vmware.com/blogs/chethank/2009/12/22/using-solidstate-drives-to-improve-performance-of-sql-databases-on-vsphere-hosts-when-memory-is-overcommitted">http://communities.vmware.com/blogs/chethank/2009/12/22/using-solidstate-drives-to-improve-performance-of-sql-databases-on-vsphere-hosts-when-memory-is-overcommitted</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.insanegeeks.com/archives/34/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Management software, how I hate thee!!!</title>
		<link>http://blog.insanegeeks.com/archives/21</link>
		<comments>http://blog.insanegeeks.com/archives/21#comments</comments>
		<pubDate>Sat, 20 Feb 2010 18:05:36 +0000</pubDate>
		<dc:creator>InsaneGeek</dc:creator>
				<category><![CDATA[DFM]]></category>
		<category><![CDATA[ECC]]></category>
		<category><![CDATA[EMC]]></category>
		<category><![CDATA[NetApp]]></category>

		<guid isPermaLink="false">http://blog.insanegeeks.com/?p=21</guid>
		<description><![CDATA[InsaneGeek is grumpy about storage management software]]></description>
			<content:encoded><![CDATA[<p>Management software&#8230; it offers so much but fails me in the end. I think the reason there is so much derision for it is that we as admins can see how much it could give us. It&#8217;s tantalizingly close to making our lives easier, but the last bits are forever out of reach, mocking us for having expectations of things working, and that actually makes us angrier than having no software at all.</p>
<p><span id="more-21"></span>ECC, you know it&#8230; you might have tangled with the beast during an upgrade that failed and ended with you re-installing from scratch. I&#8217;ve used ECC for years and years (I just found some old 4.3.1 media)&#8230; and I feel like tattooing &#8220;love and hate&#8221; onto my fingers.</p>
<p>I love that it gives a wonderful view of everything from host to switch to storage. I like to do zoning in it, keeping all the names consistent, and I love that I can give reports to management on everything. I love that everything can be inside ECC (EMC-wise&#8230; other mfgs not so much). I love that I can get some performance information out of it, and I love being able to visually see a fabric. The biggest love is that it keeps me from making mistakes: typos, etc. don&#8217;t happen. I never have to type in a WWN, a zone name or even a host name. Everything is named properly, and I never have someone type in hostb when it should have been hosta.</p>
<p>I hate that unless there is an agent running it lies to me. Present some storage within ECC to a host without an agent&#8230; do a relationship from the LUN in the array to a host and it will tell you *nothing*. Obviously the software does know: if you look at masking to the host, it lists the LUN right there. Information I can&#8217;t trust is *worse* than no information. ECC goes 90% of the way there, but it fails so spectacularly badly on the last bit. I want to clean up some storage lying around that someone didn&#8217;t finish removing completely, or used for a quick test, etc.&#8230; Finding it is so painful; if I can&#8217;t trust the tools in all situations, I can&#8217;t use them for that task, period. I&#8217;ve put in multiple RFEs over the years&#8230; FIX THIS. StorageScope knows it, the masking agent knows it, why doesn&#8217;t relationship know it! Even worse is that you can get old, stale LUNs if the host agent isn&#8217;t running. Stop the host agent, add/remove some LUNs, and does it give you the correct data? No, it tells you the data it last got from the host when the agent was running. That is simply bad. Come on, check a copy of the VCM database before you tell me something untrue that I might take an action on.</p>
<p>Again ECC goes 90% of the way: I can add everything but a fabric tape drive manually. Host, switch, storage array: all manual. A tape drive? Only through the stupid, horrid SMI-S. It knows it&#8217;s there, it finds it on the fabric, but I can&#8217;t categorize it properly. I end up marking it as HDS or HP storage so zoning rules work, but then it screws up reports as it&#8217;s now an array. Don&#8217;t do automatic discovery only (which is what I hear rumblings of), and don&#8217;t do manual adds only either; do *both*.</p>
<p>Lastly, make ECC faster; everybody is running from ECC because it has simply gotten so slow. Back in the day it wasn&#8217;t too bad, but now? Dear god, I want to put my fist through the monitor. I used to have it running on a quad-core box that had 16GB of RAM in it (it was given to me when some other project was cancelled). It was still slow under 32-bit Windows, so I talked them into getting me 64-bit Windows so I could use more of the memory&#8230; not much different. I&#8217;ve got Oracle flowing out of my ears here, so why can&#8217;t I use my own Oracle server and make it as big as I want, like I can for the STS database? Sure, provide a database with the software for people who don&#8217;t have one, but for those of us who do, let us USE it. Which brings me back to my biggest beef with ECC: speed it up. I am almost completely switching over to SMC because of this. The Java console is a massive PIG; screen updates take forever.</p>
<p>Lest you think I think only EMC software is bad&#8230; I&#8217;ve tussled with NetApp&#8217;s DFM line as well&#8230; and it&#8217;s even worse. Open it up&#8230; dear god, it&#8217;s a complete mess; if I want to get actual useful data out of it I end up using the command line &#8220;dfm report&#8221; and then making the output useful myself. I loathe PerformanceManager in DFM: its layout is horrid, and trying to get useful data out of it is an exercise in futility. During a rather large fit at a lease end, when we were evaluating performance data, it annoyed me so much that I ended up logging in every day, clicking every single metric and exporting it to a csv file. I then wrote some perl scripts to pull the csv files into a mysql database, which I queried using PHP to create graphs with jpgraph. Like ECC, it has been a massive bear for upgrades; most of the time the guys just throw it out and do a complete reinstall, so all the trend data is lost.</p>
<p>Other software is just as bad (OpenView, etc.)&#8230; I find them all kinda useful in some way, and then they fail me, usually on the parts that I need help doing. I don&#8217;t really need help doing the easy things, or whiz-bang features that probably only get used by 1% of users. Help me do the hard things that a lot of admins don&#8217;t get around to, e.g. storage cleanup that only happens when a new frame comes in.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.insanegeeks.com/archives/21/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

