Your basic ITPro blog... What's going on at work, what I'm interested in.

Sunday, June 21, 2009

Perspective

I invest a lot of time (measured in actual minutes and hours) in computers. My job is in IT: managing dozens of Windows servers, dozens of Dell and Cisco switches/routers/WAPs/etc., over a hundred Cisco IP phones and their users, multiple software packages, and all the other ‘trimmings’ that come with a typical SMB systems installation. I spend many more hours reading and learning about technology, trying to keep up with trends, learning about what’s on the horizon, and developing my skills in the solutions we have in place. Much of my free time is spent on the computer too: playing games, watching TED Talks, Stumbling, etc. All this to say, I’m guessing I’m no different from most of you…

I spend a lot of time on computers.

But today is Father’s Day. For me, this is a day of perspective. When I look into the eyes of my two sons, when my 5-year-old runs up to me, gives me the longest hug I’ve had in a long time, and tells me, “I’m so glad you are my father,” well, I am reminded of what is really important.

I just want to say to all you fathers out there, Happy Father’s Day. I hope and pray that this is a day of joy and happiness for you.

Thursday, June 18, 2009

BAD_ADDRESS = bad!

I was working to deploy some new IP phones on our Gilbert campus and kept getting DHCP address assignment errors. The phones would just sit there at ‘configuring IP’… Meanwhile, my DHCP scopes were filling up with leases assigned to “BAD_ADDRESS”. Do a web search for “DHCP BAD_ADDRESS” and you will get a good idea of the problem.

While some people reported this problem being associated with Mac clients or other IPv6 clients on the network, that was not my problem at all. My problem was simply duplicate IP addresses on the network. The tough part was that there were no DNS entries for the offending IP addresses and no valid DHCP leases for them. Yet I was able to ping the addresses, so something out there was using them.

I tried using ping/arp to find the devices on the network, but had no success until a network engineer I was talking to suggested that I go to the core router/switch on my network and do my ARP lookups on that device. I had been doing them from my workstation and a couple of edge switches. (A device’s ARP table only holds entries for hosts on its directly connected subnets that it has recently exchanged traffic with, so the device routing between all the VLANs is the one that sees everything.) This was the key, and I had struck gold: my core switch (managing all of my VLANs) had all of these IP/MAC entries in its ARP table.
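If you dump the core switch’s ARP table to a text file, a few lines of Python can pull out the MAC address and interface for each suspect IP. This is a minimal sketch rather than the exact procedure used here: it assumes Cisco IOS `show ip arp`-style columns (other vendors, Dell included, format this output differently), and the sample table, IP addresses, and function names are made up for illustration.

```python
import re

# Hypothetical ARP dump, ASSUMING Cisco IOS "show ip arp"-style columns.
# Other vendors use different layouts; adjust ARP_LINE to match yours.
SAMPLE_ARP = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.20.30.41            12   0011.2233.4455  ARPA   Vlan30
Internet  10.20.30.42             3   0011.2233.6677  ARPA   Vlan30
Internet  10.20.40.5              -   00aa.bbcc.ddee  ARPA   Vlan40
"""

# Captures the IP, the dotted MAC, and the interface; skips age and type.
ARP_LINE = re.compile(
    r"^Internet\s+(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+\S+\s+"
    r"(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})\s+\S+\s+(?P<iface>\S+)"
)


def parse_arp_table(text: str) -> dict[str, tuple[str, str]]:
    """Map each IP address in the dump to its (MAC, interface) pair."""
    entries = {}
    for line in text.splitlines():
        match = ARP_LINE.match(line.strip())
        if match:  # header and unrelated lines simply don't match
            entries[match["ip"]] = (match["mac"].lower(), match["iface"])
    return entries


def locate(suspect_ips, arp_text):
    """Return ARP details for each suspect IP (None if the switch has no entry)."""
    table = parse_arp_table(arp_text)
    return {ip: table.get(ip) for ip in suspect_ips}


if __name__ == "__main__":
    for ip, hit in locate(["10.20.30.42", "10.20.30.99"], SAMPLE_ARP).items():
        print(ip, "->", hit if hit else "not in ARP table")
```

With the MAC address in hand, the switch’s MAC address table will point you to the physical port the device is plugged into.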

From there, I was able to find the actual devices that had these BAD_ADDRESSes. This exposed the root problem, which turned out to be an interesting residual from a previous issue I had worked on. It turns out that a number of phones on my network were still configured to use our DHCP server’s now-defunct IP address from our old multi-homed configuration. So, as far as they were concerned, their DHCP server no longer existed. Thus, they had little choice but to hold on to their assigned IP addresses for dear life, hoping and praying that, someday, their long-lost DHCP server would return. Little did they know that the server was sitting right next to them, just with a new IP address. I quickly generated a list of these devices and rebooted them. They immediately found the DHCP server and got an IP address.
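The orphaned phones had one thing in common: they answered on the network (so they showed up in the ARP table), but the DHCP server had no lease for their addresses. Cross-checking those two lists is a quick way to flag candidates. This is a minimal sketch, assuming you have already exported the ARP IPs and the scope’s active leases as plain lists; the function name, scope, and addresses below are hypothetical.

```python
import ipaddress


def find_unleased_addresses(scope_cidr, arp_ips, leased_ips, excluded=()):
    """Return addresses that are live on the network (present in ARP) and
    fall inside the DHCP scope, yet have no lease and are not excluded."""
    scope = ipaddress.ip_network(scope_cidr)
    leased = {ipaddress.ip_address(ip) for ip in leased_ips}
    skip = {ipaddress.ip_address(ip) for ip in excluded}  # statics, gateway, etc.
    suspects = []
    for ip in arp_ips:
        addr = ipaddress.ip_address(ip)
        if addr in scope and addr not in leased and addr not in skip:
            suspects.append(addr)
    return [str(a) for a in sorted(suspects)]


if __name__ == "__main__":
    # Hypothetical data: .1 is the gateway, .42 has a real lease,
    # 10.20.40.5 lives in a different scope.
    print(find_unleased_addresses(
        "10.20.30.0/24",
        arp_ips=["10.20.30.1", "10.20.30.41", "10.20.30.42", "10.20.40.5"],
        leased_ips=["10.20.30.42"],
        excluded=["10.20.30.1"],
    ))
```

Anything left over is either a static device you forgot to exclude or an orphan like the phones described above.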

But, back to the BAD_ADDRESS issue… My DHCP scope had no record (no active leases) of these residual IP addresses held by the orphaned devices. So, when I plugged in a new phone, my DHCP server was more than happy to try to hand those IP addresses out. From what I have gathered, the basic steps in DHCP go something like this (super-simplified and possibly not even right):

  • Client makes request
  • Server pulls an unused address from the appropriate scope
  • Server responds to client with this IP address and associated network configuration
  • Client verifies that IP address is actually available (not currently on the network)
    • SUCCESS! Client keeps the network configuration and is happily on the network
    • FAILURE! Client reports back to DHCP server that IP is already in use
      • DHCP server adds an entry to its lease DB for this IP address, assigning it to ‘BAD_ADDRESS’
      • Start process over with next available IP address
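To make the loop above concrete, here is a toy Python model of it. It is only a sketch of the simplified steps as listed, not how Windows DHCP Server is actually implemented. In RFC 2131 terms, the client’s “already in use” report is a DHCPDECLINE message. `DhcpScope`, `acquire`, and the `in_use` set (standing in for the client’s conflict check) are hypothetical names.

```python
class DhcpScope:
    """Toy model of a DHCP scope that marks declined addresses as BAD_ADDRESS."""

    def __init__(self, addresses):
        self.free = list(addresses)  # addresses not yet handed out
        self.leases = {}             # ip -> client id, or "BAD_ADDRESS"

    def offer(self):
        """Pull the next unused address from the scope (None if exhausted)."""
        return self.free.pop(0) if self.free else None

    def decline(self, ip):
        """Client found the address already in use; record it as BAD_ADDRESS."""
        self.leases[ip] = "BAD_ADDRESS"

    def bind(self, ip, client_id):
        """Client accepted the address; record a normal lease."""
        self.leases[ip] = client_id


def acquire(scope, client_id, in_use):
    """Repeat offer/verify until the client gets a free address.

    `in_use` stands in for the client's conflict check: the set of
    addresses already answering on the wire (e.g. held by orphaned devices).
    """
    while (ip := scope.offer()) is not None:
        if ip in in_use:
            scope.decline(ip)      # FAILURE: mark it and try the next address
            continue
        scope.bind(ip, client_id)  # SUCCESS: client keeps the configuration
        return ip
    return None                    # scope exhausted


if __name__ == "__main__":
    scope = DhcpScope(["10.20.30.100", "10.20.30.101", "10.20.30.102"])
    orphans = {"10.20.30.100", "10.20.30.101"}
    print(acquire(scope, "new-phone", orphans))  # 10.20.30.102
    print(scope.leases)
```

Every orphaned address the new phone trips over burns one scope entry as BAD_ADDRESS, which is how the scopes filled up.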

Once all devices were talking to the correct DHCP server, this problem simply went away. My new phones were immediately configured and working.

Wednesday, June 17, 2009

File Store saga

So, we had an issue with a Dynamic disk in a VM: the disk would not be active after a VM restart, and I had to manually reactivate it. In doing this, my shares and ABE (Access-Based Enumeration) settings were lost and had to be reset.

After some research, I found two options to fix this that did not involve a backup-rebuild-restore. These options are:

  • Attach disk to IDE rather than SCSI.
  • In Registry, change the 'START' value in “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\storvsc” from 3 to 0.
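For anyone who would rather script Plan B than click through regedit, here is a hedged Python sketch using the standard `winreg` module. The key path and the 3 → 0 change come straight from the option above; in Windows service terms, a `Start` value of 3 is demand (manual) start and 0 is boot start. The function names are hypothetical, the script defaults to a dry run, it needs an elevated prompt inside the Hyper-V guest, and you should back up the registry before trying it.

```python
import sys

# Windows service "Start" values (SERVICE_* constants from winnt.h).
START_TYPES = {
    0: "SERVICE_BOOT_START",
    1: "SERVICE_SYSTEM_START",
    2: "SERVICE_AUTO_START",
    3: "SERVICE_DEMAND_START",
    4: "SERVICE_DISABLED",
}

# Key from the post, relative to HKEY_LOCAL_MACHINE.
STORVSC_KEY = r"SYSTEM\CurrentControlSet\Services\storvsc"


def describe_start(value: int) -> str:
    """Translate a raw Start value into its constant name."""
    return START_TYPES.get(value, f"unknown ({value})")


def set_storvsc_boot_start(dry_run: bool = True) -> None:
    """Report storvsc's Start value and, unless dry_run, set it to 0."""
    if sys.platform != "win32":
        raise OSError("winreg is only available on Windows")
    import winreg  # imported here so the module still loads elsewhere

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, STORVSC_KEY, 0, access) as key:
        current, _ = winreg.QueryValueEx(key, "Start")
        print(f"storvsc Start = {current} ({describe_start(current)})")
        if current != 0 and not dry_run:
            winreg.SetValueEx(key, "Start", 0, winreg.REG_DWORD, 0)
            print("storvsc Start set to 0 (SERVICE_BOOT_START); reboot to apply")


if __name__ == "__main__":
    set_storvsc_boot_start(dry_run="--apply" not in sys.argv)
```

Run it once without `--apply` to confirm the current value, then again with `--apply`, and do a couple of test reboots afterward.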

The first option seemed the ‘better’ choice, as it just uses the tools/software rather than reg-hacking, so that is the option I went with first. But our disk size ended up causing problems with this. We experienced some file corruption (thank God for backups) and ended up having to move the disk back to SCSI. So, it was off to Plan B…

… which worked perfectly. After changing the registry value and doing a couple of test reboots, everything looked good and stable. Then we just had to ‘clean up’ our corrupted files. Users are still trickling in with files that can’t be opened, but a quick restore from our pre-problem backups is fixing things in most cases.

WHAT I LEARNED:

  1. Verify and Clarify! Do your research and develop a plan. Then, verify that plan: not just the steps/technologies/ideas, but the actual plan! Run through it one more time. Get one more pair of eyes on it. Verify that the actual steps you are planning to take are solid. My discussions on this topic led me to believe that moving from SCSI to IDE was the best approach, but I didn’t run my actual plan by other engineers. I am confident my flaw would have been caught had I done so.
  2. Take precautions! I could have (should have) taken extra precautions before executing my plan. I had recent backups, but not up-to-the-minute backups. I should have made those. Is it too much to ‘expect’ failure and prepare accordingly? Maybe not…
  3. Be thankful for the Grace of God found in His people! My co-workers were/are awesome! I am humbled and grateful for their understanding and grace during this ordeal.
  4. Don’t rush. I was anxious to get this fixed, and because of that, I rushed things. Oh, I didn’t feel like I was rushing things at the time. But looking back (isn’t hindsight great?!), I see now that I should have taken more time to contemplate this issue. Overconfidence? Perhaps…

Also, as a result of this, we made some changes to our DR plans… Specifically, we increased the frequency of our file store backups… from once a day to every six hours.

Wednesday, June 3, 2009

Dynamic Disks and Hyper-V VMs… Not So Much!

As per here, disks in Hyper-V VMs should be Basic (not Dynamic), or they will start up as inactive when you boot the VM. So, you have to reactivate the disk every time you reboot the VM. This also causes any shares on the volume(s) to disappear.

Don’t ask me how I know this…

Also don’t ask me if I am going to enjoy performing the ‘fix’ on a disk with over 2TB of data on it…

<weep>

BackupExec… Oh How I Hate Thee!

I just have to say it out loud. This has got to be the worst software ever conceived by man. Why is it, when I create and then start a job, it just sits there for (seemingly) EVER?! I created a restore job and, after deciding that I wanted to change it, I canceled it, made changes, and started it again… NOPE! Instead, it just sits there… going on an hour now!! No alerts waiting for a response… just sitting there.

I really hate this software!

I could rant about so many things I hate in BackupExec (now trying version 12.5)… very poor D2D performance, jobs constantly getting stuck, etc., etc.

The question is, where can I go?

Additional Info

email: support (AT) mangrumtech (DOT) com
mobile: 480-270-4332