Your basic ITPro blog... What's going on at work, what I'm interested in.

Saturday, March 28, 2009

Multi-Homed Domain Controller = FAIL!

Apparently, this is not a good idea…

I am just finishing my crawl (I'd like to call it a walk) down a long and winding path. It all started when we began having problems authenticating our wireless clients against our IAS server. We have a DC running IAS. This DC also runs an app for our VoIP phones. As such, this DC has two NICs, one on our DATA VLAN and one on our VOICE VLAN.

The IAS authentication problem would show up sporadically. Using Wireshark, we would see authentication requests coming from the wireless LAN controller (WLC) to our IAS box, but no responses going back out. Things would just ‘black hole’ at the IAS box. I ended up opening tickets with both Cisco and Microsoft on this problem. Until we found a solution, our only sure-fire way to fix things (for a time) was to reboot the IAS/DC server.

It didn’t take long to notice that the WLC was working as expected. So, we focused on the Microsoft side of the equation. To their credit, Microsoft stuck with us as we worked through this. We had this ticket open for a few weeks and ran through various levels of support and various engineers. It wasn’t until we got to “level 3” support at Microsoft that we found the problem. This engineer suspected something that no one (me included) thought to even check… Could requests be coming in on one NIC and going out the other? As they say… EUREKA!

Of course, the first thing we had to do was wait… because, you know, we couldn’t exactly trigger this problem, or time it, or predict it. It would just happen all of a sudden. But the next time we saw the problem, I ran Wireshark on both interfaces. Sure enough, requests were coming in on one NIC and the responses were going out the other. The WLC didn’t like that, not one bit.
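
For anyone who wants to run the same check against their own captures, here is a minimal sketch in Python of the comparison I was doing by eye. It assumes you have already exported the packets from both interfaces into simple records. The `Packet` type, the NIC names, and the IP addresses are all made up for illustration and are not from our network. It pairs each response with its request using the RADIUS Identifier field and flags any response that left on a different NIC than the request arrived on.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    nic: str        # interface the packet was captured on (hypothetical names)
    src: str        # source IP address
    dst: str        # destination IP address
    radius_id: int  # RADIUS Identifier field, which pairs a response with its request


def find_asymmetric_replies(packets, server_ips):
    """Return (radius_id, request_nic, reply_nic) for replies that left on a different NIC."""
    server_ips = set(server_ips)
    request_nic = {}   # (client_ip, radius_id) -> NIC the request arrived on
    mismatches = []
    for p in packets:
        if p.dst in server_ips:
            # Inbound request: remember which NIC it arrived on.
            request_nic[(p.src, p.radius_id)] = p.nic
        elif p.src in server_ips:
            # Outbound reply: compare against the NIC of the matching request.
            arrived = request_nic.get((p.dst, p.radius_id))
            if arrived is not None and arrived != p.nic:
                mismatches.append((p.radius_id, arrived, p.nic))
    return mismatches


# Hypothetical capture: request 1 is answered on the same NIC, request 2 is not.
capture = [
    Packet("data", "10.0.1.50", "10.0.10.5", 1),
    Packet("data", "10.0.10.5", "10.0.1.50", 1),
    Packet("data", "10.0.1.50", "10.0.10.5", 2),
    Packet("voice", "10.0.20.5", "10.0.1.50", 2),
]
print(find_asymmetric_replies(capture, ["10.0.10.5", "10.0.20.5"]))
```

Any tuple in the result is a request that came in on one NIC and was answered out the other, which is exactly what the WLC was choking on.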

So, we had found our problem. Unfortunately, fixing the problem isn’t as easy as disabling one of the NICs. I mean, that works in the short term, but it is not a solution. The phone paging system uses the voice VLAN NIC, as do our phones. Last week, after I had disabled the voice VLAN NIC, a couple of phones gave us fits trying to register with CallManager. Re-enabling the NIC brought the phones back up.

The phone registration issue was easily resolved by putting an IP helper address on the voice VLAN on the router. Phones now get their DHCP responses from the data VLAN, so they no longer need the voice VLAN NIC for that.
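
If you have never looked at what an IP helper address actually does, here is a rough Python model of the relay step. It is only a sketch: the `DhcpMessage` class and the addresses are invented for illustration, and a real router does this in its own software. The idea is that the router takes the phone's DHCP broadcast, stamps its own voice VLAN interface address into the `giaddr` field (if no relay has done so already), bumps the hop count, and forwards the request as unicast to the helper address. The DHCP server can then use `giaddr` to tell which subnet the request came from.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DhcpMessage:
    op: str      # "BOOTREQUEST" or "BOOTREPLY"
    giaddr: str  # relay agent address; "0.0.0.0" until a relay fills it in
    hops: int    # number of relay agents the message has passed through
    dst: str     # destination IP of the packet carrying the message


def relay_request(msg, relay_if_ip, helper_ip):
    """Forward a client's broadcast request to the DHCP server as unicast."""
    if msg.giaddr == "0.0.0.0":
        # First relay on the path: record the interface the request arrived on.
        msg = replace(msg, giaddr=relay_if_ip)
    return replace(msg, hops=msg.hops + 1, dst=helper_ip)


# Hypothetical addresses: 10.0.20.1 is the router's voice VLAN interface,
# 10.0.10.5 is the DHCP server's address on the data VLAN.
discover = DhcpMessage("BOOTREQUEST", "0.0.0.0", 0, "255.255.255.255")
forwarded = relay_request(discover, "10.0.20.1", "10.0.10.5")
print(forwarded)
```

With the relay in place, the server only needs to be reachable on the data VLAN to hand out voice VLAN addresses.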

But, we still have to fix the paging app. It has to have a NIC on the voice VLAN, so it looks like we will be migrating this app to its own box… Probably a better solution anyway.

Moral of the story: multi-homed DCs can cause problems… Also, don’t try to do too much on your DCs (or any box, for that matter).
