27 July 2012
UCMA startup errors: when nothing else works, check the hosts file
This was a fun round of troubleshooting. One of our developers needed to debug a UCMA application that we’ve run on dozens of other servers. He went through the steps to provision the app, just as we had everywhere else, but starting the platform threw the following exception:
Portal failed establishing the endpoint: Microsoft.Rtc.Signaling.ConnectionFailureException: Operation failed because the network connection was not available. ---> Microsoft.Rtc.Internal.Sip.SipException: Invalid From header: Semantic error: fTopLabel == true
   at Microsoft.Rtc.Internal.Sip.FromHeader.Parse(SipHeaderLink& headerLink)
   at Microsoft.Rtc.Internal.Sip.FromHeader..ctor(String headerValue)
   at Microsoft.Rtc.Internal.Sip.NegotiateLogic.CreateABlankNegotiate(FunctionType funcType, String negotiateData, SipResponse prevResponse)
   at Microsoft.Rtc.Internal.Sip.NegotiateLogic.StartCompression()
   at Microsoft.Rtc.Internal.Sip.NegotiateLogic.AdvanceOutboundNegotiation()
   at Microsoft.Rtc.Internal.Sip.TlsTransport.DelegateNegotiation(TransportsDataBuffer receivedData)
   at Microsoft.Rtc.Internal.Sip.TlsTransport.OnReceived(Object data)
At first glance, this looks like a network issue, so we made sure that the dev machine could reach the Lync server on all the ports it needed (it could). Then we rechecked the certificate and verified that the MTLS connection was forming but then immediately terminating. We even tried renaming the client machine to something without hyphens and re-provisioning the application a couple of times, just to make sure that nothing had gone wrong along the way. Finally, we ran OCSLogger.exe on the developer machine, dug through the traces, and saw this:
(000000000270F9F2)local cert SN robinlaptop.corp.computer-talk.com is not same as localfqdn localhost127.0.0.1. Send feature info
Then we checked the hosts file, and noticed a line that looked like this:
127.0.0.1 localhost127.0.0.1 localhost127.0.0.1 localhost127.0.0.1 localhost127.0.0.1 localhost127.0.0.1 localhost
Now, I have no idea how this developer got this in his hosts file, but removing the entry fixed everything. After discovering this, I tried adding a single line for localhost to the hosts file, and everything was fine. I took the same bad entry and split it across four lines, and everything was fine. I tried changing ONE CHARACTER in the bad entry, and everything was fine. For some reason, only that exact sequence of characters managed to short-circuit the platform startup.
In any case, I’m putting this out there as a troubleshooting suggestion for anyone else who runs into the same thing. Make sure that nothing has messed with your hosts file. It’s not necessarily the first thing you’d think to check, but I know I’m adding it to my list of troubleshooting steps from now on.
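If you want to automate that check, here is a minimal Python sketch of how it might look. It is not part of UCMA or Lync; the function names (is_ip, find_suspicious_hosts_lines, default_hosts_path) are made up for illustration, and the heuristic is simply my assumption about what "suspicious" means, based on the entry above: a line whose first field is not an IP address, or whose host name has an IP address glued into it (like localhost127.0.0.1). It will also flag legitimate names that happen to contain a dotted quad, so treat the output as a hint rather than a verdict.

```python
import ipaddress
import os
import re

# Matches a dotted-quad IPv4 address anywhere inside a longer token.
EMBEDDED_IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def is_ip(token):
    """Return True if token parses as an IPv4 or IPv6 address."""
    try:
        # Drop an IPv6 zone suffix such as "%eth0" before parsing.
        ipaddress.ip_address(token.split("%")[0])
        return True
    except ValueError:
        return False


def find_suspicious_hosts_lines(text):
    """Return (line_number, reason, raw_line) for entries that look malformed.

    Heuristic only (an assumption, not a UCMA rule): each entry should be an
    IP address followed by one or more host names, and a host name should not
    contain an IP address.
    """
    findings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        tokens = line.split()
        address, names = tokens[0], tokens[1:]
        if not is_ip(address):
            findings.append((lineno, "first field is not an IP address", raw))
            continue
        if not names:
            findings.append((lineno, "no host name after the address", raw))
            continue
        for name in names:
            if is_ip(name) or EMBEDDED_IPV4.search(name):
                reason = "host name %r contains an IP address" % name
                findings.append((lineno, reason, raw))
                break
    return findings


def default_hosts_path():
    """Usual hosts file location on Windows, /etc/hosts elsewhere."""
    if os.name == "nt":
        root = os.environ.get("SystemRoot", r"C:\Windows")
        return os.path.join(root, "System32", "drivers", "etc", "hosts")
    return "/etc/hosts"


if __name__ == "__main__":
    with open(default_hosts_path(), encoding="utf-8", errors="replace") as f:
        for lineno, reason, raw in find_suspicious_hosts_lines(f.read()):
            print("line %d: %s" % (lineno, reason))
            print("    " + raw)
```

Run against a hosts file containing the entry from this post, the sketch should report that line because the name localhost127.0.0.1 has an address embedded in it, while a plain "127.0.0.1 localhost" line passes.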
Also, if anyone knows exactly why this particular entry in the hosts file does what I described here, I’d really like to hear the explanation.