27 July 2012

UCMA Startup errors - when everything else doesn’t work, check the hosts file

Written by Chris Bardon, Posted in Troubleshooting, Lync, Microsoft

This was a fun round of troubleshooting. One of our developers needed to debug a UCMA application that we’ve run on dozens of other servers. He went through the steps to provision the app, just as we had everywhere else, but starting the platform threw the following exception:

Portal failed establishing the endpoint: Microsoft.Rtc.Signaling.ConnectionFailureException:Operation failed because the network connection was not available. ---> Microsoft.Rtc.Internal.Sip.SipException: Invalid From header: Semantic error:  fTopLabel == true
   at Microsoft.Rtc.Internal.Sip.FromHeader.Parse(SipHeaderLink& headerLink)
   at Microsoft.Rtc.Internal.Sip.FromHeader..ctor(String headerValue)
   at Microsoft.Rtc.Internal.Sip.NegotiateLogic.CreateABlankNegotiate(FunctionType funcType, String negotiateData, SipResponse prevResponse)
   at Microsoft.Rtc.Internal.Sip.NegotiateLogic.StartCompression()
   at Microsoft.Rtc.Internal.Sip.NegotiateLogic.AdvanceOutboundNegotiation()
   at Microsoft.Rtc.Internal.Sip.TlsTransport.DelegateNegotiation(TransportsDataBuffer receivedData)
   at Microsoft.Rtc.Internal.Sip.TlsTransport.OnReceived(Object data)

At first glance, this looks like a network issue, so we made sure that the dev machine could reach the Lync server on all the ports it needed (it could). Then we rechecked the certificate, and verified that the MTLS connection was forming, but then immediately terminating. We even tried changing the client machine name to something without hyphens, and re-provisioning the application a couple of times just to make sure that there wasn’t something wrong along the way. Finally, running OCSLogger.exe on the developer machine and digging through the traces, we saw this:

(000000000270F9F2)local cert SN robinlaptop.corp.computer-talk.com is not same as localfqdn localhost127.0.0.1. Send feature info

Then we checked the hosts file, and noticed a line that looked like this:

127.0.0.1       localhost127.0.0.1       localhost127.0.0.1       localhost127.0.0.1       localhost127.0.0.1       localhost127.0.0.1       localhost

Now, I have no idea how this developer got this in his hosts file, but removing the entry fixed everything. After discovering this, I tried adding one line for localhost in the hosts file, and everything was fine. I took the same entry and split it into four lines, and everything was fine. I tried changing ONE CHARACTER in the bad entry, and everything was fine. For some reason, only that exact sequence of characters managed to short-circuit the platform startup.

In any case, I’m putting this out there as a troubleshooting suggestion for anyone else who runs into the same thing. Make sure that nothing has messed with your hosts file. It’s not necessarily the first thing you’d think to check, but I know I’m adding it to my list of troubleshooting steps from now on.
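If you want to automate that check, here is a minimal Python sketch that scans a hosts file for entries like the one above. It is not part of UCMA or any Lync tooling, and the function names (`is_valid_hostname`, `find_suspicious_entries`, `check_hosts_file`) are made up for this example. It also assumes that "suspicious" means a hostname field that has an IP address run into it or that doesn't look like a plausible DNS name, which is my guess at a useful rule rather than anything the platform documents.

```python
import ipaddress
import re

# One DNS label: letters, digits and hyphens, no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")
# A dotted-quad pattern appearing anywhere inside a hostname field.
_EMBEDDED_IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def is_valid_hostname(name):
    """Return True if name looks like a plausible DNS hostname."""
    if not name or len(name) > 253:
        return False
    labels = name.rstrip(".").split(".")
    if labels[-1].isdigit():  # treat an all-numeric top label as invalid
        return False
    return all(_LABEL.match(label) for label in labels)


def find_suspicious_entries(lines):
    """Return (line_number, message) pairs for entries that look malformed."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        entry = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry:
            continue
        fields = entry.split()
        address, names = fields[0], fields[1:]
        try:
            ipaddress.ip_address(address)
        except ValueError:
            problems.append((lineno, f"first field is not an IP address: {address!r}"))
            continue
        if not names:
            problems.append((lineno, "address has no hostname"))
        for name in names:
            if _EMBEDDED_IPV4.search(name):
                problems.append((lineno, f"hostname contains an IP address: {name!r}"))
            elif not is_valid_hostname(name):
                problems.append((lineno, f"not a valid hostname: {name!r}"))
    return problems


def check_hosts_file(path):
    """Read a hosts file from path and return its suspicious entries."""
    with open(path, errors="replace") as handle:
        return find_suspicious_entries(handle)
```

On a Windows machine you would call something like `check_hosts_file(r"C:\Windows\System32\drivers\etc\hosts")` and print whatever comes back. Run against the bad line from this post, the sketch flags each of the `localhost127.0.0.1` fields, while a plain `127.0.0.1 localhost` entry passes cleanly.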

Also, if anyone knows exactly why this particular entry in the hosts file does what I described here, I’d really like to hear the explanation.


About the Author

Chris Bardon

Chris is the Chief Software Architect at ComputerTalk Technology, and has been writing code in one form or another since the mid-80s. He’s been working with Lync since it was called Live Communications Server (back in 2004), and he’s been working with .NET since 2002. He’s a big fan of UCMA, WCF, and XAML, and knows enough about most of the Microsoft development technologies to be dangerous. He also collects pinball machines.