Saturday 8 February 2014

The winlogon notification subscriber termsrv is taking long time to handle RDS & Xenapp

Our RDS environment at work slowed to a crawl over a couple of days. Specifically, logon and logoff times became extremely long, on the order of a minute or so.

There were some clues in the event logs, namely events with IDs 6005 and 6006 and the error message: "the winlogon notification subscriber termsrv is taking long time to handle rds -xenapp".

I pulled my hair out trying to figure it out. This is why I find supporting Windows systems (and especially RDS) extremely frustrating. You find yourself eliminating things one by one...

We eliminated a number of potential root causes, such as problems with AD, with group policy and more.

This affected all our RDS servers too, which I just couldn't work out.

I spoke with one of our Infrastructure experts at work and even he couldn't work it out.

In the end, the problem resolved itself. To this day, I still have no idea why the problem came to be in the first place or how it went away, all on its own.

3 comments:

  1. Thanks for writing about this and hijacking the search results which screws me over when I have the same problem.

  2. Hi, we have the same problem on our Xenapp servers on random sessions. I haven't found a solution yet. Do you push Windows updates to your servers regularly?
    Thank you.
    Ray

  3. I agree, Anonymous, quite frustrating to find this post... but I can say that while we were fighting this issue it was humorous to pass this link to my colleague as the 'fix' and watch his hopes rise and then crash as he read the less than helpful detail. LOL!

    Hopefully the following can be a ray of sunshine for posterity.

    We started receiving notifications that Citrix applications were not connecting (hanging at the connection progress bar), and users from other groups noted that some Windows servers (2003-2016) were taking 2-3 minutes to reach the desktop.

    All of these systems, both Citrix hosts and affected Windows member servers, had System event ID 6005 errors similar to the above. Reboots, patching, or anything else done on the member servers had no effect. We verified AD Sites and Services and group policies.

    Finally, we did a network capture from one affected host for the duration of a login attempt. Reviewing the capture in Microsoft Message Analyzer, we found under the System process (PID 4) SMB2 errors showing IOCTL timeouts to the domain controller's SYSVOL$ share. We then checked the other affected systems to see which domain controller they were authenticating against (from a CMD prompt, run 'echo %LOGONSERVER%') and found they were all using the same DC.

    As a quick test we got permission to take that DC offline. As soon as it was unavailable, the hosts connected to another DC in their site and login times were as expected. Rebooting the affected domain controller returned it to functional status. We will continue to monitor this host for recurrence of the problem, but we now know the symptom, the cause, and that a reboot is a quick remediation.

    Karma+1

    In your service,
    Aaron Meyer

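
The domain controller check described in the last comment (running 'echo %LOGONSERVER%' on each affected host and comparing the answers) could be sketched roughly as below. This is only an illustration, not the commenter's actual tooling: the function names and the `reports` mapping of host name to LOGONSERVER value are hypothetical, and how you collect the values from each host is left to you.

```python
import os
from collections import Counter


def current_logon_server(environ=os.environ):
    """Return this host's logon server (the value of %LOGONSERVER%)
    without the leading backslashes, or None if the variable is unset."""
    value = environ.get("LOGONSERVER")
    if not value:
        return None
    return value.lstrip("\\").upper()


def tally_logon_servers(reports):
    """Count how many hosts report each domain controller.

    `reports` is a hypothetical mapping of host name -> LOGONSERVER value,
    gathered by whatever means you have available."""
    counts = Counter()
    for _host, value in reports.items():
        if value:
            counts[value.lstrip("\\").upper()] += 1
    return counts


def common_logon_server(reports):
    """Return the DC name if every reporting host uses the same one, else None."""
    counts = tally_logon_servers(reports)
    if len(counts) == 1:
        return next(iter(counts))
    return None


if __name__ == "__main__":
    # On a domain-joined Windows host this prints the DC that handled the logon.
    print(current_logon_server())
```

If every slow host reports the same DC while healthy hosts report a different one, that DC is a reasonable first suspect, which is the pattern the commenter found.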
