The issue
So, I ran into a weird issue a while back where Response Group calls would be answered by RGS, the IVR messages and actions would play out, and the call would be dropped into a queue to look for an agent.
The caller would hear the queue's on-hold music and, after the queue timeout period, be correctly routed to voicemail.
The issue was that during the 30-odd seconds callers were in the queue, calls were not presented to RGS agents.
I checked all the obvious stuff: users signed in to RGS, presence set correctly, not already on a call, etc. I also tried removing and re-adding the users to the RGS group.
Still no good.
TL;DR: Check for Event ID 32269 or 32270 on the Skype4B frontend servers. If you see a bunch of them, shut down the pool entirely and restart the SQL instances. Then reboot the Skype4B frontends and restart the pool.
Turns out Greig had this issue in his lab too: https://greiginsydney.com/sfb-2015-server-update-cu4-november-2016/
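If you'd rather not eyeball Event Viewer for the event ID check in the TL;DR, here's a rough Python sketch that counts those two IDs in an exported event log. It assumes you've exported the frontend's events to CSV with an `Id` column. The column name, the function names and the threshold of 10 are all my own assumptions for illustration, not anything Skype4B defines.

```python
import csv
import io
from collections import Counter

# Event IDs named in the TL;DR.
SUSPECT_EVENT_IDS = {32269, 32270}


def count_suspect_events(csv_text, id_column="Id"):
    """Count rows in an exported event log whose ID is a suspect ID.

    csv_text is the CSV export as a string; id_column is the assumed
    name of the column holding the event ID.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            event_id = int(row[id_column])
        except (KeyError, ValueError, TypeError):
            # Skip rows with a missing or non-numeric ID.
            continue
        if event_id in SUSPECT_EVENT_IDS:
            counts[event_id] += 1
    return counts


def looks_affected(counts, threshold=10):
    """Return True if 'a bunch' of suspect events were seen.

    The threshold is arbitrary; pick one that suits your environment.
    """
    return sum(counts.values()) >= threshold
```

Run it against an export from each frontend in the pool; a handful of hits on one server is probably less interesting than a steady stream on all of them.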
Looking into the issue
Further investigation showed that the RGS presence watcher wasn’t correctly updating presence for these users, causing RGS routing to fail by incorrectly asserting that a user was busy or not busy. I found that the Skype front end servers were unable to write entries into the QoE database or replicate changes for users, including presence, to the backend database.
These issues are typically caused by extreme load on the system or a failure of the SQL backend. Reporting via the Statsman package and other built-in Windows tools showed the servers were well within normal operating range and not under any undue load. Network connectivity tests between the frontends and the SQL backend also passed.
Upon logging into the SQL server, I noted that it reported the server had been shut down unexpectedly. I checked the event logs and saw that the server had rebooted due to a BugCheck (blue screen). Later that night, another host in the cluster rebooted due to a BugCheck as well.
Further checking into the reliability of the SQL backend showed multiple connection issues between Skype for Business and the SQL clusters.
Examples of connection issues
I then found reports that Skype had issues running a stored procedure on the SQL backend early in the morning after the blue screen.
The SQL error message reported that the server was rebooting, but I could find no log of this on the SQL server.
I could see, however, that later in the day a SQL administrator logged into the server and confirmed an unexpected shutdown on both SQL cluster nodes.
Unexpected Shutdown confirmed
Soon after the unexpected shutdown was confirmed, load increased on the Skype servers due to the start of business. At 8:54 AM, Publication Sync issues started being logged on the frontends. This in turn started the issues with the queues, as actual presence and published presence slowly drifted apart.
To resolve this issue, I needed to stop the services on the entire pool at the same time and restart the SQL nodes. Restarting servers one by one didn’t help, as the RGS matchmaking service would move (in its stuffed state) from server to server. When the pool was restarted the issue was resolved. We escalated this to the SQL team and asked them to investigate.
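The order matters here, so as a rough sketch this is how I'd write the sequence down as a checklist generator in Python. The server names in the usage example are made up and the function name is my own; it doesn't run anything against Skype4B or SQL, it just lists the steps in the order that worked for me: everything stopped first, then SQL, then the frontends.

```python
def pool_restart_plan(front_ends, sql_nodes):
    """Return the ordered (action, server) steps for a full pool restart.

    All frontends are stopped before any SQL node is restarted, so the
    RGS matchmaking service has nowhere to move to in its broken state.
    """
    steps = []
    # 1. Stop services on every frontend in the pool.
    steps += [("stop services", fe) for fe in front_ends]
    # 2. Restart the SQL nodes while the pool is fully down.
    steps += [("restart SQL", node) for node in sql_nodes]
    # 3. Reboot the frontends.
    steps += [("reboot", fe) for fe in front_ends]
    # 4. Start the pool again.
    steps += [("start services", fe) for fe in front_ends]
    return steps


if __name__ == "__main__":
    # Hypothetical server names, for illustration only.
    plan = pool_restart_plan(["fe01", "fe02", "fe03"], ["sql01", "sql02"])
    for number, (action, server) in enumerate(plan, start=1):
        print(f"{number}. {action}: {server}")
```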
Install at least SP2 for SQL 2014. There were some performance issues in RTM that were resolved around CU7 or CU8, so SP1 should be your minimum anyway. That should help speed things up.
Tip from here: http://www.uccramblings.com/local-sql-services-startup-issue/