I’m writing this as a preventative measure to hopefully ensure that other technicians don’t have to go through what I have gone through over the last several weeks. Long story short:
If you are running AST and everything on the server looks fine, your Gateway Manager is Authenticated, yet your clients wind up with “There has been a network error”, “Please use the Gateway Manager to check on the status of the Diagnostic Gateway”, and lastly, “Gateway Manager Needs Your Attention”, ask yourself this question: “Am I running this across VLANs?”
Apparently, if the answer is “Yes, I’m running this across VLANs – don’t be silly! Why would any large organization with a reputable data center hosting its servers in a mixed environment operate any differently?”, Apple’s answer is “Tough.”
That’s right. After poring over logs, burning AST to the ground and rebuilding it a dozen times, and in general pulling my hair out, I finally decided to get Apple on chat and do a tandem dive. Turns out the server was fine; AST simply is not supported across VLANs. Even though Netbooting is.
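If you want a quick sanity check before you start tearing the server apart, here is a minimal sketch in Python. It rests on an assumption that is mine, not Apple's: that your VLANs each map to their own IP subnet, which is common but not guaranteed, so treat the result as a hint and confirm with your network team. The function name, the addresses, and the /24 prefix are all made up for illustration.

```python
import ipaddress

def same_subnet(server_ip, client_ip, prefix_len):
    """Return True if client_ip falls inside the server's subnet.

    Assumption: one VLAN per IP subnet. If your network does not
    work that way, this check tells you nothing about VLANs.
    """
    # strict=False lets us pass a host address instead of a network address
    server_net = ipaddress.ip_network(f"{server_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(client_ip) in server_net

# Hypothetical addresses: an AST server and two clients
print(same_subnet("10.1.20.5", "10.1.20.77", 24))  # client beside the server
print(same_subnet("10.1.20.5", "10.1.30.77", 24))  # client on another subnet
```

If the second kind of result describes your clients, the VLAN question above is the first thing to chase.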
You know, I get why Apple changed the game up a bit. I do. This is some solid protection for the company against people putting in claims on AppleCare-covered hardware but sending in parts from another, non-covered system. Good on them. However, when this becomes not only the primary troubleshooting option for technicians but the sole, mandatory option, it had damned well better work.
Everything about AST is sketchy. One bit of troubleshooting I got impossibly hung up on is that even though I had installed version 1.3.2 and everything indicated 1.3.2, one line in the logs kept saying 1.2.3. A logical tech like myself would think that some manner of preference, config file, or keychain was still residing on the server from a previous install, and that could easily be the culprit. Especially when there is no mention of VLANs in any AST documentation and the Netboot service is fine… it must be the server config at that point, right? RIGHT?
Turns out, that line “current version is 1.2.3” is actually pulled from a text file on Apple’s servers that Apple never bothered to update. Check it out here for yourself if you don’t believe me (SUPPLEMENTAL EDIT: as of 02/27/12, this text file was updated, but only to say “Version 1.3” and nothing else). Anyone running AST 1.3.1 or 1.3.2 is getting this little nugget in their logs. Fun stuff. Especially during troubleshooting.
It also didn’t help that many of the errors that I referenced here (with purposeful detail) don’t bring up any useful hits on Google. I found ONE Apple Discussion where a bewildered individual was asking the same questions I was, but nobody ever answered.
So, it is my hope that this blog post hits the interwebs, and can be seen by other people trying to troubleshoot their servers.
Personally, I was a lot happier when Apple just had hardware test DMGs to download. I’m happier still with Dell’s DOSD system: “Here’s the system serial number, I found this error code by hitting F12 on the computer, now gimme my part”.