www.servicenow.com The Challenge The GitHub team had difficulty resolving an issue in their system: sporadic latency of 20-40 milliseconds with no visibility into where or why. Their engineers investigated it through their normal, manual methods, but they couldn’t identify the problem. Ariel Valentin, Staff Software Engineer, lamented, “Developers have limited visibility into what the cause of the latency stemmed from. We tried to connect things in a log stream from one system to the load balancer in another. None of those keys matched up because folks were using different logging formats. In all those cases, there was no normalization. It was really hard.” The Solution Using Cloud Observability, their team found that the latency wasn't a part of their microservice but, rather, part of the GitHub Load Balancer (GLB). The GLB manages every interaction in the system and directs all traffic for both internal and external users, which meant this issue was directly affecting millions of developers. The team was also able to pinpoint the exact, end-to-end request that caused the problem (first leaving the monolith then trying to make a call to the auth system) and quantify the impact.
GitHub Page 1 Page 3