
GitHub

Case Study | GitHub increases productivity and system visibility with Cloud Observability

Headquarters: San Francisco, CA
Industry Segment: Computer Software
Employees: 2,000+

Founded by Tom Preston-Werner, Chris Wanstrath, P.J. Hyett, and Scott Chacon, GitHub is a cloud-based platform and service for software developers that provides storage, distributed version control, bug tracking, access control, feature requests, continuous integration, task management, and wikis for projects.

Since it was launched in 2008, the site has amassed more than 100 million users and has become the de facto resource for open-source collaboration and community. In addition to a free, open-source tier, the company also offers paid plans with private repositories.

Challenges
- Experiencing latency with unknown causes
- Incomplete system visibility
- Mixed naming conventions leading to team confusion

Business Results
- Reduced Mean Time To Resolution (MTTR)
- Complete visibility into the GitHub Load Balancer (GLB)
- Adopted OpenTelemetry and semantic conventions

www.servicenow.com

The Challenge

The GitHub team had difficulty resolving an issue in their system: sporadic latency of 20-40 milliseconds with no visibility into where or why. Their engineers investigated it through their normal, manual methods, but they couldn’t identify the problem. Ariel Valentin, Staff Software Engineer, lamented, “Developers have limited visibility into what the cause of the latency stemmed from. We tried to connect things in a log stream from one system to the load balancer in another. None of those keys matched up because folks were using different logging formats. In all those cases, there was no normalization. It was really hard.”

The Solution

Using Cloud Observability, their team found that the latency wasn’t a part of their microservice but, rather, part of the GitHub Load Balancer (GLB). The GLB manages every interaction in the system and directs all traffic for both internal and external users, which meant this issue was directly affecting millions of developers. The team was also able to pinpoint the exact, end-to-end request that caused the problem (first leaving the monolith, then trying to make a call to the auth system) and quantify the impact.
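The key mismatch Valentin describes, where log streams from different systems cannot be joined because each uses its own field names, can be illustrated with a minimal sketch. It is not GitHub's implementation: the log records, the alias table, and every field name below (`req_id`, `x-request-id`, `elapsed_ms`, and so on) are hypothetical and chosen only to show why a shared naming convention makes it possible to line up records for one request.

```python
from collections import defaultdict

# Hypothetical alias table: maps each system's own key names onto one shared
# schema. None of these names come from GitHub's actual logs.
KEY_ALIASES = {
    "request_id": ("request_id", "req_id", "x-request-id"),
    "duration_ms": ("duration_ms", "elapsed_ms", "latency"),
    "service": ("service", "svc", "app"),
}


def normalize(record):
    """Return a new dict that uses only the shared key names."""
    normalized = {}
    for canonical, aliases in KEY_ALIASES.items():
        for alias in aliases:
            if alias in record:
                normalized[canonical] = record[alias]
                break
    return normalized


def group_by_request(records):
    """Group normalized records by request ID across systems."""
    grouped = defaultdict(list)
    for record in records:
        entry = normalize(record)
        if "request_id" in entry:
            grouped[entry["request_id"]].append(entry)
    return dict(grouped)


def slowest_hop(entries):
    """Return the entry with the largest reported duration."""
    return max(entries, key=lambda e: e.get("duration_ms", 0))


# Two made-up records for the same request, logged in different formats.
app_log = {"req_id": "abc123", "svc": "monolith", "elapsed_ms": 4}
lb_log = {"x-request-id": "abc123", "app": "load-balancer", "latency": 31}

timeline = group_by_request([app_log, lb_log])
```

Without the alias table, the two records share no key and cannot be joined. Once both are normalized, `timeline["abc123"]` holds both hops, and `slowest_hop` can point at the one that contributes the most latency. Adopting a single convention at the source removes the need for an alias table at all.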

“I now get insights that were previously rooted in my limited understanding of a complex system. What I love about Cloud Observability is that it tells you what is actually happening in your system as opposed to what you think is happening. I cannot imagine doing my job without it.”
Ariel Valentin, Observability Engineer

Outcomes

In addition to improving mean time to resolution (MTTR) and visibility into the GitHub Load Balancer, the team also saw an increase in developer productivity from adopting OpenTelemetry semantic conventions. The SDK and best practices as a standard have saved countless hours of instrumentation when a back-end change is needed. “OpenTelemetry empowers us to build integrated and opinionated solutions for our engineers,” said Wolfgang Hennerbichler, Senior Engineering Manager. “Designing for observability can be at the forefront of our application engineers’ minds because we can make it so rewarding.”

© 2023 ServiceNow, Inc. All rights reserved. ServiceNow, the ServiceNow logo, Now, Now Platform, and other ServiceNow marks are trademarks and/or registered trademarks of ServiceNow, Inc. in the United States and/or other countries. Other company names, product names, and logos may be trademarks of the respective companies with which they are associated.