Zynga (NASDAQ: ZNGA) is a leading developer of the world's most popular social games, played by millions of people around the world each day.
RJ Lim, Senior Software Engineer at Zynga, found his team had to endlessly sift through log files in hopes of finding information about production errors.
Even a stable application can write millions of lines to its log files. When an exception is thrown, sifting through those logs in the hope of finding the root cause is very time-consuming.
Furthermore, some problems are difficult to debug even with logs. Sometimes they are not debuggable at all, because a log line can hold only so much information and gives a narrow view of what is going on inside the application.
What the team needed was something that could help debug Java application problems in production. Then they discovered OverOps. Now, with OverOps, they have all the information they need, including the complete source code and variable state across the entire call stack.
One particular incident where OverOps proved its value was a configuration update to an ExecutorService that was not picked up. "Since there’s no logging for uncaught exceptions, there was no way to know that the ExecutorService was failing, and no way we would have been able to detect that through logs. However, OverOps catches all uncaught exceptions, so it showed us where the issue happened and gave us the complete variables across the call stack. Instead of guesswork, we diagnosed the problem quickly and fixed it right away."
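To see why a failure like this can leave no trace in the logs, consider a generic Java sketch. It is not Zynga's code, and the class name, task, and error message are hypothetical. It relies only on standard `java.util.concurrent` behavior: when a task passed to `ExecutorService.submit` throws, the exception is stored in the returned `Future` and surfaces only if someone calls `get()`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SilentExecutorFailure {

    // Hypothetical task standing in for work that breaks after a configuration update.
    static void applyConfig() {
        throw new IllegalStateException("configuration not picked up");
    }

    // Submits the failing task and returns the exception it threw, or null if it succeeded.
    static Throwable runDemo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable task = SilentExecutorFailure::applyConfig;
            Future<?> future = pool.submit(task);
            // If the code stopped here, nothing would be printed or logged:
            // the exception is held inside the Future.
            try {
                future.get(); // only this call rethrows the failure
                return null;
            } catch (ExecutionException e) {
                return e.getCause(); // the original exception thrown by the task
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return e;
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Captured: " + runDemo());
    }
}
```

In code that submits tasks and never inspects the returned `Future`, the failure stays invisible, which matches the situation described in the quote above.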
"Thanks to OverOps, our errors and exceptions gained meaning and we were able to find and fix them rapidly"
Since using OverOps, Zynga has reduced the time it takes to identify and resolve production errors from 3 days to 15 minutes.
Zynga has now integrated OverOps into its daily workflow and receives real-time notifications via HipChat, so the team can fix issues as soon as they occur.