VAIF: Variance-driven Automated Instrumentation Framework

for diagnosing performance problems in distributed applications

Developers use logs to diagnose performance problems in distributed applications. However, it is difficult to know a priori where logs are needed and what information they must contain to help diagnose problems that may occur in the future. We present the Variance-driven Automated Instrumentation Framework (VAIF), which runs alongside distributed applications. In response to newly observed performance problems, VAIF automatically searches the space of possible instrumentation choices to enable the logs needed to help diagnose them. To do so, VAIF combines distributed tracing (an enhanced form of logging) with insights about how response-time variance can be decomposed on the critical-path portions of requests’ traces. We evaluate VAIF by using it to localize performance problems in OpenStack and HDFS. We show that VAIF can localize problems related to slow code paths, resource contention, and problematic third-party code while enabling only 3-34% of the total tracing instrumentation.
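
The idea of decomposing response-time variance along critical paths can be illustrated with a minimal sketch. This is not VAIF's implementation or API: the trace representation (a list of tracepoint-to-tracepoint edges with latencies), the function names, and the tracepoint names are all assumptions made for illustration. The sketch computes the latency variance of each critical-path edge across a set of requests and picks the edge with the highest variance as the region where enabling more instrumentation might be most informative.

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical input format: each trace is the critical path of one request,
# given as a list of (from_tracepoint, to_tracepoint, latency_ms) edges.


def edge_variances(traces):
    """Return the latency variance of each critical-path edge across traces."""
    samples = defaultdict(list)
    for trace in traces:
        for src, dst, latency in trace:
            samples[(src, dst)].append(latency)
    # Variance needs at least two observations of the same edge.
    return {
        edge: pvariance(values)
        for edge, values in samples.items()
        if len(values) > 1
    }


def highest_variance_edge(traces):
    """Pick the edge whose latency varies most; a candidate for more tracing."""
    variances = edge_variances(traces)
    return max(variances, key=variances.get)


# Illustrative data only: three requests with the same two-edge critical path.
traces = [
    [("api_start", "db_query", 2.0), ("db_query", "api_end", 10.0)],
    [("api_start", "db_query", 2.5), ("db_query", "api_end", 45.0)],
    [("api_start", "db_query", 1.5), ("db_query", "api_end", 12.0)],
]

target = highest_variance_edge(traces)
print(target)
```

In this made-up data set, the edge between `db_query` and `api_end` dominates the variance, so a search over instrumentation choices would look there first. A real system would also need to group requests that are expected to behave alike and to extract critical paths from full traces, both of which this sketch leaves out.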

Publications

Automating instrumentation choices for performance problems in distributed applications with VAIF. Mert Toslali, Emre Ates, Alex Ellis, Zhaoqi Zhang, Darby Huye, Lan Liu, Samantha Puterman, Ayse K. Coskun, and Raja R. Sambasivan. SoCC'21. Seattle, WA, November 2021. [paper, slides]

An automated, cross-layer instrumentation framework for diagnosing performance problems in distributed applications. Emre Ates, Lily Sturmann, Mert Toslali, Orran Krieger, Richard Megginson, Ayse K. Coskun, and Raja R. Sambasivan. SoCC'19. Santa Clara, CA, November 2019. [paper, slides]

Independent talks

Pythia: An automated, cross-layer instrumentation framework for diagnosing performance problems in distributed applications. Emre Ates, Presenter. San Diego, CA, November 2019.

Logging what matters: The Pythia just-in-time instrumentation framework. Lily Sturmann, Presenter. Observability Practitioners Summit. Seattle, WA, December 2018.

Logging what matters: Presenting Pythia and just-in-time instrumentation. Lily Sturmann and Emre Ates, Presenters. Boston, MA, November 2018.

Logging what matters: Just-in-time instrumentation and tracing. Lily Sturmann and Emre Ates, Presenters. Boston, MA, August 2018.