Dynatrace has built-in support for the service reliability methodology. The concept of Site Reliability Engineering (SRE) was first introduced by Google in 2003. It is an agile-like approach centered on reliability engineering, and the key role of any SRE team is to measure system performance and work systematically to improve it. The basis of a site reliability engineer's work is to surround every system with measurable indicators, which may vary depending on the project, application or business line. However, the SRE team should focus not on purely technical indicators (such as processor load), but on so-called Service Level Indicators (SLIs), i.e. indicators at the level of the service (business indicators). The system serves users better not when the processor is less loaded, but when it can handle more queries without compromising quality. Together with representatives of the business side, site reliability engineers set Service Level Objectives (SLOs), the acceptable reliability targets for those indicators. Based on the collected signals, Dynatrace provides:
- quick definition of the indicators needed to assess system reliability (SLIs)
- quick definition of targets for the selected indicators (SLOs)
- analysis of the error budget burn rate for defined SLOs, which proactively indicates whether commitments are likely to be met within the evaluation period
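
The relationship between an SLI, an SLO and the burn rate can be sketched with a few lines of arithmetic. The Python below is a minimal illustration only: it does not call any Dynatrace API, and the names `Slo`, `sli`, `burn_rate` and `hours_until_budget_exhausted` are hypothetical helpers introduced for this example. It assumes a simple success-ratio SLI (good events divided by all events) and the common definition of burn rate as the observed error rate divided by the error rate the SLO allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A target for a success-ratio SLI over one evaluation period (hypothetical type)."""
    target: float        # e.g. 0.99 means 99% of events must be good
    period_hours: float  # length of the evaluation period in hours

    def __post_init__(self) -> None:
        # A target of 1.0 leaves no error budget, so the burn rate would be undefined.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def sli(good_events: int, total_events: int) -> float:
    """Service Level Indicator: fraction of events that met the quality bar."""
    if total_events == 0:
        return 1.0  # assumption: no traffic is treated as no failures
    return good_events / total_events


def burn_rate(observed_sli: float, slo: Slo) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 uses up the error budget exactly at the end of the period;
    values above 1.0 use it up earlier.
    """
    error_budget = 1.0 - slo.target
    return (1.0 - observed_sli) / error_budget


def hours_until_budget_exhausted(rate: float, slo: Slo) -> float:
    """Time until a full error budget is gone if the current burn rate persists."""
    if rate <= 0.0:
        return float("inf")  # nothing is being burned
    return slo.period_hours / rate


if __name__ == "__main__":
    slo = Slo(target=0.99, period_hours=30 * 24)
    current = sli(good_events=9_980, total_events=10_000)
    rate = burn_rate(current, slo)
    print(f"SLI={current:.4f} burn rate={rate:.2f} "
          f"budget lasts {hours_until_budget_exhausted(rate, slo):.0f} h")
```

With these numbers the SLI is 99.8% against a 99% target, so the burn rate is roughly 0.2 and the error budget would outlast the 30-day period. A burn rate above 1 signals that the commitment will be missed before the period ends unless something changes.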