Commit 4ed03e7

Add definitions for SLI, SLO, SLA, error budget and toil (bregman-arie#9077)
* add definitions for SLI, SLO, SLA, error budget and toil
* add credit
* Add credits section
* add google sre book under questions
1 parent c746d0f commit 4ed03e7

1 file changed: +66 -4 lines changed

topics/sre/README.md

Lines changed: 66 additions & 4 deletions
@@ -3,9 +3,71 @@
33
## SRE Questions
44

55
<details>
6-
<summary>What is SLO (service-level objective)?</summary><br><b>
7-
</b></details>
6+
<summary>What is an SLI (Service-Level Indicator)?</summary>
7+
<b>
8+
An SLI is a measurement used to assess the actual performance or reliability of a service. It serves as the basis for defining SLOs.
9+
10+
Examples:
11+
- Request latency
12+
- Processing throughput
13+
- Request failures per unit of time
14+
15+
Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/)
16+
</b>
17+
</details><br>
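
To make the "request failures" style of indicator concrete, an availability SLI is commonly expressed as the share of requests that succeeded. The following is a minimal Python sketch, not part of the README; the function name `availability_sli` and the request counts are hypothetical.

```python
def availability_sli(successful_requests: int, total_requests: int) -> float:
    """Return the fraction of requests that succeeded (0.0 to 1.0)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


# Hypothetical counts for one measurement window.
sli = availability_sli(successful_requests=9_985, total_requests=10_000)
print(f"availability SLI: {sli:.2%}")
```

The same shape (good events divided by all events) can be reused for other indicators, for example the share of requests served under a latency threshold.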
18+
19+
<details>
20+
<summary>What is an SLO (Service-Level Objective)?</summary>
21+
<b>
22+
23+
An SLO is a target value or range of values for a service level, as measured by an SLI.
24+
25+
Example: a 99% target over a 30-day window for a specific collection of SLIs.
26+
27+
It's also worth noting that the SLO is a target, not a minimum to exceed by as much as possible: there is no requirement to be more reliable than necessary, because doing so can delay the rollout of new features.
28+
29+
Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/)
30+
</b>
31+
</details><br>
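
One way to check an SLO such as "99% across 30 days" is to aggregate the SLI over the whole window and compare it with the target. The sketch below assumes the SLI is a success ratio and that per-day `(good, total)` request counts are available; `window_sli`, `meets_slo`, and the sample data are hypothetical names and values chosen for illustration.

```python
from typing import Iterable, Tuple


def window_sli(daily_counts: Iterable[Tuple[int, int]]) -> float:
    """Aggregate (good, total) request counts into one SLI for the window."""
    good = 0
    total = 0
    for day_good, day_total in daily_counts:
        good += day_good
        total += day_total
    if total == 0:
        raise ValueError("no requests recorded in the window")
    return good / total


def meets_slo(daily_counts: Iterable[Tuple[int, int]], slo_target: float = 0.99) -> bool:
    """Return True when the windowed SLI is at or above the SLO target."""
    return window_sli(daily_counts) >= slo_target


# Hypothetical data: 29 healthy days and one day with 20% failures.
counts = [(1_000, 1_000)] * 29 + [(800, 1_000)]
print(meets_slo(counts))
```

Summing counts across the window, rather than averaging daily ratios, weights each day by its traffic, so a quiet day with a few failures does not dominate the result.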
32+
33+
<details>
34+
<summary>What is an SLA (Service-Level Agreement)?</summary>
35+
<b>
36+
37+
An SLA is a formal agreement between a service provider and its customers, specifying the expected service quality and the consequences for not meeting it.
38+
39+
SRE doesn't typically get involved in constructing SLAs, because SLAs are closely tied to business and product decisions.
40+
41+
Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/)
42+
</b>
43+
</details><br>
44+
45+
<details>
46+
<summary>What is an Error Budget?</summary>
47+
<b>
48+
49+
An Error Budget represents the acceptable amount of downtime or errors a service can experience while still meeting its SLO.
50+
51+
An error budget is 1 minus the SLO of the service. A service with a 99.9% SLO has a 0.1% error budget.
52+
53+
If our service receives 1,000,000 requests in four weeks, a 99.9% availability SLO gives us a budget of 1,000 errors over that period.
54+
55+
The error budget is a mechanism for balancing innovation and stability. If the SRE team cannot enforce the error budget, the whole system breaks down.
56+
57+
Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/)
58+
</b>
59+
</details><br>
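
The arithmetic in the error budget answer (1 minus the SLO, applied to the request volume) can be sketched in Python as follows. The function names `allowed_errors` and `budget_remaining` are hypothetical, and rounding to a whole number of requests is an assumption made here to sidestep floating-point noise.

```python
def allowed_errors(slo_target: float, total_requests: int) -> int:
    """Number of failed requests the error budget permits over the period."""
    if not 0.0 <= slo_target <= 1.0:
        raise ValueError("slo_target must be between 0 and 1")
    # Error budget = 1 - SLO; round to absorb floating-point imprecision.
    return round((1.0 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> int:
    """Errors still available before the SLO is missed (negative if overspent)."""
    return allowed_errors(slo_target, total_requests) - failed_requests


# The figures from the text: 1,000,000 requests under a 99.9% availability SLO.
print(allowed_errors(0.999, 1_000_000))
# Hypothetical: 250 failures observed so far in the period.
print(budget_remaining(0.999, 1_000_000, 250))
```

A negative result from `budget_remaining` would indicate that the budget is exhausted, which is the point at which a team might pause feature rollouts in favor of reliability work.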
860

961
<details>
10-
<summary>What is SLA (service-level agreement)?</summary><br><b>
11-
</b></details>
62+
<summary>What is Toil?</summary>
63+
<b>
64+
65+
Toil is the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.
66+
67+
If you can automate a task, you probably should.
68+
69+
Automation significantly reduces toil. Investing in automation produces work with lasting value that keeps scaling with only minor adjustments as your system grows.
70+
71+
Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/)
72+
</b>
73+
</details>
