|
3 | 3 | ## SRE Questions
|
4 | 4 |
|
5 | 5 | <details>
|
6 |
| -<summary>What is SLO (service-level objective)?</summary><br><b> |
7 |
| -</b></details> |
| 6 | +<summary>What is an SLI (Service-Level Indicator)?</summary> |
| 7 | +<b> |
| 8 | +An SLI is a measurement used to assess the actual performance or reliability of a service. It serves as the basis for defining SLOs. |
| 9 | + |
| 10 | +Examples: |
| 11 | +- Request latency |
| 12 | +- Processing throughput |
| 13 | +- Request failures per unit of time |
| 14 | + |
| 15 | +Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/) |
| 16 | +</b> |
| 17 | +</details><br> |
| 18 | + |
| 19 | +<details> |
| 20 | +<summary>What is an SLO (Service-Level Objective)?</summary> |
| 21 | +<b> |
| 22 | + |
| 23 | +An SLO is a target value or range of values for a service level that is measured by an SLI. |
| 24 | + |
| 25 | +Example: a 99% target over a 30-day window for a specific set of SLIs. |
| 26 | + |
| 27 | +It's also worth noting that an SLO is a target to meet, not to exceed: there is no requirement to be more reliable than necessary, because doing so can delay the rollout of new features. |
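
As a rough illustration of how an SLO is checked against an SLI, the sketch below computes an availability SLI from request counts and compares it with a 99% target. The function names and the sample counts are hypothetical, and treating a window with no traffic as compliant is an assumption rather than a rule from the SRE book.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests in the window that succeeded (the SLI)."""
    if total_requests == 0:
        # Assumption: a window with no traffic is treated as fully available.
        return 1.0
    return good_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return sli >= slo_target


# Hypothetical 30-day window: 995,000 successful requests out of 1,000,000.
sli = availability_sli(good_requests=995_000, total_requests=1_000_000)
print(meets_slo(sli, slo_target=0.99))  # True
```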
| 28 | + |
| 29 | +Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/) |
| 30 | +</b> |
| 31 | +</details><br> |
| 32 | + |
| 33 | +<details> |
| 34 | +<summary>What is an SLA (Service-Level Agreement)?</summary> |
| 35 | +<b> |
| 36 | + |
| 37 | +An SLA is a formal agreement between a service provider and customers, specifying the expected service quality and consequences for not meeting it. |
| 38 | + |
| 39 | +SRE doesn't typically get involved in constructing SLAs, because SLAs are closely tied to business and product decisions. |
| 40 | + |
| 41 | +Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/) |
| 42 | +</b> |
| 43 | +</details><br> |
| 44 | + |
| 45 | +<details> |
| 46 | +<summary>What is an Error Budget?</summary> |
| 47 | +<b> |
| 48 | + |
| 49 | +An Error Budget represents the acceptable amount of downtime or errors a service can experience while still meeting its SLO. |
| 50 | + |
| 51 | +An error budget is 1 minus the SLO of the service. A 99.9% SLO service has a 0.1% error budget. |
| 52 | + |
| 53 | +If our service receives 1,000,000 requests in four weeks, a 99.9% availability SLO gives us a budget of 1,000 errors over that period. |
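
The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names are hypothetical, and the figure of 250 failed requests is made up for the usage example.

```python
def error_budget(slo: float, total_requests: int) -> int:
    """Number of failed requests an availability SLO allows in a window."""
    # The budget fraction is 1 - SLO; round() absorbs floating-point drift.
    return round((1 - slo) * total_requests)


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> int:
    """Budget left after the failures seen so far (negative means overspent)."""
    return error_budget(slo, total_requests) - failed_requests


# 1,000,000 requests in four weeks with a 99.9% SLO.
print(error_budget(0.999, 1_000_000))           # 1000
# Hypothetical: 250 requests have failed so far in the window.
print(budget_remaining(0.999, 1_000_000, 250))  # 750
```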
| 54 | + |
| 55 | +The error budget is a mechanism for balancing innovation and stability. If the SRE team cannot enforce the error budget, that balancing mechanism breaks down. |
| 56 | + |
| 57 | +Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/) |
| 58 | +</b> |
| 59 | +</details><br> |
8 | 60 |
|
9 | 61 | <details>
|
10 |
| -<summary>What is SLA (service-level agreement)?</summary><br><b> |
11 |
| -</b></details> |
| 62 | +<summary>What is Toil?</summary> |
| 63 | +<b> |
| 64 | + |
| 65 | +Toil is the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows. |
| 66 | + |
| 67 | +If you can automate a task, you should probably automate it. |
| 68 | + |
| 69 | +Automation significantly reduces Toil. Investing in automation produces work with lasting value that scales with minimal adjustments as your system grows. |
| 70 | + |
| 71 | +Read more: [Google SRE Handbook](https://sre.google/sre-book/table-of-contents/) |
| 72 | +</b> |
| 73 | +</details> |