suppose a customer complains that they tried to purchase a product, received a "product purchased" success message, and had their credit card charged, but did not get access to the product
this kind of issue is hard to understand and to reproduce, so it requires logs that provide more information (such as errors, operations, and requests) to allow us to debug it
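a minimal sketch of what logging such a purchase flow could look like, using Python's standard `logging` module. the logger name, the `purchase` function, and the `charge` / `grant_access` callables are all hypothetical stand-ins for the real payment and access services, not part of any actual system

```python
import logging

logger = logging.getLogger("purchase")  # hypothetical logger name


def purchase(user_id, product_id, charge, grant_access):
    """Run a purchase and log every step so a failure can be traced later.

    `charge` and `grant_access` are hypothetical callables standing in for
    the payment service and the service that unlocks the product.
    """
    logger.info("purchase requested user=%s product=%s", user_id, product_id)
    charge(user_id, product_id)
    logger.info("card charged user=%s product=%s", user_id, product_id)
    try:
        grant_access(user_id, product_id)
    except Exception:
        # logger.exception logs at ERROR level and attaches the traceback
        logger.exception("access grant failed user=%s product=%s", user_id, product_id)
        return False
    logger.info("access granted user=%s product=%s", user_id, product_id)
    return True
```

with logs like these, the complaint above would show up as a "card charged" line followed by an "access grant failed" line for the same user and product, which narrows down where to look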
the process of having visibility into a system's key metrics
it is typically implemented by collecting important events in a system and aggregating them in human-readable charts
provides you with insights that are useful when you are designing, building, and maintaining a system
in summary, when it comes to system design, monitoring means making sure you have systems in place to track important metrics about your overall system
one way of gathering metrics is to build some sort of service, or use a pre-built tool, that scrapes your logs (which requires good logs to be in place) and creates metrics out of them
however, you are limited to what your logs contain, so the logs must include all the information you require
another limitation is that if you decide to change your logs, you risk breaking the metrics or monitoring system
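the log-scraping approach above can be sketched roughly as follows. this is only an illustration: the log line shape (`<ISO timestamp> <LEVEL> <message>`), the regex, and the function name `count_levels_per_minute` are assumptions made for the example, not a real tool's format or API

```python
import re
from collections import Counter

# Assumed log line shape: "<ISO timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def count_levels_per_minute(lines):
    """Return a Counter keyed by (minute, level) built from raw log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            # Lines that do not fit the assumed format are skipped, which is
            # exactly how a change to the log format can break the metric.
            continue
        minute = match["ts"][:16]  # e.g. "2024-01-01T12:00" from an ISO timestamp
        counts[(minute, match["level"])] += 1
    return counts
```

the resulting counts (for example, errors per minute) are the kind of data that would then be drawn as a chart. note that the metric can only be as good as the fields the logs already carry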
another popular way of gathering metrics is to use a time-series database (a database that is specialized for time-stamped data, i.e. data that is measured over time)
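to make the idea concrete, here is a toy in-memory sketch of what a time-series store does: it keeps (timestamp, value) points per metric and answers queries over a time range. the class `TinyTimeSeries` and its methods are hypothetical and written only for illustration. a real time-series database also handles persistence, compression, retention, and much larger volumes

```python
import bisect
from collections import defaultdict


class TinyTimeSeries:
    """Toy in-memory store: per-metric points kept sorted by timestamp."""

    def __init__(self):
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def record(self, metric, timestamp, value):
        # Insert in timestamp order so range queries can use binary search
        idx = bisect.bisect_right(self._times[metric], timestamp)
        self._times[metric].insert(idx, timestamp)
        self._values[metric].insert(idx, value)

    def query(self, metric, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        times = self._times[metric]
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_left(times, end)
        return list(zip(times[lo:hi], self._values[metric][lo:hi]))

    def average(self, metric, start, end):
        """Average value over the time range, or None if it has no points."""
        points = self.query(metric, start, end)
        if not points:
            return None
        return sum(value for _, value in points) / len(points)
```

a usage example, with a made-up metric name and timestamps given as plain numbers:

```python
db = TinyTimeSeries()
db.record("latency_ms", 10, 100.0)
db.record("latency_ms", 20, 200.0)
print(db.average("latency_ms", 0, 60))  # average latency over the window
```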