Latency and Throughput
- if you've ever experienced lag in a video game,
- it was most likely due to a combination of high latency and low throughput
- Latency and Throughput are the 2 most important measures of the performance of a system
Terms used
Latency (lower is better)
- it is basically how long it takes for data to traverse a system
- more specifically, how long it takes for data to get from one point in a system to another point in the system
- in other words, it is the time it takes for a certain operation to complete in a system
- when talking about latency, we might be referring to a lot of different things in a system
- we might be talking about the latency of a network request
- i.e. the time it takes for one request to go from a client to a server, and then back from the server to the client
- however, if a machine, like for instance a server, is reading a piece of data from memory, or reading that piece of data from disk
- the time it takes to read that data is also referred to as latency
- basically different things in systems have different latencies
- therefore, there is a trade-off between the different ways that a system can be built
- as certain designs will have higher latencies, and others will have lower latencies
- most often this measure is a time duration, like milliseconds or seconds
- when designing a system, you typically want to optimize it by lowering its overall latencies
- some systems need low latency
- such as video games: when you experience lag, it may be because the server you are playing on is located halfway across the world from you
- and it takes a while for your computer (the client) to make a network request to the video game's server
- some systems do not need to care as much about latency
- such as some websites, where it's not important if a page takes a couple of seconds to load
- what they care about more is accuracy or uptime
- they want their website to always show accurate information and never be down
- and are OK with giving up some latency for that
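
A minimal sketch of measuring latency, assuming a Python environment; `read_one_megabyte` and the file path are hypothetical stand-ins for whatever operation (disk read, network request) you want to time:

```python
import time

def measure_latency(operation, *args):
    """Time a single call to `operation` and return its latency in milliseconds."""
    start = time.perf_counter()
    operation(*args)
    end = time.perf_counter()
    return (end - start) * 1000  # seconds -> milliseconds

# hypothetical operation standing in for a disk read; "some_file.bin" is a placeholder path
def read_one_megabyte(path):
    with open(path, "rb") as f:
        f.read(1_000_000)

latency_ms = measure_latency(read_one_megabyte, "some_file.bin")
print(f"read latency: {latency_ms:.3f} ms")
```
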
Latency orders of magnitude for different types of data transfers or operations in a system
- reading 1 MB from RAM: 250 µs (0.25 ms)
- reading 1 MB from SSD: 1,000 µs (1 ms)
- transferring 1 MB over a 1 Gbps network: 10,000 µs (10 ms)
- does not take distance into account; this assumes the computers are right next to each other
- reading 1 MB from HDD: 20,000 µs (20 ms)
- sending a packet (roughly 1,000 to 1,500 bytes) over a network to a different country and back on a round trip: 150,000 µs (150 ms)
- why does it take that long?
- because the signal physically has to travel, and that takes time when it has to go halfway across the world
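
A quick back-of-the-envelope check (a sketch, not a benchmark) of where the ~10,000 µs figure for 1 MB over a 1 Gbps link comes from; real links add protocol overhead on top of this ideal number:

```python
# ideal transfer time for 1 MB over a 1 Gbps link, ignoring protocol overhead
megabyte_in_bits = 1_000_000 * 8        # 8,000,000 bits
link_speed_bps   = 1_000_000_000        # 1 Gbps = 10^9 bits per second

seconds = megabyte_in_bits / link_speed_bps   # 0.008 s
print(seconds * 1_000_000, "µs")              # 8,000 µs of pure transmission time,
                                              # same order of magnitude as the 10,000 µs figure above
```
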
Throughput (higher is better)
- how much work a machine can perform in a given period of time
- in this context, it usually refers to how much data can be transferred from one point in a system to another point in the system in a given amount of time
- typically measured in gigabits per second, megabits per second, or kilobits per second
- in summary, it is the number of operations that a system can handle properly per time unit
- e.g.: if there are multiple clients trying to make requests to a single server
- the throughput is how many requests the server can handle in a given amount of time
- or how many bits the server can handle or let through per second
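
A minimal sketch of measuring throughput as operations completed per second, assuming Python; `handle_request` is a hypothetical stand-in for real request-handling work:

```python
import time

def measure_throughput(operation, duration_seconds=1.0):
    """Count how many times `operation` completes in a fixed time window."""
    completed = 0
    deadline = time.perf_counter() + duration_seconds
    while time.perf_counter() < deadline:
        operation()
        completed += 1
    return completed / duration_seconds  # operations per second

# hypothetical operation standing in for handling one client request
def handle_request():
    sum(range(10_000))

print(f"throughput: {measure_throughput(handle_request):.0f} requests/second")
```
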
How to increase throughput, or how to optimize a system for throughput?
- the naive answer
- just pay for more throughput, since this is normally something controlled by the cloud provider
- however, increasing throughput does not necessarily fix a potential problem that you might have in a system
- e.g.: a server handling multiple requests from multiple clients
- that is expected to serve thousands or even millions of requests per second (such as Google Search)
- blindly increasing throughput on this network won't make sense
- because you will still eventually hit some sort of bottleneck
- better solution
- is to have multiple servers to handle all of the requests
- so, instead of multiple requests going through the same pipeline and hitting the same bottleneck, they can go to different servers instead (see the sketch below)
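
A minimal round-robin sketch of spreading requests across several servers; the server names are hypothetical, and a real system would put a dedicated load balancer (hardware or software) in front of the pool:

```python
from itertools import cycle

# hypothetical pool of servers; in a real system these would be separate machines
servers = ["server-a", "server-b", "server-c"]
next_server = cycle(servers)

def route_request(request_id):
    """Send each incoming request to the next server in the rotation."""
    target = next(next_server)
    print(f"request {request_id} -> {target}")
    return target

for request_id in range(6):
    route_request(request_id)
```
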
- e.g.: you might have one part of a system
- that has very low latency, which supports really fast data transfers
- and another part of the system that has very low throughput
- and that low-throughput part ends up canceling out the benefit of the low-latency transfers or operations
- in summary, you cannot make assumptions about latency or throughput based on the other (see the worked numbers below)
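
A small worked example with hypothetical numbers, just to illustrate that the two measures move independently:

```python
# hypothetical numbers: latency and throughput can move independently

# server A: very low latency, but handles requests strictly one at a time
latency_a_ms = 1
throughput_a = 1000 / latency_a_ms            # 1,000 requests/second

# server B: 10x higher latency, but handles 100 requests in parallel
latency_b_ms = 10
throughput_b = (1000 / latency_b_ms) * 100    # 10,000 requests/second

print(throughput_a, throughput_b)  # B has 10x the throughput despite 10x the latency
```
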