Wednesday, September 16, 2020

Towards a better understanding of 'bottlenecks' in PC building

In PC building enthusiast communities, the subject of PC builds being 'bottlenecked' comes up quite frequently. It comes up a lot around the time new hardware, like the recent RTX 3000 series, is introduced, and when (often novice) users are looking for build advice (e.g. 'Will a Ryzen 5 3600 bottleneck my build?').

I recently made a long post on r/buildapc about how the concept of 'bottleneck' is frequently misused by, poorly understood by, and generally unhelpful to PC builders. The original post generated over 1,000 karma and lots of polarizing discussion and criticism. It's still available here and is worth a read if you're interested in the topic, though, for reasons that are not clear to me, it was removed by the moderators.

Since the original post, I've considered some of the criticisms and have been chewing on the issues more. This post represents me working through some of this in written form.

How is the term 'bottleneck' generally understood?

When people use this term, what do they mean by it, both in general and specifically in the context of PC building? Among inexperienced PC builders, there is a lot of confusion and fuzziness, but that is to be expected whenever novices engage with a concept in a new domain. I'll come back to this issue later, but to start I want to focus on the more sophisticated understanding of the term held by more experienced folks, often those with engineering backgrounds.

The sense in which most experienced folks understand the term, nicely encapsulated by this Quora answer, I'm going to call the Informal Engineering Version of the concept. I define it as follows:
bottleneck (Informal Engineering Version): noun. A component of a system, the performance limitations of which limit the overall performance of the system.
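To make the Informal Engineering Version concrete, here is a minimal sketch under a deliberately simplified assumption: a PC is modeled as a serial chain of stages (CPU, GPU, monitor), each with a fixed maximum frame rate, so the slowest stage sets the overall frame rate. The stage names and numbers are hypothetical, and real games are messier, since frame rates vary from moment to moment.

```python
from typing import Dict


def system_fps(stages: Dict[str, float]) -> float:
    """Overall frame rate of a simple serial pipeline: the slowest stage sets the pace."""
    return min(stages.values())


def informal_bottleneck(stages: Dict[str, float]) -> str:
    """Informal Engineering Version: whichever stage has the lowest throughput."""
    return min(stages, key=stages.get)


# Hypothetical per-stage ceilings, in frames per second.
build = {"cpu": 180.0, "gpu": 95.0, "monitor": 144.0}

print(system_fps(build))           # 95.0
print(informal_bottleneck(build))  # gpu
```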

This definition struck me as fine in an informal sense, but left me feeling vaguely uneasy. It took me a lot of thinking to get at precisely why, but I think it amounts to two defects of the definition: a major and a minor one.

What's the alternative?

To get at the major one, it helps to ask what the alternative would be to a system where the performance is limited by the performance of a single component. I think there are two.

The first would be a system where performance was unlimited. But this is, of course, impossible. Every system, just like every thing, has some specific nature, including specific limitations. In the realm of PC building, there is obviously no such thing as a PC of unlimited performance.

The second alternative would be that the system performance is limited by the performance of more than one component. There's a sense in which this could be true: in PC building it would be the theoretical 'perfectly balanced system.' And targeting a balanced build is good advice insofar as it goes. For instance, for a given build budget, and all other things being equal, it makes sense to spend it in a 'balanced' way, rather than under-investing in certain components and over-investing in others.

The 'every system has a bottleneck' school

But in practice, it's not possible to achieve the Platonic ideal of a balanced build. In PC builds, and indeed in most systems, there will almost always be some single factor that imposes an upper limit on system performance. The proponents of the Informal Engineering concept of 'bottleneck' in the PC building community often espouse this view, with their mantra being 'Your system will always have a bottleneck.' For instance, they'll say, if your weak GPU is currently gating your build, as soon as you upgrade it, your previously second-weakest component (let's say your monitor with its low refresh rate) will become the new bottleneck.
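Under the same simplified model (hypothetical stage names and FPS ceilings), the 'always a bottleneck' mantra falls out automatically: upgrading the weakest stage just hands the label to the next-weakest one.

```python
from typing import Dict


def informal_bottleneck(stages: Dict[str, float]) -> str:
    # Informal Engineering Version: the stage with the lowest FPS ceiling.
    return min(stages, key=stages.get)


# Hypothetical FPS ceilings for a build with a weak GPU and a 60 Hz monitor.
before = {"cpu": 200.0, "gpu": 45.0, "monitor": 60.0}
after = dict(before, gpu=240.0)  # copy of the build with only the GPU upgraded

print(informal_bottleneck(before))  # gpu
print(informal_bottleneck(after))   # monitor
```

Because `min` always picks something, this definition can never answer 'no bottleneck,' which is exactly the problem with the view.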

It's worth examining what this actually amounts to. Because all systems, practically speaking, have some single weakest component, all systems are perpetually bottlenecked. But all this means is that all systems have some limitation on their performance, which is to say that all systems have some definite identity. It reduces the concept of 'bottleneck' to nothing more than a statement of the law of identity: that a system can do what it does and can't do what it can't. Am I 'bottlenecked' in my inability to fly because I don't have wings? Or in my inability to still have my cake after I eat it because the universe doesn't allow for that?

This is why I say this conception of bottleneck is useless. Every system is equally 'bottlenecked.' Your brand new Core i9 10900k, 256 GB RAM, RTX 3090 and 360hz display system is bottlenecked because it can only output as many FPS as its weakest component (whichever that is) allows it to.

The same system with, e.g., a GTX 700 series card instead would be less performant: the neck of the bottle at the GPU would be narrower, so to speak. Proponents of the Informal Engineering definition would say that the latter system is more bottlenecked than the former. But I think this view is off. It's like saying that a corpse is 'more dead' than a living person. No it isn't. The living person isn't dead at all.

This wrong conception is common among PC building novices and is reinforced by veteran builders of the 'every system has a bottleneck' variety. Many of the new builder questions on, e.g., PC building Reddit communities ask things like 'Will this graphics card be a bottleneck in my build?' The invariable response from this crowd is, of course, that every system has a bottleneck. Maybe it's the graphics card right now. But if the graphics card were upgraded, the system would be bottlenecked by some other component. What is the novice builder supposed to do with this perspective? Throw up his hands and resign himself to a system that will forever be hopelessly bottlenecked in one way or another, his performance aspirations always frustrated?

No, not every system is bottlenecked

The way out of this dilemma is to recognize that something is a bottleneck only if it, in fact, imposes a significant limitation on the overall performance of the system. This leads to what I'll call the Interim Engineering Version of the concept, which is close to the first definition on the 'bottleneck' Wikipedia page:

bottleneck (Interim Engineering Version): noun. A component of a system, the performance limitations of which impose a significant limit on the overall performance of the system.

On this improved conception, whether something is a bottleneck or not hinges on whether the performance limitation is significant. And what counts as significant is highly dependent on the context of use. If a given component's limitations don't impose a significant limitation on overall system performance in the context in question, then that component is not a bottleneck, even if it happens to be the single component that is limiting overall system performance. Moreover, if the system's overall performance is adequate to its purpose, then the system as a whole is not bottlenecked.
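The Interim Version adds a context-dependent threshold. In this sketch, 'significant' is reduced to a single hypothetical parameter, `target_fps`, standing in for the purpose of the build; real judgments of significance (frame-time consistency, perceptibility, graphics settings) are richer than one number. The pipeline model and the stage figures are the same simplifying assumptions as before.

```python
from typing import Dict, Optional


def interim_bottleneck(stages: Dict[str, float], target_fps: float) -> Optional[str]:
    """Interim Engineering Version (sketch): the limiting stage only counts as a
    bottleneck if overall throughput falls short of what the task needs."""
    limiting = min(stages, key=stages.get)
    if stages[limiting] >= target_fps:
        return None  # adequate to its purpose: not bottlenecked
    return limiting


# Hypothetical FPS ceilings; the task is assumed to need a steady frame rate.
build = {"cpu": 180.0, "gpu": 95.0, "monitor": 144.0}

print(interim_bottleneck(build, target_fps=60.0))   # None
print(interim_bottleneck(build, target_fps=120.0))  # gpu
```

The same build is not bottlenecked for one purpose and is bottlenecked by its GPU for another: the verdict depends on the task, not on the parts list alone.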

In PC gaming terms, within the context of playing CS:GO at 1080p (1920 by 1080) and 60hz, the following systems are equally not bottlenecked:

  1. Core i9 10900k, RTX 3090, 360hz monitor
  2. Core i9 10900k, GTX 1060, 360hz monitor
  3. Core i9 10900k, RTX 3090, 60hz monitor
  4. Core i5 6500, GTX 1070, 60hz monitor
All of these systems will deliver an acceptable play experience of at least 60fps at the target resolution and high graphics settings. Systems 1 and 4 represent 'balanced' builds. (1) is vastly more performant than (4), but neither is bottlenecked with respect to this task, and neither contains a single component that is markedly weaker than the others. Systems 2 and 3 each have an obvious component that is limiting the overall system performance (the GPU and monitor, respectively), but both will still be adequate to the task. Their limitations are not significant in this context.

In PC building, there is no value, in and of itself, in achieving the Platonic ideal of a system where every component fully saturates the next component downstream at every step of the chain. Nor does an 'unbottlenecked' system necessarily outperform a bottlenecked one. System 2 from the list above will offer a better experience than the perfectly balanced, Platonic ideal of a system from, say, 10 years ago, in spite of its GPU being a 'bottleneck,' because every one of its components, including the 'bottlenecking' GPU, is better than the best component of its kind you could purchase 10 years ago.
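A quick sketch of why balance alone doesn't determine performance, using the same simplified min-over-stages model. Both builds and the 10% tolerance in the hypothetical `is_balanced` helper are made up for illustration; the numbers are not benchmarks of any real hardware.

```python
from typing import Dict


def system_fps(stages: Dict[str, float]) -> float:
    # Simplified serial pipeline: the slowest stage sets the overall frame rate.
    return min(stages.values())


def is_balanced(stages: Dict[str, float], tolerance: float = 0.1) -> bool:
    # Hypothetical criterion: every stage is within `tolerance` of the fastest one.
    top = max(stages.values())
    return all(v >= (1 - tolerance) * top for v in stages.values())


# A perfectly balanced older build versus a newer, lopsided one (illustrative numbers).
old_balanced = {"cpu": 60.0, "gpu": 60.0, "monitor": 60.0}
new_lopsided = {"cpu": 400.0, "gpu": 150.0, "monitor": 360.0}

print(is_balanced(old_balanced), system_fps(old_balanced))  # True 60.0
print(is_balanced(new_lopsided), system_fps(new_lopsided))  # False 150.0
```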

Component x does not bottleneck component y

Another subtlety here is that while an individual component may 'be a bottleneck' in a given system, 'being bottlenecked' (or not) is a property of the entire system, not of a component. In other words, it is fine, in principle, to say 'My CPU is a bottleneck' or 'My CPU is bottlenecking my system.' However, the common PC building forum question (and responses to it) of, e.g., 'Will this CPU bottleneck this GPU?' is invalid.

As noted above, it will always be the case, in practice, that some component of a system is not fully saturated by another component. The fact alone that your CPU is not capable of outputting as many FPS as your GPU is capable of processing doesn't tell us anything of practical utility in evaluating your build or whether or not it's 'bottlenecked.' That cannot be assessed without reference to the overall performance of the system against its intended purpose. Even if the CPU is capable of saturating only 25% of the GPU's maximum capacity, if your goal is to play Control at 8K / 60 FPS, then as long as the CPU can consistently deliver 60 FPS to the GPU, the system is not bottlenecked.
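The arithmetic in the Control example can be spelled out under the same simplified model. The 240 FPS GPU ceiling is an assumed figure, chosen only so that a 60 FPS CPU fills 25% of it; it is not a benchmark of any real card at 8K.

```python
from typing import Dict


def saturation(upstream_fps: float, downstream_fps: float) -> float:
    """Fraction of the downstream stage's capacity that the upstream stage can keep busy."""
    return min(upstream_fps / downstream_fps, 1.0)


def meets_target(stages: Dict[str, float], target_fps: float) -> bool:
    """Simplified model: the slowest stage sets the frame rate the player actually gets."""
    return min(stages.values()) >= target_fps


# Hypothetical figures: the CPU prepares 60 frames per second, while the GPU
# could render up to 240 at these settings.
cpu_fps, gpu_fps = 60.0, 240.0

print(saturation(cpu_fps, gpu_fps))                          # 0.25
print(meets_target({"cpu": cpu_fps, "gpu": gpu_fps}, 60.0))  # True
```

The low saturation figure on its own says nothing about whether the system is adequate; only the comparison against the 60 FPS goal does.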

More deeply, even when one component really is bottlenecking the overall system, the limitation should still be described at the level of the system. By analogy, if your shoes are too small, it's correct to say that they, e.g., limit your ability to walk. It would be weird, on the other hand, to say they limit your feet's ability to walk. Walking is an activity of a person, not of feet, even though it involves feet. Likewise, performance (or lack thereof) against a purpose is an attribute of a system, not of any one of the system's components.

This usage is, by the way, perfectly consistent with how the term is used in practice in engineering contexts. No one regards a system as bottlenecked if its overall performance is adequate to the needs (or anticipated needs) it is meant to serve. When a system is inadequate, it is often good methodology to search for bottlenecks and to fix any that are identified. And it would obviously be poor methodology to, e.g., increase the diameter of the base of the bottle while ignoring the diameter of the neck. But once performance is rendered adequate (or adequate to address expected future needs), the hunt usually stops. Engineers generally don't waste time quixotically tilting at 'bottleneck' windmills if the overall performance of the system is adequate to current and anticipated future needs.

As a side note, this is where the literal bottle is an unfortunate source for the analogy, because in actual bottles, the narrowness of the neck is a feature, not a bug. It improves the performance of the overall system relative to its purposes. A bottle without a neck is a jar. Bottles offer numerous advantages over jars for the applications we use bottles for. A bottle is cheaper to seal, for example, because the sealing component (e.g. a cork or metal cap) can be smaller, and, historically, cork and metal were expensive materials. Most crucially, the narrow neck reduces the flow rate, which makes it easier to pour out of the bottle in a standardized and controlled way.

Other kinds of performance limitations

The minor flaw of the standard definition of bottleneck is that it is overly broad. Even the Interim Version of the definition suffers from this problem. It is important to recognize that bottlenecks are not the only type of performance-limiting condition of a system, or even of a PC build.

Because PCs -- more than many other kinds of systems -- are inherently modular, with different modules contributing to performance in different ways, there is a tendency to regard any sub-optimally performing component as a bottleneck. But consider some examples of PC build issues:
  1. A CPU that is incapable of delivering enough FPS to the GPU for a given game, leading to perceptible hitching and slowdown;
  2. A GPU that is incapable of driving enough frames to saturate a monitor's refresh rate for a given game;
  3. A power supply that is not capable of supplying enough wattage for a given build;
  4. A GPU that does not support realtime ray tracing, meaning that feature is not available for a given game that supports it;
  5. A power supply that is capable of supplying enough wattage for a given build but is failing, delivering inconsistent power output;
  6. A front panel power button with a faulty contact, meaning the PC will not boot when the button is pressed.
Each one of these examples involves some specific component of a PC build not performing as expected (or at all), where that lack of performance impacts the performance of the entire system. I take (6) to be an unambiguous example of something that is not a bottleneck, and I don't expect many people would regard it as one. It's an issue that impacts performance (indeed, this system won't perform at all) and it's isolated to one component, but it isn't a bottleneck. If you think about the corrective pathway, it doesn't involve increasing the capacity of the limiting component: it just involves fixing or replacing it. The issue also doesn't manifest to the user as any sort of delay or slowdown in terms of anything 'moving through' the system. To call the faulty power button a bottleneck would be, I think, to torture the term 'bottleneck.'

I consider (5) an exactly parallel example to (6). In this case, the power supply has the capacity to power the system; it's just faulty. This would likely manifest to the user as system instability (e.g. random reboots). Likewise, the corrective pathway doesn't involve increasing the capacity of the power supply (e.g. moving from a 500 to a 600 watt PSU); it just involves replacing the faulty PSU with a working one. Interestingly, however, in the comment thread on the original post, a redditor asserted that this example was not only a bottleneck but an 'obvious' one. Even more interestingly, another user commented on the same thread that no one could possibly consider this to be an example of a bottleneck and that I was criticizing a straw man. That's doubly amusing because the 'straw man' was put forth as an actual argument in the thread he was responding to. Again, I think this is a tortured use of the term bottleneck. A failing or defective component is an example of a system performance issue distinct from a bottleneck.

Likewise, I don't think it's plausible to argue that (4) is a bottleneck. An inability to do realtime ray tracing may indeed result in a sub-optimal play experience, but it seems misguided to say the GPU's lack of ray tracing support 'bottlenecks' the system's performance. Lack of feature support is a distinct type of system limitation, not a type of bottleneck.

(3) is the first example where it becomes plausible to call something a bottleneck, and indeed the first place where I think most people would start applying the term (e.g. 'The PSU's inadequate wattage is bottlenecking the system.') I certainly don't think this is a ridiculous position to take, but I'm going to argue that it isn't a bottleneck. Again, there's no question that the PSU's inadequate wattage is limiting system performance. There's also no question that the performance limitation is one related to capacity: if the PSU could deliver more watts, the performance limitation would be removed. However, as in (5), the limitation would manifest as system instability.

On the literal bottleneck analogy, I think this is more like, say, the glass of the bottle being slightly porous and causing leaks than like the neck of the bottle being too narrow to provide adequate flow. Though the porousness of the bottle and the maximum wattage of the PSU are both capacities of their respective components that limit the performance of the overall system, they are capacity limitations of a different kind than those that can lead to bottlenecks. Stated another way: not even every capacity limitation of a component that significantly impacts the performance of the overall system is a bottleneck.

(2) is another example of something that would traditionally be referred to as a bottleneck, and I would go as far as to say it is one that most PC builders would consider unambiguous. I don't think it's quite so unambiguous. The first thing that gives me pause is that this condition (the GPU delivering fewer FPS than the monitor's refresh rate) is incredibly common, even among very high-end gaming systems. In fact, it is a potentially desirable state of a high-end gaming system. A builder with a large budget, for example, might purchase the highest refresh rate monitor available (e.g. 360hz) knowing full well that his (also very high-end) GPU is not capable of fully saturating it all the time on all the titles he plays. And it would be perfectly rational for him to do so. Given that the 360hz monitor is (at the time of this writing) the highest-refresh-rate device he can purchase, it makes sense to have the headroom at his disposal. But to say the GPU is bottlenecking if it isn't constantly driving 360 FPS on every single title would be to drop a ton of context about how games work: notably that FPS are variable and that performance will differ from game to game and moment to moment.

As a side note, another important element here is the market context. At the time of this writing, the most powerful consumer GPU yet announced is the RTX 3090. Though independent benchmarks have not been released, it is clear that even that card is not capable of fully saturating a 360hz display at every reasonable consumer resolution and combination of game settings. So if someone is going to assert that a 3090 is a 'bottleneck' in a given situation, the obvious response is: in comparison to what? That is: in comparison to what possible alternative that would alleviate the 'bottleneck'? As of now, the universe (more specifically the portions of it controlled by Nvidia and AMD) does not provide one. As noted earlier, this is like considering the nature of reality a 'bottleneck' to having your cake and eating it, too.

More deeply, the situation has to be evaluated with reference to whether the performance impact on the system overall is significant relative to its intended purpose. The fact is that most people can't perceive the difference between 120hz and 240hz, let alone between 240hz and 360hz. This includes even most gamers, whom we would expect to appreciate the difference better than the general population does. Perhaps some elite esports athlete would benefit from consistently driving 360 FPS as opposed to achieving a variable framerate between, say, 250 and 310 FPS, but for the average gamer, the performance difference is not significant. (I realize there are other reasons why it is desirable for a GPU to drive a higher framerate than a display can refresh at, but I'm ignoring them for the purposes of this example.)
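One way to see why each step up in refresh rate matters less than the last is to convert rates to frame times. This sketch only does the arithmetic (1000 ms divided by the rate); it makes no claim about where human perception thresholds actually lie.

```python
def frame_time_ms(rate_hz: float) -> float:
    """Time between consecutive frames (or refreshes), in milliseconds."""
    return 1000.0 / rate_hz


for low, high in [(60, 120), (120, 240), (240, 360)]:
    saved = frame_time_ms(low) - frame_time_ms(high)
    print(f"{low} -> {high} Hz: {frame_time_ms(low):.2f} ms -> "
          f"{frame_time_ms(high):.2f} ms per frame ({saved:.2f} ms saved)")
```

Going from 60 to 120 Hz cuts the time between frames from 16.67 ms to 8.33 ms, while going from 240 to 360 Hz cuts it from 4.17 ms to 2.78 ms, a difference of about 1.39 ms.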

Example (1) is, in my opinion, a clear and uncontroversial example of a bottleneck, properly understood. Here, a component (the CPU) is limited in a way that significantly impacts the performance of the entire system. This impact is significant because it is clearly perceptible by the player in the form of an undesirable consequence: noticeable lag and stuttering.

As in example (3), the limitation of the CPU is one of capacity. But it is a specific type of capacity limitation: one that has to do with (by analogy) flow through the system. The CPU delivers frames to the GPU slowly enough that the GPU is left waiting, and the delay results in a play experience that is not smooth. In other words, bottlenecks involve a limit in the throughput of a component limiting the performance of the entire system. Finally, this yields the proper definition of bottleneck, which I'll call the Rigorous Engineering Version of the concept. It is the one articulated in the second paragraph of the Wikipedia entry:

bottleneck (Rigorous Engineering Version): noun. A component of a system, the throughput limitations of which impose a significant limit on the overall performance of the system.

'Bottleneck,' properly understood in this way and restricted to this usage, is a valid concept and is applicable to certain types of PC build situations, as in (1).
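The taxonomy worked through above can be summarized in a small sketch. The category names and the per-example `significant` flags are a hypothetical encoding of examples (1) through (6), not a standard classification; in particular, significance depends on the purpose of the build, so those flags would change with the context.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LimitKind(Enum):
    THROUGHPUT = auto()       # rate of flow through a component, e.g. frames delivered
    OTHER_CAPACITY = auto()   # a capacity that isn't about flow, e.g. PSU wattage
    MISSING_FEATURE = auto()  # e.g. no realtime ray tracing support
    DEFECT = auto()           # failing or faulty part


@dataclass
class Issue:
    component: str
    kind: LimitKind
    significant: bool  # does it significantly limit performance for the build's purpose?


def is_bottleneck(issue: Issue) -> bool:
    """Rigorous Engineering Version: a throughput limitation with a significant
    impact on the performance of the whole system."""
    return issue.kind is LimitKind.THROUGHPUT and issue.significant


examples = [
    Issue("cpu", LimitKind.THROUGHPUT, significant=True),          # (1) hitching and stutter
    Issue("gpu", LimitKind.THROUGHPUT, significant=False),         # (2) assumed imperceptible for an average gamer
    Issue("psu", LimitKind.OTHER_CAPACITY, significant=True),      # (3) insufficient wattage
    Issue("gpu", LimitKind.MISSING_FEATURE, significant=True),     # (4) no realtime ray tracing
    Issue("psu", LimitKind.DEFECT, significant=True),              # (5) failing PSU
    Issue("power button", LimitKind.DEFECT, significant=True),     # (6) faulty contact
]

print([is_bottleneck(i) for i in examples])
# [True, False, False, False, False, False]
```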

So much for the theoretical discussion. In a future post, I'll take on the practical implications for PC building and PC building advice and, in particular, the questions PC builders should be asking (and answering) instead of the various flavors of 'Will component x bottleneck my system?'
