Why TAGE is the best

The TAGE branch predictor by André Seznec and Pierre Michaud is the best branch predictor today, winning the last two branch predictor competitions (CBP2 and CBP3). It was introduced in a 2006 paper:

A case for (partially) tagged Geometric History Length Branch Prediction
André Seznec, Pierre Michaud
Journal of Instruction Level Parallelism (JILP), 2006.
While this and other papers on TAGE describe the branch predictor in great detail, they do not (in my opinion) explain the insights behind TAGE, that is, why TAGE works so well. In this post, I give my view of what makes TAGE so great.

I’ll first define what I call a “branch scenario,” a concept important to understanding any branch predictor.

The key premise of branch prediction is that branch behavior repeats, which means branch behavior can be learned and predicted. Specifically, we usually assume that the outcome of a branch is a function of two inputs:

the address of the branch, which distinguishes it from the other branches, and
branch history, a sequence of prior branch outcomes.

I call this tuple {branch address, branch history} a “branch scenario.” The branch predictor “learns” branch behavior by recording the branch outcomes observed in each encountered branch scenario. The branch predictor predicts the branch outcome for a particular branch scenario by looking up what happened the last few times the same branch scenario occurred.

A key question in branch prediction is how long should the branch history be? On the one hand, longer histories enable accurate predictions for some harder-to-predict branches. On the other hand, with a longer history, the predictor must track more branch scenarios and thus spend more time warming up, reducing accuracy for easier-to-predict branches.

This fundamental branch prediction tradeoff was the inspiration behind hybrid branch predictors which use multiple branch histories (see McFarling’s predictor for an early example). Roughly speaking, for each branch, hybrid predictors track prediction accuracy for that branch given different history lengths. The history length that results in highest accuracy is the one used to generate the predictions for that branch.

TAGE is one such hybrid branch predictor; however, three major improvements set it apart:

Entry tagging

Most prior branch predictors do not tag predictor entries. They simply use branch history and address (that is, the branch scenario) to index into some predictor entry and assume that the entry represents the same branch scenario. With longer branch histories, multiple branch scenarios are likely to alias to the same entry. In contrast, TAGE partially tags its entries, so that TAGE can determine with high certainty which branch scenario an entry corresponds to.

This ability is important because in hybrid branch predictors the currently predicted branch scenario indexes into multiple predictor entries (one for each tracked branch history length). Untagged predictors do not know which of these predictor entries truly correspond to previous instances of the current branch scenario and which are aliased to other branch scenarios. Therefore, untagged predictors may predict the branch based on a previously observed outcome of a very different branch scenario! TAGE, on the other hand, does not make this mistake because TAGE knows which entries correspond to the current branch scenario and simply chooses the longest history entry from those.
Entry selection

TAGE selects which predictor entry (of those matching the current branch scenario) to use better than prior hybrid branch predictors do. Prior branch predictors make this choice simply based on the branch address. Some branches are detected to require long history, whereas others are predicted accurately with short history. In contrast, TAGE makes this choice on an even finer granularity. In fact, TAGE may use entries with different history lengths for different branch scenarios of the same branch. This happens due to the way TAGE keeps track of entry “usefulness” (see paper for details).

The finer granularity of entry selection allows TAGE to better trade off history length versus warmup time. Specifically, unlike prior predictors, TAGE does not suffer high warmup time on those branches that have only a few branch scenarios that require long history to be predicted accurately.
Longer maximum history

Entry tagging and better entry selection enable TAGE to use longer branch histories (into the hundreds of branch outcomes). Prior hybrid branch predictors cannot not use longer histories due to the risk of aliasing and high warmup time. In TAGE, entry tagging takes care of aliasing whereas better entry selection reduces the risk of unnecessarily high warmup time. Therefore, TAGE is able to get away with tracking very long branch histories, increasing accuracy for those branch scenarios in which long histories are important.

In my opinion, these three improvements are the reason why TAGE outperforms all other branch predictors.

3 Responses to Why TAGE is the best

a computer architect says:

06/04/2014 at 7:38 pm

Great summary. Thanks for the post!

Branch Prediction guy says:

10/13/2014 at 6:45 pm

Thanks for the summary. The paper by Andre Seznec gives a good detail on the TAGE branch predictor.

Architecture guy says:

05/10/2021 at 5:45 pm

Nice helpful summary/insight. Thanks !