The Google PageRank Algorithm
PageRank was developed at Stanford University by Google's founders, Sergey Brin and Larry Page. It's based on the idea that a web page's quality can be determined by the other web pages that link to it.
Google used PageRank to determine the ranking of pages in its search results. As Google became the dominant search engine, its reliance on PageRank sparked massive demand for backlinks.
If you've worked in SEO, you've almost certainly heard of PageRank. You might also be confused about exactly what PageRank means and how it is calculated. To answer these questions, we've defined PageRank below, along with how it's calculated and how it has changed over time.
Table of Contents
- What is PageRank?
- How Does Google Calculate PageRank?
- A Simple Example: Calculating PageRank for Three Pages
- The History of PageRank
What is PageRank?
PageRank was the first algorithm Google used to rank web pages in its search engine results pages (SERPs). According to Google, the algorithm was named after Google co-founder Larry Page.
In the original paper on PageRank, the concept was defined as "a method for computing a ranking for every web page based on the graph of the web. PageRank is an attempt to see how good an approximation to importance can be obtained just from the link structure."
PageRank was further defined by Sergey Brin and Larry Page in the paper that introduced the Google search engine. The paper described PageRank as "an objective measure of citation importance that corresponds well with people's subjective idea of importance. Because of this correspondence, PageRank is an excellent way to prioritize the results of web keyword searches." In other words, "the analysis of link structure via PageRank allows Google to evaluate the quality of web pages."
How Does Google Calculate PageRank?
The seminal paper on Google summarizes the PageRank calculation:
We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.
PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper.
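To make the eigenvector remark concrete, the sketch below builds the link matrix for a small hypothetical graph, folds in the damping factor (giving what is often called the "Google matrix"), and finds its principal eigenvector by power iteration. It then compares that vector with repeated application of the formula above. The helper names, the graph, and the reading of "normalized link matrix" as the matrix M below are our own assumptions, not the paper's code. The comparison also assumes every page has at least one outbound link; in that case the formula's values sum to the number of pages N, so dividing them by N should match the eigenvector.

```python
# Sketch: the formula's fixed point vs. the principal eigenvector of a
# damped link matrix ("Google matrix") G = d*M + (1-d)/N.
# Assumption: every page has at least one outbound link (no dangling pages).

def link_matrix(links, pages):
    """M[i][j] = 1/C(j) if page j links to page i, else 0; each column sums to 1."""
    index = {p: k for k, p in enumerate(pages)}
    n = len(pages)
    m = [[0.0] * n for _ in range(n)]
    for src, targets in links.items():
        for dst in targets:
            m[index[dst]][index[src]] = 1.0 / len(targets)
    return m


def power_iteration(g, iterations=200):
    """Multiply a vector by g repeatedly, rescaling it to sum to 1 each time."""
    n = len(g)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [x / total for x in w]
    return v


def formula_iteration(links, pages, d=0.85, iterations=200):
    """Apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) to every page, repeatedly."""
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # The comprehension reads only the previous iteration's values.
        pr = {
            p: (1 - d) + d * sum(pr[t] / len(links[t]) for t in pages if p in links[t])
            for p in pages
        }
    return [pr[p] for p in pages]


# Hypothetical three-page graph with a loop: A -> B, A -> C, B -> C, C -> A.
pages = ["A", "B", "C"]
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
d = 0.85
n = len(pages)

m = link_matrix(links, pages)
g = [[d * m[i][j] + (1 - d) / n for j in range(n)] for i in range(n)]

eigvec = power_iteration(g)
formula = formula_iteration(links, pages, d)

# The formula's values sum to N here, so dividing by N should match the eigenvector.
for page, e, f in zip(pages, eigvec, formula):
    print(f"{page}: eigenvector {e:.6f}  formula / N {f / n:.6f}")
```

Pages with no outbound links, like page C in the three-page example later in this article, break the column-sum assumption and need extra handling before this eigenvector view applies.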
This formula calculates a page's PageRank from the pages that link to it: each linking page T contributes its own PageRank divided by its number of outbound links, PR(T)/C(T), and the sum of these contributions is multiplied by d and added to (1 - d). Therefore, backlinks from pages with greater PageRank have more value. In addition, pages with more outbound links pass a smaller fraction of their PageRank to each linked web page.
According to this formula, the three primary factors that affect a page's PageRank are:
- The number of pages that backlink to it
- The PageRank of the pages that backlink to it
- The number of outbound links on each of the pages that backlink to it
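The three factors can be seen by evaluating the formula for a single page while changing one input at a time. In this minimal sketch, the helper name and all input values are made up for illustration, and the PageRank of each linking page is simply held fixed; in the real calculation those values are themselves computed by the same formula.

```python
# Hypothetical helper: evaluate PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
# for a single page, given (PR(T), C(T)) for each page T that links to it.

def pagerank_from_backlinks(backlinks, d=0.85):
    """backlinks: list of (pagerank_of_linking_page, outbound_links_on_that_page)."""
    return (1 - d) + d * sum(pr / out for pr, out in backlinks)


# Made-up inputs, changing one factor at a time relative to the baseline.
scenarios = {
    "baseline": [(1.0, 2)],                   # one backlink: PR 1.0, 2 outbound links
    "more_backlinks": [(1.0, 2), (1.0, 2)],   # factor 1: two such backlinks
    "stronger_backlink": [(2.0, 2)],          # factor 2: linking page has PR 2.0
    "more_outbound_links": [(1.0, 4)],        # factor 3: linking page has 4 outbound links
}

results = {name: pagerank_from_backlinks(bl) for name, bl in scenarios.items()}
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the baseline is 0.575; a second backlink or a backlink with twice the PageRank raises it to 1.0, while the same backlink on a page with four outbound links lowers it to 0.3625.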
A Simple Example: Calculating PageRank for Three Pages
The formula above might look intimidating, but it's relatively straightforward. To demonstrate, let's calculate the PageRank for an Internet with three web pages.
In this example, web page A links to both web page B and web page C. Web page B links to web page C, and web page C has no outbound links. Based on this, we already know that A, which has no backlinks, will have the lowest PageRank, and C, which is linked from both other pages, will have the greatest PageRank.
It's important to remember that the PageRank formula is iterative. This is because the PageRank of each page depends on the PageRank of the pages pointing to it. Each time the calculation is run, you get closer to the final answer.
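Here are the formulas and results for the first iteration, assuming d = 0.85. The pages are calculated in order (A, then B, then C), and each formula uses the values already computed for the pages before it: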
- Page A: (1 - 0.85) = 0.15
- Page B: (1 - 0.85) + (0.85) * (0.15 / 2) = 0.21375
- Page C: (1 - 0.85) + (0.85) * (0.15 / 2) + (0.85) * (0.21375 / 1) = 0.3954375
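The same arithmetic can be checked in a few lines of Python. This sketch simply mirrors the three formulas in the list, evaluating the pages in order so that B uses the value just computed for A, and C uses the values for A and B.

```python
d = 0.85  # damping factor used in the example

# Pages are evaluated in order; each line uses the latest values computed above it.
pr_a = 1 - d                                      # A has no backlinks
pr_b = (1 - d) + d * (pr_a / 2)                   # B's only backlink is from A, which has 2 outbound links
pr_c = (1 - d) + d * (pr_a / 2) + d * (pr_b / 1)  # C is linked from A (2 outbound) and B (1 outbound)

print(f"A = {pr_a:.7f}, B = {pr_b:.7f}, C = {pr_c:.7f}")
```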
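In general, PageRank is recalculated repeatedly until the values stop changing from one iteration to the next (that is, until they converge). In this small example, the graph has no loops and the pages were calculated in link order, so a second iteration returns exactly the same three values: they are already final. Note also that these values do not average 1.0. With this version of the formula, the average tends to 1.0 only when every page has at least one outbound link, and page C has none.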
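A repeat-until-stable loop might look like the sketch below. It is an illustration under assumptions, not Google's implementation: the function name pagerank, the starting value of 1.0 for every page, and the tolerance are our own choices, and unlike the hand calculation it updates all pages simultaneously from the previous iteration's values. It stops once no page's value changes by more than the tolerance.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) until no value changes by more than tol.

    links maps each page to the list of pages it links to.
    Every page starts at 1.0 (an arbitrary choice for this sketch).
    Returns (values, number_of_passes).
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}
    for passes in range(1, max_iter + 1):
        # Simultaneous update: every new value is computed from the previous pass.
        new = {
            p: (1 - d) + d * sum(pr[t] / len(links[t]) for t in pages if p in links[t])
            for p in pages
        }
        delta = max(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if delta < tol:
            return pr, passes
    return pr, max_iter


# The three-page example: A -> B, A -> C, B -> C; C has no outbound links.
links = {"A": ["B", "C"], "B": ["C"], "C": []}
values, passes = pagerank(links)
for page, value in values.items():
    print(f"{page}: {value:.7f}")
print(f"passes: {passes}")
```

Because it updates all pages at once from a uniform starting point, this version needs a few passes before the values settle, but it ends on the same values as the ordered hand calculation: 0.15, 0.21375, and 0.3954375.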
The History of PageRank
Google was not the first company to use link analysis to determine the ranking of websites in search results. Robin Li, who later founded Baidu, developed the Rankdex algorithm in 1996. Li's U.S. patent was filed one year before Google's analogous patent.
Larry Page and Sergey Brin began developing PageRank in 1996 at Stanford University. Other developers involved in the project included Scott Hassan, Rajeev Motwani, Alan Steremberg, and Terry Winograd. The patent for PageRank was filed on January 10, 1997. Stanford granted Google exclusive license rights to this patent in exchange for 1.8 million shares, which it sold in 2005. As of September 24, 2019, the PageRank patent and all associated patents have expired.
In its early days, Google publicly displayed PageRank scores in its products. In 2000, Google released the Google Toolbar. This plugin came with several features, including the ability to search the web, bookmark pages, and access Google accounts.
The first Toolbar also allowed anyone to see the PageRank score of any web page they viewed, shown as a score from 0 to 10 on a logarithmic scale. PageRank data was available in the Toolbar until 2016.
In 2000, Google also began publicly sharing PageRank data in the Google Directory, a list of top websites organized by category and sorted by PageRank. Google ultimately shut down this product in 2011.
Although Google eventually removed public access to PageRank scores, it continued to use PageRank for search rankings and updated the algorithm over time. Google also made search algorithm updates related to PageRank, for example in 2008 to counteract the practice of PageRank sculpting.
In 2017, Google's Gary Illyes confirmed that Google was still using PageRank as a signal. However, this claim has been disputed. Former Google engineer Jonathan Tang stated that Google replaced the version of PageRank developed at Stanford "in 2006 with an algorithm that gives approximately-similar results but is significantly faster to compute."
Since Google stopped publicly sharing information about PageRank, it's almost impossible to know exactly how the algorithm is used today or how it has been modified over time. However, the core insight of PageRank, that the Internet's link graph can be used to determine the quality of individual web pages, remains highly influential. Backlinks and internal links are still critical to SEO performance.