A former employee allegedly leaked a Yandex source code repository, part of which contained more than 1,900 factors used by the search engine for ranking websites in search results.
This leak has revealed 1,922 ranking factors Yandex used in its search algorithm, at least as of July 2022. Perhaps Martin MacDonald put it best on Twitter today: “The Yandex hack is probably the most interesting thing to have happened in SEO in years.”
Yandex is not Google. Keep that in mind if you plan to read the full list of Yandex ranking factors. Seeing a ranking factor listed by Yandex doesn’t mean Google gives that signal the same amount of weight, and Google may not use all of the 1,922 factors listed. In fact, many of the factors in this leak are deprecated or unused.
That said, a lot of these ranking factors may be quite similar to signals Google uses for search. So reviewing this document may provide some useful insights to better help you understand how search engines, such as Google, work from a technological standpoint.
The bigger picture. The code appeared as a torrent on a popular hacking forum, as reported by Bleeping Computer:
…the leaker posted a magnet link that they claim are ‘Yandex git sources’ consisting of 44.7 GB of files stolen from the company in July 2022. These code repositories allegedly contain all of the company’s source code besides anti-spam rules.
Yandex calls it a leak. Because the code appeared on a popular hacking forum, it was first thought that Yandex was hacked. Yandex has denied this, and provided the following statement:
“Yandex was not hacked. Our security service found code fragments from an internal repository in the public domain, but the content differs from the current version of the repository used in Yandex services.
A repository is a tool for storing and working with code. Code is used in this way internally by most companies.
Repositories are needed to work with code and are not intended for the storage of personal user data. We are conducting an internal investigation into the reasons for the release of source code fragments to the public, but we do not see any threat to user data or platform performance.”
MacDonald shared the full list of 1,922 factors here on Web Marketing School. I highly recommend downloading it, as I fully expect Yandex will try to scrub this information from the internet. (Editor’s note: In an earlier version of this article, we had linked to a translated version on Dropbox, but that link quickly went away.)
Early analysis of ranking factors. Alex Buraks created two Twitter threads – first thread, second thread – analyzing the various ranking factors. There’s another interesting Twitter thread here from Michael King.
Dan Taylor also shares some findings in Yandex Data Leak: What We’ve Learned About The Search Algorithms on Russian Search News.
Many of Yandex’s ranking factors are what you’d expect to see:
- PageRank and many link-related factors (e.g., age, relevancy, etc.).
- Text relevancy.
- Content age and freshness.
- End-user behavior signals.
- Host reliability.
- Some sites get preference (e.g., Wikipedia).
SEOs are finding some of the ranking factors surprising: the number of unique visitors, the percentage of organic traffic, and the average domain ranking across queries.
And as Taylor pointed out, 244 of the ranking factors were categorized as unused and 988 as deprecated, “meaning that 64% of the document is either not actively used or has been superseded – so it’s more like ~690 potential ranking factors, and a lot of them contain thin descriptions.”
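The arithmetic behind Taylor's figures is easy to check. Here is a minimal Python sketch that uses only the counts quoted above; the function name `inactive_share` and the variable names are illustrative and do not come from the leaked code or from Taylor's analysis.

```python
# Counts as reported in this article (Taylor's tally of the leaked list).
TOTAL_FACTORS = 1922
UNUSED = 244
DEPRECATED = 988


def inactive_share(total, unused, deprecated):
    """Return (inactive count, inactive fraction of total, remaining count)."""
    inactive = unused + deprecated
    return inactive, inactive / total, total - inactive


inactive, share, remaining = inactive_share(TOTAL_FACTORS, UNUSED, DEPRECATED)

# Summarize how much of the list is flagged as unused or deprecated.
print(f"{inactive} inactive factors ({share:.0%} of {TOTAL_FACTORS}), {remaining} remaining")
```

Run as written, this reports 1,232 inactive factors, about 64% of the list, with 690 left over, which matches the numbers in the quote.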
As noted above, Yandex is not Google and does not have Google’s traffic numbers, so this leak is not a new go-to resource for landing a top spot on Google. But it is always nice to get an inside look at how a search engine functions.
With Google’s recent push to highlight E-A-T (expertise, authoritativeness, trustworthiness) content, some of the technical ranking factors on Yandex might not be as heavily weighted on Google.