Search engines have become the bread and butter of the Internet. Web content would be very difficult to find without them. Understanding how they work can help you carry out search engine optimization and search engine marketing properly. Every search engine has three main functions:
- Crawling – Imagine tiny digital crawlers scurrying across your pages to discover what sort of content you have on your site.
- Indexing – All the content that the crawlers discover is recorded and stored so it can be looked up later.
- Retrieval – Stored content is fetched when a user types a matching query into the search box.
These three facets of a search engine each play a certain role in ensuring that when you search up cat videos, you aren’t getting content on snakes. Let’s dive into each of them below.
Crawling
Simply put, crawling is the acquisition of data about a website. This involves scanning sites and collecting details such as titles, images, keywords, linked pages, and other information from your web pages. Different types of crawlers may also look for different details, such as page layouts and advertisement placement.
These crawlers are automated bots, also called spiders. However, this is one type of spider you definitely don’t want exterminated. These spiders visit page after page as quickly and efficiently as possible, using the links on each page to find where to go next.
In the earlier days, Google’s spiders were able to read several hundred pages per second. Now, Google’s spiders can read several thousand pages per second. When a spider visits a page, it collects every link on that page and adds it to its list of pages to visit. Some sites are crawled more frequently and some are crawled to a greater depth, but a crawler may give up if a site’s page hierarchy is far too complex.
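The link-following behaviour described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine's crawler is built: the `SITE` dictionary stands in for the web, `fetch` is a hypothetical callable that returns a page's HTML (or `None`), and `max_depth` is an assumed cut-off that mimics a crawler giving up on deeply nested pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first crawl starting at start_url.

    fetch is a hypothetical callable: URL -> HTML string, or None if missing.
    Links found on pages at max_depth are not followed.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        if depth >= max_depth:
            continue  # too deep in the hierarchy: stop following links
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# A made-up five-page site used instead of real network requests.
SITE = {
    "http://example.test/": '<a href="/cats">Cats</a> <a href="/snakes">Snakes</a>',
    "http://example.test/cats": '<a href="/cats/videos">Videos</a>',
    "http://example.test/snakes": '<a href="/">Home</a>',
    "http://example.test/cats/videos": '<a href="/cats/videos/deep">Deep</a>',
    "http://example.test/cats/videos/deep": "no links here",
}

pages = crawl("http://example.test/", SITE.get, max_depth=2)
print(pages)
```

With `max_depth=2`, the page at `/cats/videos/deep` is never visited, which is a small-scale version of a crawler abandoning an overly complex hierarchy.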
Indexing
Indexing is when the spiders bring back data to be processed and stored for later use. Think of the index as a giant library holding information from an enormous number of sites. All of this intel is squared away until someone on the Internet searches for a certain term.
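One common way to picture this "library" is an inverted index, which maps each word to the pages that contain it. The sketch below is a simplified assumption about how such a structure might look, not a description of any particular search engine; the page names and text in `PAGES` are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of page URLs on which it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


# Invented crawl results: URL -> text collected by the spiders.
PAGES = {
    "site-a/cats": "Funny cat videos and cat pictures",
    "site-b/snakes": "Snake care guide with videos",
}

index = build_index(PAGES)
print(sorted(index["videos"]))
print(sorted(index["cat"]))
```

Because the lookup goes from word to pages, a later search for "videos" does not need to reread every stored page.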
Retrieval and Ranking
Retrieval is when the search engine processes your search query and returns the most relevant pages that match what you’ve looked up. Most search engines try to differentiate themselves by using different criteria to pick and choose which pages best match what you need to find.
Ranking algorithms check your search query against billions of pages to determine how relevant each one is. Companies guard their ranking algorithms because of their complexity. A better algorithm translates into a better search experience.
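To make retrieval and ranking a little more concrete, here is a deliberately naive sketch that scores each page by how often the query words appear on it. Real ranking criteria are guarded and far more sophisticated, so treat the scoring rule, the `search` function, and the sample `PAGES` as assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, pages):
    """Return URLs of matching pages, best match first.

    The score is simply the number of times the query words occur on the
    page. Ties are broken alphabetically by URL to keep results stable.
    """
    query_terms = tokenize(query)
    scored = []
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _score, url in scored]


# Invented stored pages: URL -> page text.
PAGES = {
    "site-a/cats": "cat videos and more cat videos",
    "site-b/snakes": "snake videos",
    "site-c/dogs": "dog pictures",
}

results = search("cat videos", PAGES)
print(results)
```

The cat page comes first, the snake page trails behind because it only matches "videos", and the dog page is left out entirely.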
Search Engine Exploitation
In the past, search engines ranked sites by how often search keywords appeared on a page. This led to “keyword stuffing”, the practice of filling pages with keyword-heavy nonsense. Then came the concept of link importance, where search engines valued sites with a lot of incoming links because they interpreted site popularity as relevance. However, this led to link spamming all over the web.
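Both weaknesses are easy to reproduce with a toy example. The sketch below assumes two very crude signals, a raw keyword count and a raw count of incoming links, and shows how each can be gamed. The page text and the `LINKS` graph are made up, and no real engine is claimed to have worked exactly this way.

```python
from collections import Counter


def keyword_score(text, keyword):
    """Count how many times keyword appears as a whole word in text."""
    return text.lower().split().count(keyword)


def inbound_link_counts(links):
    """Count incoming links per page.

    links maps each page to the list of pages it links to. Repeated links
    from the same source and self-links are ignored.
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


# Keyword stuffing: the nonsense page beats the useful one on raw counts.
honest = "a short guide to caring for your cat"
stuffed = "cat cat cat cat cat buy cat cat cat"
honest_score = keyword_score(honest, "cat")
stuffed_score = keyword_score(stuffed, "cat")
print(honest_score, stuffed_score)

# Link spamming: throwaway pages exist only to point at the spam page.
LINKS = {
    "blog": ["guide"],
    "forum": ["guide", "spam"],
    "spam-farm-1": ["spam"],
    "spam-farm-2": ["spam"],
}
counts = inbound_link_counts(LINKS)
print(counts["guide"], counts["spam"])
```

In both cases the low-quality page wins under the naive signal, which is why counting keywords or links alone is not enough.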