For most people, a website is all they think about when they go online. As long as they land on the site they wanted, find the information they were looking for, or complete the task they set out to do, they give barely a second's thought to what happens in the background.
In reality, the page they are viewing and interacting with is largely a veil over a complicated process running behind it. Most of that technology lives where the site itself resides: in the web hosting provider's facilities.
A typical well-run web hosting provider (as most of the established and popular ones are) has an on-site architecture divided into two parts: a public part that anyone can reach, and a private part hidden behind a firewall.
The architecture that is in front of the firewall and accessible to the public includes:
Domain Name System (DNS) Server – this server resolves, or translates, the human-readable domain name we type into a browser (such as “abcdefgh.com”) into the IP address of the server hosting the site or service, so the browser knows where to send its request.
Content Delivery Network (CDN) – a network of servers placed in different geographic locations so that each visitor is served pages and data from the one nearest to them. These servers also cache frequently requested content, so it does not have to travel all the way from the main (origin) servers each time, which reduces response times.
Firewall – the firewall marks the border between the public and private parts of a hosting provider's network architecture. It lets through only the traffic that satisfies its configured policies and rules, and blocks everything else. A firewall can be a stand-alone hardware appliance, be integrated into other networking components (such as switches and routers), or run as software within an operating system.
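The rule-based filtering described above can be sketched as an ordered list of rules evaluated first-match, with a default-deny fallback. Many packet filters are configured this way. This is a minimal Python sketch and does not reproduce any particular firewall's configuration language. The `Rule` class, the `evaluate` function and the example policy are hypothetical, and the example networks come from the documentation-only ranges 198.51.100.0/24 and 203.0.113.0/24. Real firewalls also match on protocol, destination address and connection state.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical firewall rule: an action for traffic from a source network to a port."""
    action: str                      # "allow" or "deny"
    source: ipaddress.IPv4Network    # source addresses this rule matches
    port: int                        # destination port this rule matches


def evaluate(rules: list[Rule], src_ip: str, port: int) -> str:
    """Return the action of the first matching rule; deny if no rule matches."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if addr in rule.source and port == rule.port:
            return rule.action
    return "deny"  # default-deny: traffic that meets no rule is blocked


# Hypothetical policy: block one abusive network on HTTPS, allow web traffic from anyone else.
RULES = [
    Rule("deny", ipaddress.ip_network("203.0.113.0/24"), 443),
    Rule("allow", ipaddress.ip_network("0.0.0.0/0"), 80),
    Rule("allow", ipaddress.ip_network("0.0.0.0/0"), 443),
]

if __name__ == "__main__":
    print(evaluate(RULES, "198.51.100.7", 443))  # allow
    print(evaluate(RULES, "203.0.113.9", 443))   # deny (blocked network)
    print(evaluate(RULES, "198.51.100.7", 22))   # deny (no rule allows SSH)
```

Rule order matters. If the broad "allow 443" rule came first, the deny rule for 203.0.113.0/24 would never be reached.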
Behind the firewall lies the private part of the architecture, which is further divided into two layers: the Web Layer and the Service Layer.
The Web Layer includes:
Load Balancer – this is a device that distributes incoming network or application traffic across the available resources, including network links, applications, and storage and processing devices. Its goal is to optimize throughput, response times and capacity use, and to improve the reliability of the system as a whole.
Web Application Server – this server acts as both a web server and an application server. The web server part returns commonly requested web content such as pages and images. The application server part handles more complex requests that involve the programming logic typical of dynamic websites, which may require retrieving data from sources such as files, databases, services or devices.
A web application server can either be a single all-in-one server or a three-tier arrangement in which a separate web server and application server are connected through a load balancer. In the latter case, only the application server handles input from and output to the data sources.
User Directory – this server holds the login IDs and credentials used to authenticate the users who are allowed into the network. It also determines which parts of the network each user can access and which data they may view or use.
File Repository – a device or application that stores files containing data and information. Users with access to the repository can, depending on the permissions they have been granted, save, download, delete or browse files in it.
Cache – temporary storage for information that the web application server requests frequently, kept close at hand to speed up responses to clients' requests. It typically holds session data as well as frequently used content.
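The Web Layer cache is often used in a cache-aside pattern. The web application server checks the cache first and goes to a slower source, such as a database or file repository, only on a miss. It then stores the result for later requests. The sketch below assumes a simple in-process dictionary with a fixed time-to-live (TTL), standing in for a dedicated cache server such as Redis or Memcached. `TTLCache`, `get_or_load` and `load_page` are hypothetical names.

```python
import time
from typing import Callable, Hashable


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[Hashable, tuple[float, object]] = {}  # key -> (expiry, value)

    def get(self, key: Hashable):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # stale entry: evict it
            return None
        return value

    def set(self, key: Hashable, value) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def get_or_load(cache: TTLCache, key: Hashable, load: Callable[[], object]):
    """Cache-aside: serve from the cache if present, otherwise load and remember."""
    value = cache.get(key)
    if value is None:
        value = load()  # e.g. a database or file-repository lookup
        cache.set(key, value)
    return value


if __name__ == "__main__":
    calls = []

    def load_page():
        calls.append(1)  # count trips to the slow data source
        return "<html>home</html>"

    cache = TTLCache(ttl_seconds=30)
    get_or_load(cache, "/home", load_page)
    get_or_load(cache, "/home", load_page)
    print(len(calls))  # 1: the second request was served from the cache
```

`get` uses `None` to signal a miss, so this sketch cannot cache a value that is itself `None`. A fuller version would use a sentinel object instead.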
The Service Layer includes:
Databases – these are the back end where all the data is stored. They are connected to the network and serve the queries sent to them. To ensure availability, a database can be replicated across two or more servers. This avoids downtime if one server crashes or a network link fails. It also improves performance, because load balancers can spread the traffic by forwarding each query to the least busy of these servers.
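The replication and routing behaviour described for the Service Layer can be approximated by a router that skips failed replicas. It forwards each query to the healthy replica with the fewest queries in flight, which is a least-connections policy. In this hypothetical sketch, `Replica`, `ReplicaRouter` and the `execute` callback are illustrative stand-ins. The sketch leaves out how data is actually copied between replicas and how a crash is detected.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Replica:
    """A hypothetical database replica as seen by the router."""
    name: str
    healthy: bool = True
    active_queries: int = 0  # queries currently in flight on this replica


class ReplicaRouter:
    """Forward each query to the least busy healthy replica."""

    def __init__(self, replicas: list[Replica]):
        self.replicas = replicas

    def pick(self) -> Replica:
        candidates = [r for r in self.replicas if r.healthy]  # skip crashed replicas
        if not candidates:
            raise RuntimeError("no healthy database replica available")
        return min(candidates, key=lambda r: r.active_queries)

    def run(self, query: str, execute: Callable[[Replica, str], object]) -> object:
        replica = self.pick()
        replica.active_queries += 1
        try:
            return execute(replica, query)  # hypothetical callback that talks to the DB
        finally:
            replica.active_queries -= 1  # release the slot even if the query fails


if __name__ == "__main__":
    pool = [
        Replica("db-a", active_queries=5),
        Replica("db-b", active_queries=2),
        Replica("db-c", healthy=False),
    ]
    router = ReplicaRouter(pool)
    print(router.pick().name)  # db-b: db-c is down and db-b is less busy than db-a
    pool[1].healthy = False    # simulate db-b crashing
    print(router.pick().name)  # db-a: queries keep flowing to the remaining replica
```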
Now that we have covered the key components of a web hosting provider, let us look at a typical request flow to see how they all fit together.
A visitor types in a URL, and the DNS server translates its domain name into an IP address that points to a CDN server. If the requested content is cached on the CDN server, it is returned to the visitor straight away. If not, the request is passed on to the firewall. The firewall evaluates it and either blocks it as illegitimate (for example, a hacking attempt or a malicious query) or lets it through as legitimate traffic.
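The CDN step of this flow can be sketched as two decisions. First, pick the edge server nearest the visitor. Then either serve the cached copy (a hit) or forward the request inward toward the firewall and origin servers (a miss). The edge list, `handle` and the `forward_to_origin` callback are hypothetical. Choosing the edge by comparing the visitor's coordinates with the haversine great-circle formula is a simplification: in practice, CDNs typically steer visitors to an edge through DNS responses or anycast routing.

```python
import math

# Hypothetical CDN edge locations: name -> (latitude, longitude) in degrees
EDGES = {
    "edge-frankfurt": (50.11, 8.68),
    "edge-new-york": (40.71, -74.01),
    "edge-singapore": (1.35, 103.82),
}


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def nearest_edge(visitor):
    return min(EDGES, key=lambda name: distance_km(visitor, EDGES[name]))


def handle(visitor, path, edge_caches, forward_to_origin):
    """Serve from the nearest edge's cache; on a miss, forward toward the firewall/origin."""
    edge = nearest_edge(visitor)
    cache = edge_caches.setdefault(edge, {})
    if path in cache:
        return edge, "HIT", cache[path]
    body = forward_to_origin(path)  # continues through the firewall into the private network
    cache[path] = body              # keep a copy at the edge for the next visitor
    return edge, "MISS", body


if __name__ == "__main__":
    caches = {}
    paris = (48.86, 2.35)
    origin = lambda path: "page:" + path
    print(handle(paris, "/index.html", caches, origin))  # ('edge-frankfurt', 'MISS', 'page:/index.html')
    print(handle(paris, "/index.html", caches, origin))  # ('edge-frankfurt', 'HIT', 'page:/index.html')
```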
The load balancer then takes over and assigns the request to the web application server with the lightest traffic load, or to the one best suited to handle that particular request. The web application server processes the request, consulting the user directory, cache, file repositories and databases as needed, and sends the requested information back to the visitor.
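The web application server's user-directory check can be sketched as two separate steps. Authentication asks whether users are who they claim to be. Authorization asks whether they may access the requested resource. The `UserDirectory` class, the permission strings, the iteration count and the 401/403 responses (borrowed from HTTP status codes) are assumptions for illustration. A production setup would more likely query a dedicated directory service such as LDAP or Active Directory. Passwords are stored only as salted hashes from Python's standard `hashlib.pbkdf2_hmac` and compared with `hmac.compare_digest`.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # PBKDF2 work factor (an assumed value; tune for real deployments)


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


class UserDirectory:
    """Hypothetical user directory: credentials plus per-user permissions."""

    def __init__(self):
        self._users = {}  # user id -> (salt, password hash, set of permissions)

    def add_user(self, user_id: str, password: str, permissions: set[str]) -> None:
        salt = os.urandom(16)  # a fresh random salt per user
        self._users[user_id] = (salt, hash_password(password, salt), set(permissions))

    def authenticate(self, user_id: str, password: str) -> bool:
        record = self._users.get(user_id)
        if record is None:
            return False
        salt, stored, _ = record
        # constant-time comparison avoids leaking how many bytes matched
        return hmac.compare_digest(stored, hash_password(password, salt))

    def is_allowed(self, user_id: str, permission: str) -> bool:
        record = self._users.get(user_id)
        return record is not None and permission in record[2]


def handle_request(directory, user_id, password, permission, fetch):
    """Authenticate, then authorize, and only then fetch the requested data."""
    if not directory.authenticate(user_id, password):
        return 401, "unknown user or wrong password"
    if not directory.is_allowed(user_id, permission):
        return 403, "not allowed"
    return 200, fetch()  # fetch stands in for the cache/repository/database lookups


if __name__ == "__main__":
    users = UserDirectory()
    users.add_user("alice", "s3cret", {"read:reports"})
    print(handle_request(users, "alice", "s3cret", "read:reports", lambda: "Q3 report"))    # (200, 'Q3 report')
    print(handle_request(users, "alice", "s3cret", "delete:reports", lambda: "Q3 report"))  # (403, 'not allowed')
    print(handle_request(users, "alice", "wrong", "read:reports", lambda: "Q3 report"))     # (401, ...)
```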