Imagine, for a moment, that we had to address one another by our Social Security numbers. It would work for maybe the first ten people we got to know, but beyond that we'd likely never remember what to call anyone. Human beings, for many reasons, remember names better than numbers.
Switch to the Internet. You want to launch a telnet session, and you know the address of the telnet site at the National Center for Supercomputing Applications is ncsa.uiuc.edu. Then you decide to access the Microsoft Website, and you guess that the address is probably www.microsoft.com, so you try it. It works, and you commit both addresses to memory.
What you've done, however, is simply memorize the domain names of the computers you're connecting with. From the Internet's standpoint, you haven't actually identified anything. Computers on the Internet are identified by numbers, not by names, and the domain name is merely a human-friendly pseudonym for the computer's real ID. On the Internet, each computer is assigned an Internet Protocol (IP) address, and this numeric identifier differentiates one computer from another. You may prefer to know them by name, but the Internet prefers the vital statistics.
What's in a Name?
An IP address has four parts, each more machine-specific than the one before. For example, the IP address for Microsoft's Web server is 184.108.40.206, and the IP address for the ncsa.uiuc.edu machine is 220.127.116.11. Starting from the left, the first part of the number identifies the geographic region, and the second specifies the organization or provider. After that, it gets even more specific, with the third number denoting a group of computers and the fourth identifying the actual machine itself. For example, in the case of my own organization -- the University of Waterloo, Canada -- all computers share the first two portions of the IP address (129.97). The two machines in my office, 18.104.22.168 and 22.214.171.124, both belong to the 129.97.178 group; the final portions of the addresses are unique to each computer.
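To make the four-part structure concrete, here is a minimal Python sketch that splits an address into its parts and checks whether two machines share a prefix. Only the 129.97 and 129.97.178 prefixes come from the text; the final numbers and the function names are made up for illustration.

```python
import ipaddress

def split_address(dotted):
    """Return the four parts of a dotted IPv4 address as integers."""
    # ipaddress rejects malformed input (wrong part count, parts above 255).
    addr = ipaddress.IPv4Address(dotted)
    return [int(part) for part in str(addr).split(".")]

def same_group(a, b, parts=3):
    """True if two addresses share their first `parts` numbers."""
    return split_address(a)[:parts] == split_address(b)[:parts]

# Hypothetical machines: the last number in each address is invented.
first = "129.97.178.10"
second = "129.97.178.11"

print(split_address(first))                       # the four parts, left to right
print(same_group(first, second))                  # same 129.97.178 group?
print(same_group(first, "129.97.5.1", parts=2))   # same organization (129.97)?
```

Comparing the first three parts answers "same group of computers?", while comparing only the first two answers "same organization?".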
Why does this matter?
Whenever you specify a domain name in a telnet, finger, Gopher, FTP, or Web session, the session doesn't actually begin until the domain name is translated into its IP address. This translation is the task of a Domain Name System (DNS) server or, as is more often the case, a series of DNS servers, in which the first queries the next until the correct IP number is acquired.
Whenever you attempt to send or request a piece of data that contains a domain name, the DNS sets its magic in motion. You can avoid the DNS entirely by forgoing the domain name in favor of the IP address, but that's unrealistic most of the time, especially since addresses to which you're sending data or requests (through e-mail replies or hyperlink clicks, for example) are nearly always given to you in domain name format. As soon as you perform the action, you engage a piece of software, called a resolver, on the local machine that's been set up as a DNS server. The resolver, as its name suggests, tries to resolve the domain name, first by looking in its DNS database, then, if that doesn't work, by connecting to external DNS servers.
You can think of this process as similar to your attempts, as a new manager, to get information about a little-known company procedure. Your first step is to ask your secretary, who, if he or she doesn't know, would contact someone higher up the chain. That person would go a step higher, until eventually someone would be found who knows the answer. Each person along the way would ideally store the knowledge for future reference so that the next time you needed the information, it would be available from the first person in the chain.
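The chain of questions described above can be sketched in a few lines of Python. This is a toy model, not how any real resolver is written: the Resolver class, the three-link chain, and the address (taken from the 192.0.2.x range reserved for documentation) are all assumptions made for the example.

```python
class Resolver:
    """A toy resolver: answers from its own table or asks the next one up."""

    def __init__(self, name, known=None, upstream=None):
        self.name = name
        self.known = dict(known or {})   # local table of name -> address
        self.upstream = upstream         # next resolver in the chain, if any

    def resolve(self, domain):
        # First look in the local database.
        if domain in self.known:
            return self.known[domain]
        # Otherwise pass the question up the chain.
        if self.upstream is None:
            raise LookupError(f"{domain} not found")
        address = self.upstream.resolve(domain)
        # Store the answer so the next request is satisfied locally.
        self.known[domain] = address
        return address

# Hypothetical chain: local -> middle -> top. Only "top" knows the answer.
top = Resolver("top", {"www.example.com": "192.0.2.7"})
middle = Resolver("middle", upstream=top)
local = Resolver("local", upstream=middle)

print(local.resolve("www.example.com"))    # answered by "top" via the chain
print("www.example.com" in local.known)    # now stored at the first link
```

After the first lookup, every resolver along the way holds the answer, which is the "store the knowledge for future reference" step in the analogy.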
DNS servers around the world have to be made aware of changes as quickly as possible. Before DNS servers came along, domain name translation depended entirely on the host table, a text file stored as /etc/hosts on your organization's Unix server, or in a relevant directory on your PC. The host table listed, line by line, Internet host names and their associated IP addresses. The master host table is compiled and stored on the machines at the Network Information Center (NIC) -- nic.ddn.mil, in netinfo/hosts.txt, and one look at its half-a-megabyte size will tell you why you wouldn't want the responsibility of maintaining this thing. As the Internet grows, domain names are added hourly (at least), and it's impractical for every host on the Internet to keep acquiring this file for its users.
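A host table lookup is simple enough to sketch in Python. The sample below assumes the common /etc/hosts layout (an address followed by one or more names, with # starting a comment); the entries themselves are invented, and the master hosts.txt file at the NIC uses its own format that this sketch does not attempt to read.

```python
# A made-up host table in the usual /etc/hosts layout.
HOSTS_TEXT = """\
# address        name                    aliases
127.0.0.1        localhost
192.0.2.10       sales.mybiz.com         sales
192.0.2.11       techsupport.mybiz.com
"""

def parse_hosts(text):
    """Build a name -> address mapping from host table text."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace; skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        # Every name and alias on the line maps to the same address.
        for name in names:
            table[name.lower()] = address
    return table

table = parse_hosts(HOSTS_TEXT)
print(table["sales.mybiz.com"])
print(table["sales"])
```

Every machine needs its own complete copy of this mapping, which is exactly why the approach stopped scaling.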
The solution was the DNS server system. Unlike the host table, DNS servers don't rely on one large mapping file. Instead, DNS servers contain only a limited amount of information, because they know where to find details on domains they have yet to encounter. Whenever a DNS server gets a request for a host not contained in its cache, it simply does the sensible thing and asks someone who knows. That "someone" is an authoritative server, a server responsible for maintaining DNS information. A server is authoritative if, when asked about an address in its domain, it can state with certainty that the name exists.
If the contacted server doesn't contain information for that domain name, it passes the request to an authoritative server higher up the chain, forming a series of queries that continues until the information is found. In practice, this means that the request can be handled by any number of servers, and that this sort of back-and-forth activity happens all day, every day on the constantly changing Internet. The server that originally made the request will cache the information to satisfy future requests without the need to go to an authoritative server. This information is set by the DNS server administrator to time out after a specified period, to avoid the problem of fulfilling name requests with old data.
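The time-out behavior can be sketched as a small cache whose entries expire. This is a simplification under stated assumptions: the TTLCache class is hypothetical, it applies one lifetime to every entry, and the demonstration uses a fake clock so that no real waiting is needed.

```python
import time

class TTLCache:
    """Cache name -> address pairs that expire after a fixed lifetime."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock       # injectable so the demo can fake time
        self.entries = {}        # name -> (address, expiry time)

    def put(self, name, address):
        self.entries[name] = (address, self.clock() + self.ttl)

    def get(self, name):
        entry = self.entries.get(name)
        if entry is None:
            return None
        address, expires = entry
        if self.clock() >= expires:
            # Stale data: discard it so the caller asks an authoritative server.
            del self.entries[name]
            return None
        return address

# Fake clock for the demonstration; the address is invented.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("techsupport.mybiz.com", "192.0.2.11")

print(cache.get("techsupport.mybiz.com"))   # still fresh
now[0] = 61.0                               # pretend 61 seconds have passed
print(cache.get("techsupport.mybiz.com"))   # expired, so None
```

A None result is the cue to go back out and ask again, which keeps old data from lingering indefinitely.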
The DNS translation doesn't take long, but it does add to the time required for your request to reach the remote machine. You can perform a quick (though hardly foolproof) test of this yourself, by trying to access a website first using the domain name -- www.microsoft.com, for example -- then using the IP address -- 126.96.36.199. If you try this, however, be sure to close your browser and then reopen it to initiate a new session; otherwise, you'll simply load the cached version of the page. (And keep in mind that delays in loading can result from any of a number of factors, so take the results with a grain of salt.)
The most common software for DNS service is Berkeley Internet Name Domain, better known as BIND, originally from U.C. Berkeley but now sponsored by the Internet Software Consortium. The latest release, 4.9.3, contains the standard Unix version, plus a Windows NT port. BIND provides both resolver and nameserver software, with the resolver doing the actual queries and the nameserver providing the responses. BIND separates nameservers into three types: the primary server contains all the data about a domain; the secondary server, in effect, copies the DNS database from the primary server; and the caching-only server builds a DNS database exclusively by caching queries. Only primary and secondary servers are considered authoritative for their particular domains.
To understand how DNS servers operate, it's necessary to understand the domain name hierarchy itself. At the top of the hierarchy is the root domain. Information on this domain resides on a selected number of root servers around the Internet. Below the root domain come the top-level domains, which are either country codes or organization codes. Examples of country codes are SG (Singapore) and CA (Canada), while organization codes include the well-known COM (commercial organizations), EDU (educational institutions), GOV (governmental organizations), and NET (network organizations), among others. (Note that top-level domains outside the U.S. are normally country codes, but that U.S.-based sites usually omit country codes.) Beneath the top-level domains are the second-level domains (whitehouse.gov, microsoft.com, inforamp.net), and then the third-level domains, and so on down the chain.
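Because the hierarchy reads from right to left, a domain name can be unpacked into its levels with a few lines of Python. The function name hierarchy is an assumption for this sketch, and techsupport.mybiz.com is simply a sample name.

```python
def hierarchy(domain):
    """List the domains from the top level down to the full name."""
    # A trailing dot, if present, stands for the root and is dropped here.
    labels = domain.rstrip(".").lower().split(".")
    # Build ever-longer suffixes, starting from the rightmost label.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

for level, name in enumerate(hierarchy("techsupport.mybiz.com"), start=1):
    print(level, name)
```

For this sample name, level 1 is com, level 2 is mybiz.com, and level 3 is the full name.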
If you want to establish a domain name in the U.S., you must contact the Network Information Center (NIC). Before it grants your request, it will ensure first that the name you want isn't already in use, and second, that at least two servers currently in existence will serve the new domain name. When the NIC finally fills the request, it will grant you a second-level domain, and it will place pointers to that name in the servers for the top-level domain. For example, if you request the domain name mybiz.com, you must first get two nameservers somewhere on the Internet to serve the information (your ISP's servers can handle this), and then the NIC will place mybiz in the COM domain server system, with pointers to those two specified servers.
Once you have the domain in place, you can add any number of subdomains you wish. You might want to name one of your machines sales.mybiz.com and another techsupport.mybiz.com and so on. You don't need NIC approval for these, and, in fact, the NIC doesn't care. But if you want anyone to actually access your subdomains, you have to place information about them in the domain immediately above them. In this particular case, IP information about sales.mybiz.com and techsupport.mybiz.com must be placed in the servers for mybiz.com. Each server in the hierarchy contains a DNS database with entries called NS (nameserver) records, and each of these records contains the name of the domain or subdomain, plus the name of the host that acts as a server for that domain or subdomain. In our example, we'll tell the root server that it can find information about mybiz.com and all its subdomains on our DNS server, located on the machine details.mybiz.com.
Let's see how this all works. Someone at a university on the other side of the country sees a link on a Web page that points to your brand-new subdomain, techsupport.mybiz.com. She clicks on it, and her local DNS server (most likely located on a machine at the university) kicks into gear. First, the server searches its own DNS database for the translation information, but because it's never encountered techsupport.mybiz.com before, the server has no record of that domain's existence and can't resolve the IP number. What is contained in its DNS database, however, is the address of a root server (all DNS servers must be set up with such a reference). The local DNS server goes out onto the Internet and queries that root server. The root server looks in its DNS database for the COM top-level domain, and it replies with the NS record that tells the university's DNS server to query details.mybiz.com for information about mybiz.com. The university's server does so, and learns from details.mybiz.com the correct IP address for techsupport.mybiz.com. At all stages in this process, the university's DNS server is caching the NS records, so that the next time anyone from the university needs an IP translation for mybiz.com, details.mybiz.com, or techsupport.mybiz.com, the information will be available locally.
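The walk-through above can be simulated in Python. Treat this as a sketch under several assumptions: servers are plain dictionaries rather than machines on a network, the IP addresses are invented (192.0.2.x is reserved for documentation), the root refers directly to details.mybiz.com just as in the simplified account above, and the names SERVERS, find_referral, and resolve exist only for this example.

```python
# Each server holds NS records (domain -> nameserver host) and
# address records (host name -> IP address). All addresses are hypothetical.
SERVERS = {
    "root": {
        "ns": {"mybiz.com": "details.mybiz.com"},
        "a": {},
    },
    "details.mybiz.com": {
        "ns": {},
        "a": {
            "sales.mybiz.com": "192.0.2.10",
            "techsupport.mybiz.com": "192.0.2.11",
        },
    },
}

def find_referral(ns_records, name):
    """Return (domain, nameserver) for the longest matching suffix of name."""
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in ns_records:
            return suffix, ns_records[suffix]
    return None

def resolve(name, cache):
    """Resolve name as the local server would; return (address, servers asked)."""
    # Step 1: look in the local database.
    if name in cache["a"]:
        return cache["a"][name], []
    # Step 2: start at a cached nameserver if one covers the name, else the root.
    cached = find_referral(cache["ns"], name)
    server = cached[1] if cached else "root"
    asked = []
    while True:
        asked.append(server)
        records = SERVERS[server]
        if name in records["a"]:
            cache["a"][name] = records["a"][name]
            return cache["a"][name], asked
        referral = find_referral(records["ns"], name)
        if referral is None or referral[1] in asked:
            raise LookupError(f"cannot resolve {name}")
        # Step 3: cache the NS record and follow the referral.
        domain, server = referral
        cache["ns"][domain] = server

university_cache = {"ns": {}, "a": {}}
print(resolve("techsupport.mybiz.com", university_cache))  # asks root, then details
print(resolve("sales.mybiz.com", university_cache))        # skips the root
print(resolve("techsupport.mybiz.com", university_cache))  # answered locally
```

The second lookup never touches the root because the NS record for mybiz.com is already cached, and the third asks no one at all.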
As with other Internet protocols, the DNS is outlined in several Internet Request For Comments (RFC) documents (initially in RFC 882, 883, and 973). To understand the working of a DNS server, however, RFC 1035 is your best bet. Although you can find RFC 1035 in several places on the Internet, a nice HTML version is available at http://www.crynwr.com/rfc1035/. As you might expect, the RFC is quite technical, and you may not be interested in gaining more than a general sense of how a DNS server operates. But keep the RFC in mind, in case, somewhere along the line, you decide to become a server administrator.