Page Last Updated: 13 January 1998

A Glossary of Internet and Web Terminology
by Ian Graham

Copyright © 1998 Ian S. Graham

agent
A program that can travel over the Internet and access remote resources, on behalf of a user. A proper agent should be able to run on remote machines and travel freely from machine to machine. The Java language shows promise for permitting safe agents, since the Java interpreter does not let Java applets harm the computers they contact.

anchor
The location of a hypertext link in a document. An anchor can be either the start of a hypertext link or the destination of a hypertext link.

anonymous FTP
Computers can run an anonymous FTP server, which lets anyone log in to the computer under the username anonymous, and access public resources. In general, when you log in as user anonymous, you (or your browser) use your e-mail address for the password string.

applet
A program or mini-application that can be downloaded over a network and activated on the user's computer. To do this safely, you must have a secure way of running applets. The Java language is designed to support safe applets.

archie
A system that automatically generates and maintains a database for anonymous FTP servers' contents. An archie server accesses information from FTP servers and archives the directory listings. An archie client can access these databases and search for programs or files matching a particular name.

archive file
A single file that contains a collection of different files and/or directories. Archive files are often used to transport collections of files across the Internet, since you can transport a large collection in a single archive file. UNIX archives have the extension .tar (for Tape ARchive). PKZIP is often used to create archives on DOS computers (suffix .zip), while Stuffit is often used to create Macintosh archives (suffixes .sea or .sit). PKZIP and Stuffit archives are also compressed.
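As a rough illustration of packing several files into one archive, here is a minimal Python sketch using the standard tarfile module. The file names and contents are made up for the example, and the archive is built in memory rather than on disk.

```python
import io
import tarfile

def make_archive(files):
    """Pack a {name: bytes} mapping into a single in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # tar headers record each member's size
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()

def list_archive(archive_bytes):
    """Return the member names stored in a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as archive:
        return archive.getnames()

# Hypothetical files, used only for illustration.
archive_bytes = make_archive({
    "index.html": b"<TITLE>Home</TITLE>",
    "images/logo.gif": b"GIF89a",
})
print(list_archive(archive_bytes))
```

A plain .tar archive is not compressed; tools such as PKZIP combine the archiving and compression steps.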

ASCII
American Standard Code for Information Interchange. This is a 7-bit character code capable of representing 128 characters. Several of these characters are special control characters used in communications control, and are not printable.
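The 7-bit range can be enumerated directly. The short Python sketch below separates the control codes from the printable characters; the variable names are chosen only for this example.

```python
# ASCII codes run from 0 to 127, which fits in 7 bits.
ALL_CODES = range(128)

# Codes 0-31 and 127 (DEL) are control characters.
control_codes = [code for code in ALL_CODES if code < 32 or code == 127]

# Codes 32-126 are printable (32 is the space character).
printable_chars = [chr(code) for code in ALL_CODES if 32 <= code < 127]

print(len(control_codes), len(printable_chars))
print(ord("A"), "A".encode("ascii"))
```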

attribute
A quantity that defines a special property of an HTML element. Attributes are specified within an element start tag. For example, <IMG SRC="image.gif"> means that the element IMG has an attribute SRC, which is assigned the indicated value.
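To see how a program might pull attributes out of a start tag, here is a small sketch built on Python's standard html.parser module. The AttributeCollector class is a hypothetical helper written for this example; note that this parser reports tag and attribute names in lowercase.

```python
from html.parser import HTMLParser

class AttributeCollector(HTMLParser):
    """Record each start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.found.append((tag, dict(attrs)))

collector = AttributeCollector()
collector.feed('<IMG SRC="image.gif">')
print(collector.found)
```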

browser
Any program used to view material prepared for the World Wide Web. Mosaic, Netscape, and lynx are some examples. Browsers are able to interpret URLs and HTML markup and also understand Internet protocols such as HTTP, FTP, and Gopher.

CERN
Conseil Européen pour la Recherche Nucléaire; a large particle-accelerator laboratory located near Geneva, on the French-Swiss border. The World Wide Web originated here, largely due to the efforts of Tim Berners-Lee.

CGI
Common Gateway Interface, the specification for how an HTTP server should communicate with server gateway programs. Most servers support the CGI specification.

character reference
A way, within an SGML language such as HTML, of referencing a character using a simple string of numbers reflecting the position of the character in the current character set. For example, the character reference &#233; is the reference for e with an acute accent (é) within the ISO Latin-1 character set. Note, however, that a character reference will produce a different character if a different character set is involved. For a character-set independent representation, see entity reference.

CJK
Chinese/Japanese/Korean, often used in the discussion of character sets and of the issues important to these language/character set groups.

client
Any program used to extract information from a server. For example, a browser such as Mosaic is a client that can access data from HTTP (and other) servers. All browsers are Web clients.

compression
Many files on the Internet are compressed, which reduces the space taken up by a file and makes transmission over the Internet faster. The client must then have software able to decompress the file.
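A compress/decompress round trip can be sketched with Python's standard zlib module. The sample data is deliberately repetitive so that it compresses well; real files will shrink by varying amounts.

```python
import zlib

# Repetitive sample data, chosen only for illustration.
original = b"World Wide Web " * 200

compressed = zlib.compress(original)    # what the sender transmits
restored = zlib.decompress(compressed)  # what the client must do on receipt

print(len(original), len(compressed))
print(restored == original)
```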

cookie
A small quantity of data exchanged between (and then stored on) a client and a server, and usually hidden from the user. An example is the Netscape cookie mechanism, discussed in Chapters 7 and 8.

CRLF
The combination of carriage-return (CR) and linefeed (LF) characters. This combination is used by several Internet protocols, including HTTP, to denote the end of a line.
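The following Python sketch shows CRLF used as a line terminator in a simple HTTP request. The request line and header are illustrative values, not taken from any particular server.

```python
CRLF = "\r\n"  # carriage-return (code 13) followed by linefeed (code 10)

# Each line ends with CRLF; an empty line marks the end of the header block.
request = CRLF.join([
    "GET /index.html HTTP/1.0",
    "Accept: text/html",
    "",
    "",
])

lines = request.split(CRLF)
print(repr(request))
print(lines)
```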

CSO
Computing Services Office, a system that lets users search for student and/or faculty names at a school or university. It is one approach to creating "white pages" for Internet e-mail addresses.

CWIS
Campus Wide Information Systems, electronic systems for distributing campus information, which first became common with university-based Gopher servers.

dial-up connection
The action of using a telephone and modem to connect to a remote computer. Dial-up connections are slow compared with direct connections, or ISDN.

domain name
A symbolic name for a computer that can be translated by a nameserver into the computer's formal numeric Internet address (IP address). Domain names let users reference Internet sites without having to know the numerical address.

download
Transfer of a file from a remote computer to a local computer.

DTD
Document Type Definition. An SGML document type definition is a specific description of a markup language. This description is written as a plain text file, often with the filename extension .dtd. The HyperText Markup Language (HTML) has its own Document Type Definition file, often called html.dtd.

e-mail
Electronic mail.

element (HTML)
The basic unit of an HTML document. HTML documents use start and stop tags to define structural elements in the document. These elements are arranged hierarchically, to define the overall document structure. The name of the element is given by the tag, and indicates the meaning associated with the block. Some elements are empty, since they don't affect a block of text. Elements that have content are also often called containers.

end tag
A markup tag that denotes the end of an element.

entity reference
A way, within an SGML language such as HTML, of referencing a character using a simple string of ASCII characters. For example, the entity reference &eacute; is the reference for e with an acute accent (é). See also character reference.
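The difference between the two kinds of reference can be sketched in Python with the standard html module. The helper functions below are hypothetical names written for this example. The last two lines illustrate the character-set caveat by decoding the same byte value under ISO Latin-1 and under the ISO 8859-7 (Greek) set.

```python
import html
from html.entities import name2codepoint

def character_reference(char):
    """Numeric reference: the character's position in the character set."""
    return f"&#{ord(char)};"

def entity_reference(char):
    """Named reference, if the HTML entity table defines one."""
    for name, codepoint in name2codepoint.items():
        if codepoint == ord(char):
            return f"&{name};"
    return None

print(character_reference("é"), entity_reference("é"))
print(html.unescape("&#233;"), html.unescape("&eacute;"))

# Position 233 means different characters in different 8-bit sets.
latin1_char = bytes([233]).decode("latin-1")
greek_char = bytes([233]).decode("iso8859-7")
```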

FAQ
Frequently Asked Questions. On the Internet, a FAQ is a document that answers the most frequently asked questions on a particular topic. Most newsgroups have FAQs that are regularly posted to the newsgroup.

firewall
A firewall is used to separate a local network from the outside world. In general a local network is connected to the outside world by a "gateway" computer. This gateway machine can be converted into a firewall by installing special software that does not let unauthorized TCP/IP packets pass from inside to outside and vice versa. You can give users on the local network, "inside" the firewall, access to the outside world using the SOCKS package or by installing a proxy server on the firewall machine.

fragment identifier
A text string included using a NAME attribute in an A element, that labels the anchored location in a document--thus the word fragment, since it references a document fragment.

FTP
File Transfer Protocol, an Internet client-server protocol for transferring files between computers.

GIF
Graphics Interchange Format, a format for storing image files. It is the most common format for inline images in HTML documents. The other common format is JPEG.

Gopher
A protocol for information delivery used in distributed information systems. Gopher clients give you access to this information. Gopher is a menu-based delivery system and does not have hypertext capabilities. Gopher has been largely supplanted by HTTP.

header
The leading part of a data message. HTTP messages are sent with an HTTP header preceding the actual communicated data.
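As an illustration, the Python sketch below separates the header of an HTTP response from the data that follows it. The response text is a made-up example, and the parsing is deliberately simplified (for instance, it ignores continuation lines).

```python
# Hypothetical response, used only for illustration.
raw_response = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
    "<P>Hello</P>"
)

# The header ends at the first empty line; everything after it is the data.
header_block, _, body = raw_response.partition("\r\n\r\n")
status_line, *field_lines = header_block.split("\r\n")

headers = {}
for line in field_lines:
    name, _, value = line.partition(":")
    headers[name.strip().lower()] = value.strip()

print(status_line)
print(headers)
print(body)
```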

helper application
A program launched or used by a browser (such as Netscape Navigator) to process files that the browser cannot handle internally. Most users have helper applications to play sound or movie files, to uncompress compressed files, or to unstuff archives.

hits
In database searches, the number of documents that resulted from the search; for servers, the number of document requests received by a server.

home page
The introductory page for a World Wide Web site. A home page usually provides an introduction to the site, along with hypertext links to local resources.

HTML
HyperText Markup Language, a markup language defined by an SGML Document Type Definition (DTD). To a document writer, HTML is simply a collection of tags used to mark blocks of text and assign them special meanings.

hyperlink
See hypertext link.

hypertext document
Any document that contains hypertext links to other documents. HTML documents are almost always hypertext documents.

hypertext link
A hypertext relationship between two anchors, leading from the head anchor to the tail anchor. On the Web, this is usually a link from one hypertext document to another. Linking points are associated with anchors.

inline image
An image that is merged with the displayed text. Placing images in this manner is often described as "inlining" the images.

I18N
See internationalization.

IANA
Internet Assigned Numbers Authority, the agency that registers names for common use on the Internet. General information is found at: ftp://ftp.isi.edu/in-notes/iana/

IETF
Internet Engineering Task Force, a collection of task forces at work on developing standards for Internet protocols and architectures. There are IETF groups working on such issues as URLs, HTTP, and HTML.

Internet
You mean you don't know? The Internet is the worldwide network of computers communicating via the TCP/IP protocols.

internationalization
Software development aimed at providing software that can serve a multilingual, international audience. Often abbreviated as I18N (the letter I, then 18 letters, then the letter N).

Internet provider
A company from whom users purchase Internet connectivity. This could either be a dedicated connection (for example, a telephone connection that stays open twenty-four hours a day) or a dial-up connection. Usually users run software such as PPP or SLIP to allow Internet connectivity across the line.

Internet resources
The collection of data, documents, and databases available on the Internet.

Intranet
An Intranet is a collection of services that use Internet technologies as the underlying communications technology, designed to support business operations and applications. Basically just another buzzword, like enterprise computing and mission-critical applications.

IP Address
The numerical Internet protocol address of a computer on the Internet. Every computer on the Internet has a unique numerical address.

ISO
International Organization for Standardization (often called the International Standards Organization), an international organization responsible for setting international standards, such as the ISO Latin-1 character set.

ISO 10646
A multi-byte character set proposed by the ISO as a universal character set for the characters and symbols used by all the world's languages. The most important 2-byte subset of this character set, known as the basic multilingual plane, is equivalent to the Unicode character set.

ISO Latin-1
An 8-bit character code developed by the International Organization for Standardization (ISO). An 8-bit code contains 256 different characters. In the ISO Latin-1 code, the first 128 characters are equivalent to the 128 characters of the US-ASCII character set (also called the ISO 646 character set). The remaining 128 characters consist of control characters and a large collection of accented and other characters commonly used in European languages.
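The relationship between the two character sets can be checked with a short Python sketch. Python's "latin-1" codec is used here as a stand-in for ISO Latin-1; the variable names are chosen only for this example.

```python
# An 8-bit code has 256 positions; Latin-1 assigns a character to each one.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("latin-1")

# The first 128 positions decode to the same characters as US-ASCII.
ascii_text = bytes(range(128)).decode("ascii")
lower_half_matches = latin1_text[:128] == ascii_text

# Accented characters live in the upper half: é sits at position 233 (0xE9).
e_acute_byte = "é".encode("latin-1")

print(len(latin1_text), lower_half_matches, e_acute_byte)
```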

Java
A programming language, developed by Sun Microsystems, designed specifically for use in applet and agent applications. Java programs can only run under a Java interpreter, which is designed to eliminate the risk of a rogue Java applet damaging the local computer.

Javascript
A scripting language developed by Netscape Inc. Javascript program listings can be included within an HTML document, and are then executed by the Web browser when the document is loaded. A similar scripting language, known as VBScript, has been developed by Microsoft.

JPEG
Joint Photographic Experts Group, an image format. In general JPEG allows for higher quality images than GIF. Older browsers cannot display JPEG images inline, and instead must display them using helper programs.

Kerberos
A network authentication system, based on the key distribution model. It allows machines communicating over networks to prove their identity to each other through a trusted third party. It also prevents eavesdropping or replay attacks (recording and retrying encryption information "snooped" off the network), through support for a variety of data encryption schemes.

LAN
Local Area Network.

link
See hypertext link.

Linux
A freeware clone of UNIX for 386-based PC computers. Linux consists of the Linux kernel (core operating system), originally written by Linus Torvalds, along with utility programs developed by the Free Software Foundation and by others. Since PC hardware is inexpensive and Linux is essentially free, the combination of the two is a practical way of developing inexpensive and reliable HTTP service.

listserv
An automated electronic mailing list, managed by a listserv program. Listservs are commonly used by discussion groups.

lynx
A popular character-mode (text-only) World Wide Web browser.

MIME
Multipurpose Internet Mail Extensions, a scheme that lets electronic mail messages contain mixed media (sound, video, image, and text). The World Wide Web uses MIME content-types to specify the type of data contained in a file or being sent from an HTTP server to a client.
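Servers commonly choose a MIME content-type from a file's name. The sketch below uses Python's standard mimetypes module to show the idea; the file names are hypothetical, and the exact table consulted can vary slightly between systems.

```python
import mimetypes

# Hypothetical file names, used only for illustration.
for filename in ["index.html", "logo.gif", "photo.jpeg"]:
    content_type, encoding = mimetypes.guess_type(filename)
    print(filename, "->", content_type)

html_type = mimetypes.guess_type("index.html")[0]
gif_type = mimetypes.guess_type("logo.gif")[0]
```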

Mosaic
A graphical browser for the World Wide Web, developed at NCSA. There are several commercial browsers based on Mosaic.

MPEG
Moving Picture Experts Group, a common video file compression method.

multimedia
A mixture of media (text, audio, and video) under the control of a computer. The World Wide Web is a form of multimedia.

name token
In SGML, this is a character string composed of the ASCII letters a-z or A-Z, the numerals 0-9, a dash (-), or a period (.), and that must begin with a letter. Name tokens are usually case-insensitive. Many attribute values are defined as name tokens.
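The rule as stated in this entry translates directly into a regular expression. The Python sketch below follows this glossary's description only and does not attempt to capture every detail of the SGML standard; is_name_token is a hypothetical helper name.

```python
import re

# A letter first, then any mix of letters, digits, dashes, and periods.
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")

def is_name_token(text):
    """Return True if text matches the name-token rule described above."""
    return NAME_TOKEN.fullmatch(text) is not None

for candidate in ["left", "text-html", "v1.2", "2col", "two words"]:
    print(candidate, is_name_token(candidate))
```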

nameserver
A computer (and a program on the computer) that translates domain names into the proper numeric IP address (or vice versa).

NCSA
National Center for Supercomputing Applications. The NCSA is situated at the Urbana-Champaign campus of the University of Illinois. The NCSA software development team developed the Mosaic and NCSA HTTPD server programs.

NNTP
Network News Transfer Protocol, used for communicating USENET articles across the Internet.

packet
A small package of data. TCP/IP breaks messages up into packets, and sends each packet independently to the message destination. The protocol ensures that there is no error in transmission and that the entire message arrives.

Partial URL
A location scheme containing only partial information about the resource location. To access the resource, the client must construct a full URL, based on the partial URL. It does so by assuming that all the information not found in the partial URL is the same as that used when the client accessed the document containing the partial URL reference. A partial URL is often called a relative URL, since the location of the linked resource is determined relative to the location of the document containing the partial URL.
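The resolution step can be sketched with urljoin from Python's standard urllib.parse module. The base and partial URLs below are hypothetical examples.

```python
from urllib.parse import urljoin

# URL of the document that contains the partial URL references (hypothetical).
base = "http://www.example.com/docs/guide/index.html"

# Missing pieces (protocol, domain name, directory) are taken from the base.
same_directory = urljoin(base, "images/logo.gif")
parent_directory = urljoin(base, "../faq.html")
server_root = urljoin(base, "/top.html")

print(same_directory)
print(parent_directory)
print(server_root)
```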

PEM
Privacy Enhanced Mail, a special mail protocol that provides encryption of mail message content.

perl
Practical Extraction and Report Language, a scripting language created by Larry Wall. Because powerful data and text manipulation programs can be written quickly and easily using perl, it has become a popular language for writing CGI applications.

PGP
Pretty Good Privacy, a publicly available encryption scheme that uses the "public key" approach: messages are encrypted using a "public" key, but can only be decrypted by a "private" key, retained by the intended recipient of the message.

plugin
A program module that adds inline functionality to a Web browser (or, in general, any other program). On the Web, plugins let Web browsers display data such as VRML scenes, real-time video, or multimedia data inline with the HTML document. Plugins, when available, are accessed through HTML EMBED or OBJECT elements.

port number
Any Internet application communicates at a particular port number specific to the application. For example FTP, HTTP, Gopher and telnet are all assigned unique port numbers so that the computer knows what to do when contacted at a particular port. There are accepted standard numbers for these ports so that computers know which port to connect to for a particular service. For example, Gopher servers generally "talk" at port 70, while HTTP servers generally "talk" at port 80. These default values can be overridden in a URL.
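A client's choice of port can be sketched as follows in Python. The DEFAULT_PORTS table and the effective_port helper are written for this example; the FTP and telnet values (21 and 23) are the standard assignments, added here alongside the Gopher and HTTP values mentioned above.

```python
from urllib.parse import urlsplit

# Standard default ports for a few services.
DEFAULT_PORTS = {"ftp": 21, "telnet": 23, "gopher": 70, "http": 80}

def effective_port(url):
    """Return the port named in the URL, or the protocol's default port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit port overrides the default
    return DEFAULT_PORTS[parts.scheme]

print(effective_port("http://www.example.com/"))
print(effective_port("http://www.example.com:8080/"))
print(effective_port("gopher://gopher.example.com/"))
```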

PPP
Point to Point Protocol, a communications protocol that turns a dial-up telephone connection into a point-to-point Internet connection. This is commonly used to run WWW browsers over a phone line.

provider
See Internet provider.

protocol
In computer networks, a protocol is simply an agreed convention for inter-computer communication. Thus the TCP/IP protocol defines how messages are passed on the Internet, while the FTP protocol, which is built using the TCP/IP protocol, defines how FTP messages should be sent and received.

proxy server
A server that acts as an intermediary between a user's computer and the computer they want to access. If a user makes a request for a resource from computer "A," this request is directed to the proxy server, which makes the request, gets the response from computer "A," and then forwards the response to the client. Proxy servers are useful for accessing World Wide Web resources from inside a firewall.

RFC
Request For Comments, a document, written by groups or individuals involved in Internet development, that describes agreed-upon standards or proposes new standards for Internet protocols. For example, the rules for electronic mail message composition are specified in the document RFC 822.

robot
On the World Wide Web, a program that autonomously searches through trees of hypertext documents, retrieving files for indexing (or other purposes). Also called a worm.

router
A computer that determines, on a local basis, which route packets will take en route to their destination.

RSA
A common, commercial public-key encryption technology, owned by RSA Data Security Inc. RSA Inc. also holds several patents on public-key encryption in general, so that popular publicly available encryption tools, such as PGP and PEM, infringe on RSA patents. PGP and PEM therefore cannot be used in commercial products without licensing approval of RSA Inc.

server
A program, running on a networked computer, that responds to requests from client programs running on other networked computers. The server and client communicate using a client-server protocol.

SGML
Standard Generalized Markup Language, a standard for describing markup languages. HTML is defined as an instance of SGML.

shell
The UNIX shell is the program that interprets the commands typed at the terminal. A shell can also be used to run simple script programs called shell scripts. There are several different shells, with slightly different commands and syntax. The most common are the Bourne shell (sh), the C shell (csh), and the Korn shell (ksh). The DOS command-line interpreter can be thought of as a shell.

SLIP
Serial Line Internet Protocol, a communications protocol that can turn a dial-up telephone connection into an Internet connection. SLIP can be used to run Web browsers over a phone line, but is less stable than a PPP connection.

SMTP
Simple Mail Transfer Protocol, the standard by which electronic mail messages are communicated over the Internet.

SOCKS
A software package that allows hosts inside a firewall to communicate with the outside world. To allow access to the outside world a secure network can run a SOCKS server on its gateway/firewall machine; all networking software inside the network must be configured to talk to the SOCKS server. SOCKS is a proxy server without the special caching capabilities of a caching HTTP proxy server.

SSL
Secure Sockets Layer, a technology developed by Netscape Communications Inc. for encrypting data sent between clients and servers. SSL is the basis for Netscape's secure communication technologies.

start tag (HTML)
A markup tag that denotes the start of an element.

tag (HTML)
HTML marks documents using tags. A tag is simply typed text surrounded by the less than and greater than signs, for example: <TITLE>. An end tag has a slash in front of the tag name; for example </TITLE>.

tar
Tape archiver, a program (and file format) commonly used on UNIX systems for archiving and transporting large collections of files and/or directories.

TCP/IP
Transmission Control Protocol/Internet Protocol, the basic communication protocol that is the foundation of the Internet. All the other protocols, such as HTTP, FTP, and Gopher, are built on top of TCP/IP.

telnet
A terminal emulation protocol that allows you to make a terminal connection to other computers on the Internet. This requires that you run a telnet client on your computer and connect to a telnet server on the other machine.

TIA
The Internet Adapter, a program run under a dial-in UNIX account that supports a SLIP-like connection between a dial-up computer and the dial-in site. TIA is useful if you have a UNIX account with a company that does not provide PPP or SLIP service.

TIFF
Tagged Image File Format, a graphic file format developed by Aldus Corporation. TIFF is the standard format of many graphics and desktop publishing programs.

tn3270
A variant of telnet that emulates the behavior of IBM model 3270 display terminals.

Unicode
A 2-byte character set, developed as a universal character set for international use. The current version of Unicode (version 2) is equivalent to the basic multilingual plane subset of the ISO 10646 character set. Internationalized HTML uses Unicode as its base character set.

UNIX
An operating system, commonly used on the backbone machines on the Internet. Most Web servers are run under the UNIX operating system.

URC
Uniform Resource Characteristics, an as-yet unspecified format for representing aggregate information about a resource or collection of resources.

URI
Uniform Resource Identifier, the generic term for a coded string that identifies a (typically Internet) resource. There are currently two practical examples of URIs, namely Uniform Resource Locators (URLs) and partial URLs.

URL
Uniform Resource Locator, the scheme used to address Internet resources on the World Wide Web. A URL specifies the protocol, domain name/IP address, port number, path, and resource details needed to access a resource from a particular machine. Partial URLs are an associated scheme that specifies a location relative to the location of a document or resource containing the URL reference.
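The parts of a URL listed above can be picked out with urlsplit from Python's standard urllib.parse module. The URL below is a hypothetical example chosen to include every component.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
parts = urlsplit("http://www.example.com:8080/docs/index.html?topic=urls#top")

print("protocol:   ", parts.scheme)
print("domain name:", parts.hostname)
print("port number:", parts.port)
print("path:       ", parts.path)
print("query:      ", parts.query)
print("fragment:   ", parts.fragment)
```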

URN
Uniform Resource Names are as yet undefined, but are the holy grail of addressing, as any file would retain the same URN regardless of which computer the file resided on. URNs would be universal identifiers for Internet resources, regardless of the resource origins.

USENET
The Internet's worldwide bulletin board system, consisting of over 6,000 topical discussion groups, called newsgroups. The newsgroups related to the World Wide Web were mentioned at the end of Chapter 1. USENET postings are distributed around the world using the NNTP protocol.

VBScript
A document scripting language developed by Microsoft, also known as Visual Basic Script. See Javascript for a description of document scripting languages.

viewer
A program launched by a browser to view files that the browser cannot handle internally and that are accessed by standard hypertext anchors. Thus you have viewers for JPEG images, sound files, and MPEG movies. Viewers are also often called helpers or helper applications. Viewers are distinct from plugins, since they work separately from the browser.

visit
When you access a World Wide Web document you are said to be visiting the site.

W3C
World Wide Web Consortium, an academic and industrial consortium devoted to the development of Web standards and technologies.

whitespace
Any combination of space or tab characters that separates two characters or two character strings.

WAIS
Wide Area Information Servers, a system and protocol for Internet accessible databases. The WAIS protocol is based on the Z39.50 protocol.

WAN
Wide Area Network.

worm
A computer program that can make copies of itself. Alternatively, on the Web, a worm is synonymous with a robot.

WWW
The World Wide Web. Also called the Web or W3.

Z39.50
A protocol for communicating search information and search results, allowing remote searching of databases. Many library systems support the Z39.50 protocol.

The HTML 4.0 Sourcebook © 1995-1998 by Ian S. Graham