What is Digitization? A glossary

Digitization - The process of creating a digital image, and often of then presenting that image on a computer, a local area network, or the Internet.

Digitize - To translate into a digital form. For example, optical scanners digitize images by translating them into bitmaps. It is also possible to digitize sound, video, and any type of movement. In all these cases, digitization is performed by sampling at discrete intervals. To digitize sound, for example, a device measures a sound wave's amplitude many times per second. These numeric values can then be recorded digitally. - from Webopedia http://www.webopedia.com/TERM/d/digitize.html
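The sampling idea above can be sketched in a few lines of Python. This is only an illustration, not how any particular device works: the function name sample_wave, the 440 Hz tone, the 8000 samples-per-second rate, and the 256 quantization levels are all assumptions chosen for the example.

```python
import math

def sample_wave(frequency_hz, sample_rate_hz, duration_s, levels=256):
    """Measure a sine wave's amplitude at discrete intervals.

    Each measurement is stored as an integer from 0 to levels - 1,
    which is what "recorded digitally" amounts to in this sketch.
    """
    count = round(sample_rate_hz * duration_s)  # number of measurements
    samples = []
    for i in range(count):
        t = i / sample_rate_hz                  # time of this measurement
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # -1.0 to 1.0
        # Map the continuous amplitude onto one of the available integer levels.
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples

# One millisecond of a 440 Hz tone, measured 8000 times per second.
samples = sample_wave(440, 8000, 0.001)
print(len(samples), samples)
```

Raising the sample rate or the number of levels records the wave more faithfully, at the cost of more numbers to store.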

Digitization - The process of translating data into digital form (binary coded files for use in computers). Scanning images, sampling sound, converting text on paper into text in computer files, all are examples of digitization. - from High-Tech Dictionary http://www.computeruser.com/resources/dictionary/definition.html?lookup=6430

Digitization - Professor Mike Gerhard's Definition of Digitization: http://www.tcom.bsu.edu/tcom101/trends.htm - "Digitization or computerization, refers to the shift to a society where computers are ubiquitous; to carry out, control, or conduct by means of a computer. Digital refers to communication signals or information presented in a discrete form--usually in a binary or two-state way--0 or 1." - from Digitization trends http://www.bsu.edu/web/jladams2/trends.html


The following are from the Colorado Digitization Project http://coloradodigital.coalliance.org/glossary.html

Archival Image - An image meant to have lasting utility. Archival images are usually kept off-line on a cheaper storage medium such as CD-ROM or magnetic tape, in a secure environment. Archival images are of a higher resolution and quality than the digital image delivered to the user on-screen. The file format most often associated with archival images is TIFF, or Tagged Image File Format, as compared to on-screen viewing file formats, which are usually JPEGs and GIFs.

Digital Image - An electronic photograph scanned from an original document... a representation of whatever is being scanned, whether it be manuscripts, text, photographs, maps, drawings, blueprints, halftones, musical scores, 3-D objects, etc.

GIF - Graphics Interchange Format. A widely supported image storage format introduced by CompuServe and commonly used on the web.

HTML - Hypertext Markup Language. An encoding format for linking and identifying electronic documents and used to deliver information on the World Wide Web.

JPEG - Joint Photographic Experts Group. A compression algorithm for condensing the size of image files. JPEGs are helpful in allowing access to full screen image files on-line because they require less storage and are therefore quicker to download into a web page.

Pixel - Often referred to as a dot, as in "dots per inch". "Pixel" is short for picture element; pixels make up an image, similar to grains in a photograph or dots in a half-tone. Each pixel can represent a number of different shades or colors, depending on how much storage space is allocated for it. Pixels per inch (ppi) is sometimes the preferred term, as it more accurately describes the digital image.
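The link between storage space and the number of shades can be worked out directly: each additional bit per pixel doubles the number of values a pixel can hold. The short Python sketch below shows the arithmetic; the function name shades_for_bits and the three bit depths are chosen here for illustration only.

```python
def shades_for_bits(bits_per_pixel):
    """Return how many distinct shades or colors one pixel can represent."""
    return 2 ** bits_per_pixel

# Illustrative bit depths: bitonal, 8-bit grayscale, 24-bit color.
for bits in (1, 8, 24):
    print(bits, "bits per pixel ->", shades_for_bits(bits), "shades or colors")
# 1 bits per pixel -> 2 shades or colors
# 8 bits per pixel -> 256 shades or colors
# 24 bits per pixel -> 16777216 shades or colors
```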

Resolution - The number of pixels (in both height and width) making up an image. The more pixels in an image, the higher the resolution, and the higher the resolution of an image, the greater its clarity and definition (and the larger the file size).
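To see why more pixels mean a larger file, the uncompressed size of an image can be estimated from its pixel dimensions and the storage allocated per pixel. The following Python sketch does that arithmetic; the function name and the example dimensions (2550 by 3300 pixels at 24 bits per pixel) are assumptions for illustration, and real files differ because of format overhead and compression.

```python
def uncompressed_size_bytes(width_px, height_px, bits_per_pixel):
    """Estimate the raw storage for an image, ignoring headers and compression."""
    total_pixels = width_px * height_px
    return total_pixels * bits_per_pixel // 8   # 8 bits per byte

# Hypothetical image: 2550 x 3300 pixels, 24 bits per pixel.
size = uncompressed_size_bytes(2550, 3300, 24)
print(size)                                          # 25245000 bytes
print(uncompressed_size_bytes(1275, 1650, 24))       # 6311250 bytes
```

Halving both the width and the height leaves one quarter of the pixels, so the estimate drops to one quarter as well.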

Scanner - A device for capturing a digital image. There are many types of scanners, such as flatbed scanners, drum scanners, slide scanners, and microfilm scanners.

TIFF - Tagged Image File Format. A file storage format implemented on a wide variety of computer systems, usually used for archival scans.

URL - Uniform Resource Locator. A standard addressing scheme used to locate or reference files on the Internet. Used in World Wide Web documents to locate files. A URL gives the type of resource being used and the path to the file. The syntax used is: scheme://host.domain/path/filename.
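As an illustration of those parts, the Python standard library can split a URL into its scheme, host, and path. The sketch below uses the Webopedia address cited earlier in this glossary; the variable names are chosen here for the example.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://www.webopedia.com/TERM/d/digitize.html"
parts = urlsplit(url)

scheme = parts.scheme                      # type of resource: "http"
host = parts.netloc                        # host.domain: "www.webopedia.com"
path = parts.path                          # "/TERM/d/digitize.html"
filename = posixpath.basename(parts.path)  # "digitize.html"

print(scheme, host, path, filename)
```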

World Wide Web (WWW) - An interconnected network of electronic hypermedia documents available on the Internet. WWW documents are marked up in HTML. Cross references or hyperlinks between documents are recorded in the form of URLs.


The following are from the TechWeb TechEncyclopedia http://www.techweb.com/encyclopedia/

Database - A set of related files that is created and managed by a database management system (DBMS). Today, DBMSs can manage any form of data including text, images, sound and video. Database and file structures are always determined by the software. As far as the hardware is concerned, it's all bits and bytes.

OCR (Optical Character Recognition) - The machine recognition of printed characters. OCR systems can recognize many different OCR fonts, as well as typewriter and computer-printed characters. Advanced OCR systems can recognize hand printing. When a text document is scanned into the computer, it is turned into a bitmap, which is a picture of the text. OCR software analyzes the light and dark areas of the bitmap in order to identify each alphabetic letter and numeric digit. When it recognizes a character, it converts it into ASCII text. Hand printing is much more difficult to analyze than machine-printed characters. Old, worn and smudged documents are also difficult. Scanning documents and processing them with OCR is sometimes as much an art as it is a science.
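The idea of analyzing light and dark areas can be shown with a deliberately tiny Python sketch. It is a toy, not how commercial OCR software works: the 3 x 3 letter templates, the threshold of 128, and the function names are all hypothetical, and real systems use far larger bitmaps and much more sophisticated matching.

```python
# Hypothetical 3 x 3 templates: 1 marks a dark pixel, 0 a light one.
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
    "T": ((1, 1, 1), (0, 1, 0), (0, 1, 0)),
}

def to_bitmap(gray_rows, threshold=128):
    """Turn grayscale values (0 = black, 255 = white) into dark/light pixels."""
    return tuple(tuple(1 if v < threshold else 0 for v in row) for row in gray_rows)

def recognize(gray_rows):
    """Return the template letter whose pixels agree most with the scan."""
    bitmap = to_bitmap(gray_rows)

    def score(template):
        return sum(b == t
                   for b_row, t_row in zip(bitmap, template)
                   for b, t in zip(b_row, t_row))

    return max(TEMPLATES, key=lambda letter: score(TEMPLATES[letter]))

# A smudged "L": one extra dark pixel in the top row.
scan = [[30, 40, 200],
        [20, 210, 220],
        [25, 35, 45]]
letter = recognize(scan)
print(letter, ord(letter))   # L 76 (76 is the ASCII code for "L")
```

The smudge costs the correct template one matching pixel but it still scores highest, which hints at why worn or smudged originals make recognition harder.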


Created for CLRCNet Digitization I workshop at the Mid-York Library System. Funding for the training is provided in part by a Federal LSTA grant.