Charles Dollar conducts first of two workshops on information technology concepts and tools

Charles Dollar of Dollar Consulting conducted an all-day workshop on 22 January for more than 150 archivists, records managers, information technology professionals, librarians, and others. It was the first in a series of six workshops funded by a grant to Archives and History from the National Historical Publications and Records Commission. The six workshops, part of the department's Electronic Records Training and Awareness Program, are being held over the next two years at the Archives and History Center in Columbia. The same grant is also funding three other presentations. On 10 September 2001, Rick Barry of Barry Associates presented a session at the annual conference of the SC Information Technology Directors Association, and John Phillips of Information Technology Decisions followed on 25 October with a half-day seminar at the annual conference of the SC Public Records Association. In the fall of 2002, Tom Ruller of the NY State Education Department will speak at the annual conference of the SC Archival Association.

Dollar's four-session workshop focused on digital representation, file formats, storage media, and portability. Following are summaries of some of the points made:

1: Digital representation of electronic records, the basics
Digital information is represented in binary form, as streams of 1s and 0s that computer hardware and software must interpret through various coding schemes, some of which may be specific to a device or storage medium.
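As a small illustration of this point, the sketch below reads one short stream of 1s and 0s under three different coding schemes. The bit stream and the variable names are made up for the example; nothing here comes from the workshop materials.

```python
# One stream of 1s and 0s, grouped into four 8-bit bytes for readability.
bits = "01000100 01000001 01010100 01000001"
raw = bytes(int(group, 2) for group in bits.split())

# The same bytes mean different things under different coding schemes.
as_ascii = raw.decode("ascii")            # interpreted as ASCII characters
as_integer = int.from_bytes(raw, "big")   # interpreted as one unsigned integer
as_pixels = list(raw)                     # interpreted as four 8-bit gray values

print(as_ascii)    # DATA
print(as_integer)
print(as_pixels)   # [68, 65, 84, 65]
```

Without knowing which scheme applies, software cannot tell whether these bits are a word, a number, or part of an image.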

Examples of encoding schemes:

Vector graphics are mathematical representations of lines, colors, and shapes and are processible like ASCII.

Bitmap images are numerical representations of the variation in reflectance across a targeted area, which is divided into picture elements (pixels); their resolution is expressed in dots per inch (dpi).

Images are compressed by either a lossless or a lossy technique: lossless compression retains all data, while lossy compression permanently discards data judged less important.

The creation or capture of metadata is essential to the trustworthiness of electronic records.
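The difference between the two kinds of compression mentioned above can be sketched in a few lines. The example assumes a hypothetical 8 x 8 grayscale bitmap with one byte per pixel. The lossless step uses zlib from the Python standard library. The "lossy" step is only a stand-in (it throws away the low four bits of each pixel) and is not how JPEG or any real image codec works.

```python
import zlib

# A hypothetical 8x8 grayscale bitmap: one byte (0-255) per pixel, row by row.
width, height = 8, 8
pixels = bytes((x * 3 + y * 17) % 256 for y in range(height) for x in range(width))

# Lossless: after decompression every byte must match the original.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
print(restored == pixels)   # True

# Lossy stand-in: keep only the top 4 bits of each pixel (16 gray levels).
# The discarded detail cannot be recovered from the quantized copy.
quantized = bytes(p & 0xF0 for p in pixels)
max_error = max(abs(a - b) for a, b in zip(pixels, quantized))
print(quantized == pixels)  # False
print(max_error)            # largest per-pixel difference introduced
```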

2: File formats
File formats are "containers" that specify the logical structure of data, tell the operating system how to interpret the 1s and 0s, and specify the internal arrangement of data fields and digital objects. They also carry manipulation instructions, such as compression algorithms, and information understood by particular software or standards, for example MS Word, HTML, XML, TIFF, and SQL. The basic categories of file format are text, vector data, image data, audio data, moving image data, and structured data (spreadsheets and databases). Several criteria should be considered when selecting a file format: that it can be presented and used on various systems, that it is non-proprietary, and that it has a large market share and is supported by multiple vendors.

Vector data are mathematical descriptions of geometric entities and are employed by applications like Geographic Information Systems (GIS), Computer-Aided Design (CAD), and Computer-Aided Manufacturing (CAM).

Image data formats include Tagged Image File Format (TIFF), Joint Photographic Experts Group (JPEG), Graphics Interchange Format (GIF), Portable Network Graphics (PNG), and Portable Document Format (PDF).

Audio and moving image data formats include those of the Moving Picture Experts Group (MPEG), an international standard for compressing bit streams that contain moving images and audio signals; its compression of individual frames uses techniques similar to those of JPEG.

Structured data formats include spreadsheets and databases.
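One way software learns which "container" it is dealing with is to look at the first few bytes of a file, which many formats reserve for a fixed signature. The sketch below checks the well-known signatures of several of the image formats listed above. The function name and the sample inputs are illustrative, and a real identification tool would check far more than the opening bytes.

```python
# Opening byte signatures for a few common formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),   # little-endian byte order
    (b"MM\x00*", "TIFF"),   # big-endian byte order
    (b"%PDF-", "PDF"),
]

def identify_format(header: bytes) -> str:
    """Return a format name based on the opening bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"

print(identify_format(b"%PDF-1.4\n"))         # PDF
print(identify_format(b"GIF89a\x01\x00"))     # GIF
print(identify_format(b"just some text"))     # unknown
```

To use it on a file, read the first dozen or so bytes in binary mode and pass them to `identify_format`.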

3: Storage media
Digital information is stored mainly on either magnetic or optical media, both of which are growing tremendously in capacity and decreasing correspondingly in cost. Criteria for choosing media for long-term storage include high storage capacity, high data transfer rate, a twenty-year life expectancy, an established and stable market presence, affordability, and suitability. For long-term storage, magnetic media, including digital linear tape and 3480/90 cartridges, are recommended over optical media. The Norsam Corporation has developed a long-term archival storage medium called HD-Rosetta. It uses ion beam technology to etch both eye-readable and bit-mapped digital images (TIFF or PDF) onto 2-inch stainless steel discs.

Magnetic media include hard disks and magnetic tape. Access to information on a hard disk is direct and speedy. Access to information on magnetic tape, where one record follows another, is sequential and relatively slow. All magnetic media are adversely affected by excessive heat, humidity, and gas pollutants. An environment of 10 degrees C (50 degrees F) with a relative humidity of 25 percent and air filters would provide the best storage conditions for magnetic media.
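The contrast between direct and sequential access can be imitated in software. The sketch below is only an analogy: it builds an in-memory store of 1,000 fixed-length records (the record size and names are assumptions for the example) and counts how many records must be read to reach one of them under each access method.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record length, in bytes

# Build an in-memory store of 1,000 fixed-length records.
volume = io.BytesIO()
for n in range(1000):
    volume.write(f"record-{n:04d}".ljust(RECORD_SIZE).encode("ascii"))

def read_direct(store, index):
    """Jump straight to the wanted record, as a disk can."""
    store.seek(index * RECORD_SIZE)
    data = store.read(RECORD_SIZE)
    return data.decode("ascii").strip(), 1

def read_sequential(store, index):
    """Start at the beginning and pass over every earlier record, as a tape must."""
    store.seek(0)
    reads = 0
    data = b""
    for _ in range(index + 1):
        data = store.read(RECORD_SIZE)
        reads += 1
    return data.decode("ascii").strip(), reads

print(read_direct(volume, 750))      # ('record-0750', 1)
print(read_sequential(volume, 750))  # ('record-0750', 751)
```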

Optical media come in three basic types: ROM (read only), WORM (write once), and RW (re-writeable).

4: Portability and persistence
Electronic records that have portability and persistence are those that have been maintained so they can be used, preserved, and accessed over a long time. The primary impediments to long-term preservation and access are technological obsolescence, fragile storage media, and hardware/software dependence. Two alternative strategies for portability and persistence are emulation and migration.

Emulation as a preservation strategy was proposed by Jeff Rothenberg of the RAND Corporation, who describes it as a "process in which one computer is used to reproduce the behaviour of another computer with such fidelity that the emulation can be used in place of the original computer." Still considered theoretical, his strategy supports executable "digital originals" and requires both the native application and an emulator of the original platform.

Migration strives to ensure usable and trustworthy electronic records for as long as necessary without regard for platform. It converts electronic records to technology-neutral file formats and requires backward compatibility. Migration preserves the processibility of records but potentially risks losing the "look and feel" of the original format and some original information.
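A very small migration might look like the sketch below, which converts a record from a legacy fixed-width layout into XML, one of the formats named earlier. The field layout, the field names, and the sample record are all hypothetical, and a real migration would also need to carry over metadata and verify the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width layout of a legacy record: (field name, start, end).
LAYOUT = [("id", 0, 6), ("title", 6, 36), ("date", 36, 46)]

def migrate_record(line: str) -> str:
    """Convert one fixed-width record into an XML string."""
    record = ET.Element("record")
    for name, start, end in LAYOUT:
        field = ET.SubElement(record, name)
        field.text = line[start:end].strip()
    return ET.tostring(record, encoding="unicode")

legacy = "000042" + "Annual report".ljust(30) + "2002-01-22"
print(migrate_record(legacy))
# <record><id>000042</id><title>Annual report</title><date>2002-01-22</date></record>
```

The content of each field survives and remains processible, but the column padding that shaped the original layout does not, a small instance of the "look and feel" that migration can lose.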