A Prospectus for Electronic Historical Editions
May 30, 1996
The Model Editions Partnership is a consortium of seven historical editions that joined forces with leaders from the Text Encoding Initiative and the Center for Electronic Text in the Humanities. The first draft of the "Prospectus" was developed during a meeting of the Partnership's Steering Committee, which includes representatives from each of the editions as well as the co-coordinators of the project. Funded by a major grant from the National Historical Publications and Records Commission, the project aims to develop a foundation for the next generation of historical editions. That generation will consist of electronic editions disseminated via the Internet or on CD-ROM (or its equivalent). The goals of the project are:
In this document, we move toward the Partnership's first goal by identifying the principles that should govern those editions, exploring what electronic historical editions will look like, and outlining an approach to developing an intellectual framework to implement the editions. The "principles" are intended as general guidelines for designing electronic editions of historical documents. The "models" provide a means for actively thinking about how such editions might look and work. Both will guide us as we create the series of sample editions that will be disseminated via CD-ROM and the Internet. These views reflect the consensus of the members of the Partnership and are informed by suggestions from the editorial community, from a panel of editorial consultants that reviewed the prospectus before it was completed, and from other constituencies in the electronic text community.
Although historical editions share many common characteristics, each edition is unique. Letters, diaries, speeches, newspaper accounts, pamphlets, public records, and many other types of documents challenge the editor's skill and imagination in presenting representations of the texts that today's readers will find understandable and useful. Just as in today's book and microform editions, it is the editor who establishes a reliable text, provides the carefully chosen commentary necessary to understand that text, and creates the indices and other editorial devices that provide intellectual access to that text and commentary. Creating electronic editions will be no less challenging than creating today's book and microform editions, and perhaps more so. In moving forward to meet that challenge, the Partnership has developed a set of general principles which we believe should govern the design of electronic editions.
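Electronic historical editions should: accommodate current editorial practice; accommodate changes in editorial practice; allow post-publication enhancement; support multiple forms of publication; and use standard, non-proprietary formats.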
Electronic historical editions must be designed to allow editors to implement appropriate transcription policies; to provide relevant annotation and commentary; to select subsets from a larger body of documents for more intensive editorial treatment; and to provide intellectual access through control files, indices, and other editorial devices. These are fundamental practices in today's editing community; they must not be constrained by the shift to electronic editions. The hallmarks of an electronic historical edition are: 1) accurate and reliable representations of one or more physical or intellectual levels of the text; 2) supplementary materials which make that text more easily understandable; and 3) provisions which make the text and supplementary materials accessible.
Electronic editions must also be designed to permit changes in editorial practice and interests. The illustrations set forth later in this document point to a number of ways in which editorial practice might differ in an electronic environment. But those illustrations are drawn from limited experience which will surely be surpassed as more and more scholars become involved in creating historical editions in the electronic environment. Though it is impossible to anticipate fully what editors will want to do, we can in fact adopt the kind of "open architecture" discussed below which will accommodate changes in editorial practice.
Electronic editions should also allow post-publication enhancement of the editions. Post-publication enhancements might include supplements of newly discovered documents linked to the original edition, correction files linked to the edition, new editions linked to older editions (perhaps even sharing documents), or new auxiliary materials linked to the edition. One could imagine that subsequent generations of editors and scholars might wish to use subsets of the original documents in an edition to fashion a selection of documents for classroom use. Enhancements of this kind, too, require that electronic editions conform to relevant standards that provide an "open architecture."
Electronic editions should be constructed with a view to multiple forms of publication, including publication in print, on CD-ROM or similar electronic medium, and over telecommunications networks. The same electronic format (markup) should support each of these publication forms. In general, a single electronic format should be chosen for the master archival copy of the edition from which published versions can be derived by automated means (i.e., by software). The published electronic edition need not be identical to the archival or master electronic form. This principle should be construed neither as requiring the publication of an edition in its archival form nor as forbidding such publication. For some projects, publication of both complete (comprehensive) and selected editions will make sense.
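As a minimal sketch of this principle, assuming a hypothetical master file whose element names (`letter`, `head`, `p`) are illustrative rather than taken from any adopted markup scheme, the Python below derives a plain-text version and an HTML version from the same archival source. Neither published form is stored separately; both can be regenerated by software whenever the master changes.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical master (archival) record; element names are illustrative only.
MASTER = """<letter>
  <head>Sample letter</head>
  <p>First paragraph of the transcription.</p>
  <p>Second paragraph &amp; closing.</p>
</letter>"""


def to_plain_text(root: ET.Element) -> str:
    """Derive a plain-text version: the heading, then one paragraph per line."""
    lines = [root.findtext("head", default="").strip()]
    lines += [(p.text or "").strip() for p in root.findall("p")]
    return "\n".join(lines)


def to_html(root: ET.Element) -> str:
    """Derive a web version from the same master, escaping text for HTML."""
    heading = html.escape(root.findtext("head", default="").strip())
    parts = [f"<h1>{heading}</h1>"]
    parts += [f"<p>{html.escape((p.text or '').strip())}</p>" for p in root.findall("p")]
    return "\n".join(parts)


root = ET.fromstring(MASTER)
print(to_plain_text(root))
print(to_html(root))
```

Adding a print or CD-ROM output would mean writing one more derivation function, not re-keying the edition.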
Electronic editions should use standard, non-proprietary formats (markup) for the representation of text, images, and other material. Standard formats such as SGML are essential if editions are to remain usable despite rapid changes in computer hardware and software. Non-proprietary standards are essential if editions are to be used with a wide variety of hardware and software. International and national standards issued by recognized standards bodies should be preferred to de facto standards because such bodies develop their standards through a consensus of all interested parties. At the current time, this means the use of a markup design like the Text Encoding Initiative Guidelines, formulated under the Standard Generalized Markup Language architecture adopted in 1986 by the International Organization for Standardization (ISO 8879). Relevant standards for images and other material have yet to be selected for the Partnership models.
Tomorrow's editors will no longer be constrained by the physical format of the book or microforms. But not everything changes. The editor's basic role does not change. Nor do scholarly objectives. While the electronic environment provides new ways of presenting and providing the context for historical documents, it is the scholar-editor who will create an intellectual framework to provide accurate documents which are understandable and intellectually accessible. Using the capabilities of the electronic environment, the editor can create electronic editions that go far beyond today's book editions.
Instead of imposing a single representation of the text, an edition can allow users to choose from multiple representations ranging from clear text to the most conservative diplomatic transcription. Henry Laurens' letter to Thomas Fletchall, a backcountry leader of wavering loyalties on the eve of the Revolution, is filled with cancellations and interlineations. Attempting to bring Fletchall to the patriot cause, Laurens walked a verbal tightrope, appealing to Fletchall's patriotism while setting forth the consequences of remaining loyal to the Crown. The Laurens edition tries to reflect the process of inscription by rendering all cancellations and interlineations explicitly. However, many scholars would prefer to read and quote from a clear text version of the document in which cancellations and interlineations are passed over in silence. The ability to switch from diplomatic text to clear text is easily provided by an electronic edition. Textual issues may be further clarified by including digital images of the original document, though editorially prepared transcriptions often provide a more understandable representation of a text when it has been heavily emended and borders on illegibility.
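A minimal sketch of how one transcription can serve both kinds of reader, assuming markup loosely modeled on the TEI `<del>` (cancellation) and `<add>` (interlineation) elements. The sample sentence is invented, and the display conventions (square brackets for cancellations, carets for interlineations) are hypothetical choices, not those of the Laurens edition.

```python
import xml.etree.ElementTree as ET

# Invented sample sentence; <del> = cancellation, <add> = interlineation.
SAMPLE = ('<p>The meeting will be held <del>on Monday</del>'
          '<add>on Tuesday</add> at the courthouse.</p>')


def render(elem: ET.Element, mode: str) -> str:
    """Render an element as 'diplomatic' (all revisions shown) or 'clear' text."""
    out = [elem.text or ""]
    for child in elem:
        if child.tag == "del":
            # Cancellations appear only in the diplomatic view.
            if mode == "diplomatic":
                out.append("[" + render(child, mode) + "]")
        elif child.tag == "add":
            inner = render(child, mode)
            # Interlineations are flagged diplomatically, silently merged in clear text.
            out.append("^" + inner + "^" if mode == "diplomatic" else inner)
        else:
            out.append(render(child, mode))
        out.append(child.tail or "")  # text after a child always belongs to the parent
    return "".join(out)


root = ET.fromstring(SAMPLE)
print(render(root, "diplomatic"))  # The meeting will be held [on Monday]^on Tuesday^ at the courthouse.
print(render(root, "clear"))       # The meeting will be held on Tuesday at the courthouse.
```

Because both views come from the same marked-up text, switching between them is a display choice rather than a second transcription.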
Instead of relegating variant readings of a text to footnotes or to back-of-the-book tables when multiple copies of a document exist, an editor can embed the variants within the copy-text chosen by the editor so that readers can view or suppress the alternate readings as their needs dictate. When John Adams drafted his letter of August 18, 1782, to Robert R. Livingston, Adams criticized Benjamin Franklin for intimating that recognition of American independence was not a precondition for opening negotiations with the British. Adams did not include the critical paragraph in the final version of his letter, a fact that historians did not become aware of until Richard B. Morris discovered the draft. Variants at this level of importance may be more easily discernible when rendered within the text itself rather than in the editorial apparatus of an edition.
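The following sketch assumes that variants are recorded in TEI-style `<app>` (apparatus entry), `<lem>` (copy-text reading), and `<rdg>` (variant reading) elements, with a `wit` attribute naming the source. The sample text is invented; an empty `<lem/>` models a passage present in a draft but dropped from the final version. Whether variants are viewed or suppressed is simply a flag passed to the renderer.

```python
import xml.etree.ElementTree as ET

# Invented sample: an empty <lem/> marks a passage found only in the draft.
SAMPLE = ('<p>The final text reads here.'
          '<app><lem/><rdg wit="draft">A sentence found only in the draft.</rdg></app>'
          ' The letter continues.</p>')


def render_with_apparatus(elem: ET.Element, show_variants: bool) -> str:
    """Render the copy-text, optionally followed by each variant reading."""
    out = [elem.text or ""]
    for child in elem:
        if child.tag == "app":
            lem = child.find("lem")
            if lem is not None:
                out.append(render_with_apparatus(lem, show_variants))
            if show_variants:
                for rdg in child.findall("rdg"):
                    witness = rdg.get("wit", "unknown source")
                    out.append(f" [{witness}: {render_with_apparatus(rdg, show_variants)}]")
        else:
            out.append(render_with_apparatus(child, show_variants))
        out.append(child.tail or "")
    return "".join(out)


root = ET.fromstring(SAMPLE)
print(render_with_apparatus(root, show_variants=False))
print(render_with_apparatus(root, show_variants=True))
```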
Instead of being limited to a single organizing principle like chronology, an edition can allow readers to dynamically organize the documents into subsets relating to their own interests. The Lincoln Legal Papers consist of more than 250,000 documents digitized from photocopies of the originals. The editors organize the documents relating to Lincoln's law practice by case, and maintain control through a master case file which allows them to track cases heard in several courts. The editors also have extracted information from the documents which will allow users to create subsets of the documents based on a variety of factors including cases of a particular type, cases in a particular time frame, cases about particular subjects, cases involving particular people, as well as many other arrangements which can be based on a combination of factors defined in the control files. Thus, readers will be able to construct queries that pull together documents from across the whole edition.
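A minimal sketch of combining criteria drawn from a control file, assuming each case is reduced to a hypothetical record of type, year, courts, and people. The case numbers, courts, and names are placeholders, not entries from the Lincoln Legal Papers. Any criterion left as `None` imposes no restriction, so readers can combine factors freely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """One hypothetical control-file entry summarizing a case."""
    case_id: str
    case_type: str
    year: int
    courts: tuple[str, ...]   # every court in which the case was heard
    people: frozenset[str]    # regularized names of those involved


def select(records, case_type=None, years=None, court=None, person=None):
    """Return the records that satisfy every criterion given; None means 'any'."""
    hits = []
    for r in records:
        if case_type is not None and r.case_type != case_type:
            continue
        if years is not None and not (years[0] <= r.year <= years[1]):
            continue
        if court is not None and court not in r.courts:
            continue
        if person is not None and person not in r.people:
            continue
        hits.append(r)
    return hits


# Placeholder entries, not real cases.
CONTROL_FILE = [
    CaseRecord("case-001", "debt", 1845, ("Court A",), frozenset({"Person X"})),
    CaseRecord("case-002", "slander", 1850, ("Court A", "Court B"), frozenset({"Person Y"})),
    CaseRecord("case-003", "debt", 1852, ("Court B",), frozenset({"Person X", "Person Y"})),
]

print([r.case_id for r in select(CONTROL_FILE, case_type="debt", years=(1850, 1859))])
print([r.case_id for r in select(CONTROL_FILE, court="Court B", person="Person Y")])
```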
Instead of annotation which occurs only once on a single page, an electronic edition can allow annotation to be linked to many documents. Notes identifying the cast of characters in an edition are commonplace in letterpress editions. In long-running series where the cast continues volume after volume, biographical notes are not always repeated and the user has to turn to earlier volumes to find them. The same situation occurs when important subject matter spans a number of volumes. In an electronic edition, links between notes and documents or between notes and notes can allow the user to follow threads of information easily and quickly. As one editor noted: "annotation can serve the whole edition." One could also imagine editions with flexible access to annotation geared toward different levels of education, from grammar school to high school, from undergraduate research to scholarly inquiry. Thus, a single edition could be tailored to serve the different needs of a variety of users.
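One way to let annotation "serve the whole edition" is to store each note once and have documents point to it by identifier. The sketch below uses hypothetical note and document identifiers; inverting the document-to-note links gives the reader a path from any note back to every document that uses it, across volumes.

```python
from collections import defaultdict

# Hypothetical shared annotation: each note is stored exactly once.
NOTES = {
    "bio-smith": "Biographical note on a hypothetical correspondent, J. Smith.",
    "event-fire": "Note on a hypothetical event referred to in several letters.",
}

# Each document lists the IDs of the notes it points to.
DOCUMENTS = {
    "vol1-doc012": ["bio-smith"],
    "vol3-doc101": ["bio-smith", "event-fire"],
    "vol5-doc007": ["event-fire"],
}


def documents_citing(documents):
    """Invert document-to-note links into note-to-documents links."""
    index = defaultdict(list)
    for doc_id, note_ids in documents.items():
        for note_id in note_ids:
            index[note_id].append(doc_id)
    return dict(index)


BACKLINKS = documents_citing(DOCUMENTS)
for doc_id in BACKLINKS["bio-smith"]:
    print(doc_id, "->", NOTES["bio-smith"])
```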
Instead of a fixed index or set of indices at the back of a volume, an electronic edition can build cumulative indices as units of the edition are completed. Projects sometimes maintain a cumulative index internally, updating it as each volume is published, but scholars usually do not gain access to a cumulative index until the final volumes of an edition appear, because printing successive accumulations would be too expensive. In an electronic edition, an editor could substitute successive versions of a cumulative index, a simple swapping of files. Additionally, users can construct their own searches to locate information not covered in the indices.
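A sketch of that file swap, assuming each volume's index maps a heading to (volume, page) locators; the headings and page numbers are invented. Merging the indices of every completed volume yields the current cumulative index, which simply replaces the previous release.

```python
def merge_indices(volume_indices):
    """Combine per-volume indices into one cumulative index with sorted locators."""
    cumulative = {}
    for vol_index in volume_indices:
        for heading, locators in vol_index.items():
            cumulative.setdefault(heading, set()).update(locators)
    return {heading: sorted(locs) for heading, locs in sorted(cumulative.items())}


def format_entry(heading, locators):
    """Render one entry as 'Heading vol:page, vol:page'."""
    return heading + " " + ", ".join(f"{vol}:{page}" for vol, page in locators)


# Invented per-volume indices: heading -> list of (volume, page) locators.
VOLUME_1 = {"Ferries": [(1, 45)], "Militia": [(1, 12), (1, 88)]}
VOLUME_2 = {"Militia": [(2, 3)], "Taxation": [(2, 140)]}

release_1 = merge_indices([VOLUME_1])            # issued with volume 1
release_2 = merge_indices([VOLUME_1, VOLUME_2])  # replaces release_1 once volume 2 is done
for heading, locators in release_2.items():
    print(format_entry(heading, locators))
```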
Instead of the limited number of illustrations that costs now allow, an edition can easily and inexpensively incorporate maps, drawings, and other images that, when linked by pointers to documents, would enhance an understanding of the documents. Susan B. Anthony's diary entry of January 10-12, 1855, chronicles local women's rights conventions held in the counties of western New York. The reader has only to look at a map to grasp the distances Anthony traveled in that brief span and to understand her comment "but we were much too tired to think of it--had been broken of our sleep so much for the week past--"
Instead of a letterpress edition containing large numbers of abstracted documents and a separate microform edition giving full transcripts of all documents, the two might be linked. A project could link the abstracts, with the analytical index and the annotations, to the full transcripts or images of the original document. A user could search for a subject in the index, find that it is in an abstracted document and then bring up the transcript or image to see the full text in its original form. With proper tagging, users can quickly display related documents, such as draft versions and enclosures, enabling readers to look up documents which when published traditionally would have been located in different reels or volumes.
The examples set forth above are but a first step toward outlining how electronic editions may differ from current print and microform editions. The list is suggestive--not exhaustive. It sets forth themes and approaches we will demonstrate in the sample editions within the Partnership. A different set of editions would undoubtedly bring to light different editorial considerations.
Markup is (for our purposes) the set of conventions used to represent text, images, and other data in electronic form. Editors have used markup for years, both in paper form and in electronic form. In the traditional copy-edited manuscript, the copy editor marked the text and included marginal instructions to the typesetter. In more recent electronic manuscripts, historical editors have provided generic or software-specific codes for driving a typesetting machine in producing printed pages. The selection of a markup scheme is critical to electronic editions because, in the long run, markup determines what the editor and the user can and cannot do conveniently with the electronic edition. If the markup is ill chosen, the edition will be unable to exploit fully the opportunities offered by the electronic medium. Just as universally accepted proofreading symbols allow us to describe textual features, we need to move toward universally accepted markup for electronic editions to describe those and other features of interest and value.
In electronic editions, markup can be used not only to control the presentation (formatting) of the text, but also to make possible new ways of annotating, accessing, and indexing the text. Well-designed markup allows editors to link their documents to explanations of events, or to biographical references, or to images like maps which help users understand the documents. Good markup allows editors to identify features of the text which will be useful in retrieving information. For example, if the markup identifies the author, recipient, and date of each letter, it becomes possible to generate a selective list with links from the listing to the texts of the documents. If the markup also identifies the names of individuals within each letter, the listing could be further refined to exclude all letters except those referring to a particular person.
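A minimal sketch of such a selective list, assuming a hypothetical markup in which each `letter` carries `author`, `recipient`, and `date` attributes and names in the text are wrapped in `persName` with a regularized `reg` form. The TEI Guidelines record this kind of information in their own ways; these attribute placements are simplifications, and the letters and names are invented. The letter `id` is what each line of the listing would link to.

```python
import xml.etree.ElementTree as ET

# Invented letters in a simplified, hypothetical markup.
EDITION = """<edition>
  <letter id="d1" author="Doe, Jane" recipient="Roe, Richard" date="1790-03-01">
    <p>I saw <persName reg="Poe, Mary">Mrs. P.</persName> at church.</p>
  </letter>
  <letter id="d2" author="Roe, Richard" recipient="Doe, Jane" date="1790-04-15">
    <p>No news to report.</p>
  </letter>
</edition>"""


def calendar(root, mentioning=None):
    """List (date, author, recipient, id) per letter, optionally only those naming a person."""
    rows = []
    for letter in root.iter("letter"):
        names = {p.get("reg") for p in letter.iter("persName")}
        if mentioning is not None and mentioning not in names:
            continue
        rows.append((letter.get("date"), letter.get("author"),
                     letter.get("recipient"), letter.get("id")))
    return sorted(rows)  # ISO dates sort chronologically as strings


root = ET.fromstring(EDITION)
for when, author, recipient, doc_id in calendar(root):
    print(f"{when}  {author} to {recipient}  (link: #{doc_id})")
```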
Good markup allows editors to create indices drawn from the markup itself. If the markup carries the regularized form of a person's name, a person index can be extracted automatically and later used as the basis for a more refined index. Draft indices drawn from markup would also be useful in making editorial decisions about annotation and refinements in the selection process which are often made during the annotation phase. In short, a properly designed markup scheme can enhance the tools editors now use and provide new opportunities in building the intellectual frameworks for electronic editions.
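Using the same kind of hypothetical markup, a draft person index can be extracted by grouping every `persName` under its regularized `reg` form. Recording the forms actually found in the text alongside each entry is an assumption added here for illustration; it gives editors a quick check that variant spellings were regularized consistently. The letters and names are invented.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Invented letters; the same person appears under two different written forms.
SOURCE = """<edition>
  <letter id="d1"><p><persName reg="Poe, Mary">Mrs. P.</persName> called.</p></letter>
  <letter id="d2"><p>Met <persName reg="Poe, Mary">Mary Poe</persName> and
    <persName reg="Roe, Richard">Mr. Roe</persName>.</p></letter>
</edition>"""


def draft_person_index(root):
    """Map each regularized name to (letters it occurs in, forms used in the text)."""
    entries = defaultdict(lambda: {"letters": set(), "forms": set()})
    for letter in root.iter("letter"):
        for name in letter.iter("persName"):
            written = (name.text or "").strip()
            key = name.get("reg") or written  # fall back to the written form if unregularized
            entries[key]["letters"].add(letter.get("id"))
            entries[key]["forms"].add(written)
    return {key: (sorted(v["letters"]), sorted(v["forms"]))
            for key, v in sorted(entries.items())}


root = ET.fromstring(SOURCE)
for heading, (letters, forms) in draft_person_index(root).items():
    print(heading, letters, forms)
```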
As we look toward the creation of digital libraries, one could imagine an era in which electronic historical editions would have links to standard reference works like the American National Biography, to shared sets of maps and gazetteers, and perhaps even to shared annotation. To do so, however, the first step is for editors to adopt common methods of preparing electronic editions. The next step would be to identify limitations in current software so that we can encourage vendors and others to develop the tools needed to carry out our objectives.
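Historical editions in the electronic environment can be divided into three models: image editions, live text editions, and combined editions that join document images to searchable transcriptions.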
To assure continued access to existing scholarship, a fourth model may also be appropriate: page image editions of previously published letterpress volumes.
Microform editions have long made available images of original documents coupled with finding aids ranging from simple descriptions of document groups to well-developed subject indices. Digitized images of original documents coupled with control files and supplementary materials will go far beyond the traditional microform edition. Control files in an image edition will typically allow the user to define subsets of documents, tailoring his or her view to items of particular scholarly interest. Image editions will allow the user to enlarge or enhance the images for clearer viewing. Perhaps more striking will be the editor's ability to add supplementary material. In addition to images of documents encompassed in three microfilm series, the Margaret Sanger Papers image edition will include searchable editorial essays as well as headnotes linked to each document. The project will then "annotate" these headnotes by linking, for example, personal names to biographical descriptions and date entries to a day-by-day chronology of Sanger's activities. Other planned links include providing on-line access to copyright information and repository addresses.
Letterpress editions present carefully verified transcriptions of historical documents accompanied by a variety of editorial devices to make the historical context understandable and accessible. Similarly, live text editions will present transcriptions as searchable text and are likely to be accompanied by the same kind of scholarly material currently found in letterpress editions. However, as indicated above, a live text edition offers many opportunities for editorial enhancement not available in book editions. With appropriate markup, an editor can create retrieval opportunities that go beyond the access that control files provide in an image edition, making it possible for users to define subsets of the documents that were not anticipated in the control files. For example, in a text in which quotations are marked, a reader could retrieve all of the letters in which Nathanael Greene quoted from incoming military intelligence by specifying conditional searches of the text itself. In an appropriately prepared edition of the Documentary History of the Ratification of the Constitution and the Bill of Rights, a researcher could pull together all of the letters written between September 1787 and January 1788 in which slavery is mentioned or alluded to. Electronic text can also serve as the basis for new analytical studies (e.g., authorial studies and content analysis) of materials not previously available for computer processing.
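A sketch of the kind of conditional search described above, assuming each letter carries an ISO `date` attribute and an editor-supplied `subjects` attribute (both hypothetical). A keyword test alone catches only explicit mentions; allusions are found here only because the editor tagged them, which is one concrete sense in which a text must be "appropriately prepared." The letters are invented, and the same pattern would extend to marked quotations.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Invented letters; "subjects" stands in for editorial subject tagging.
EDITION = """<edition>
  <letter id="r1" date="1787-10-05" subjects="slavery">
    <p>The clause on the importation of persons troubles many here.</p>
  </letter>
  <letter id="r2" date="1787-12-20">
    <p>Debate turned to slavery and representation.</p>
  </letter>
  <letter id="r3" date="1788-03-02" subjects="slavery">
    <p>A later letter, outside the date range.</p>
  </letter>
  <letter id="r4" date="1787-11-11">
    <p>Only the weather is worth reporting.</p>
  </letter>
</edition>"""


def search(root, start, end, keyword):
    """Return IDs of letters dated within [start, end] that mention or are tagged with keyword."""
    hits = []
    for letter in root.iter("letter"):
        when = date.fromisoformat(letter.get("date"))
        if not (start <= when <= end):
            continue
        text = " ".join(letter.itertext()).lower()
        tagged = keyword in letter.get("subjects", "").split()  # catches tagged allusions
        if tagged or keyword.lower() in text:                   # catches explicit mentions
            hits.append(letter.get("id"))
    return hits


root = ET.fromstring(EDITION)
print(search(root, date(1787, 9, 1), date(1788, 1, 31), "slavery"))
```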
The combination of document images, searchable transcriptions, and scholarly apparatus for contextualization and access may well emerge as one of the new types of electronic editions. In the Internet exhibit on the Gettysburg Address at the Library of Congress, we can see the beginnings of a move in this direction. The exhibit includes images of two copies in Lincoln's hand with transcriptions. Although the exhibit does not include the type of scholarly material one would expect to find in a full-blown edition, the exhibit does demonstrate the value of providing both images and searchable text. The transcriptions are "clear text," while the images allow the reader to see the cancellations and interlineations in the copy believed to have been Lincoln's reading copy. Although it is possible with today's technology to link live text to images at the word level, page-level linking will probably be the norm except in rare cases where an editor feels that word-level linking is essential.
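Page-level linking can be sketched by splitting a transcription at page-break milestones, assuming each hypothetical `<pb/>` element carries a `facs` attribute naming its page image (the file names are invented). Each page of live text is then paired with exactly one image, the granularity the paragraph above expects to be the norm.

```python
import xml.etree.ElementTree as ET

# Invented transcription; the page break falls in mid-paragraph.
TRANSCRIPTION = """<div>
  <pb n="1" facs="images/copy1-p1.jpg"/>
  <p>Text transcribed from the first page
  <pb n="2" facs="images/copy1-p2.jpg"/>continues on the second page.</p>
</div>"""


def split_pages(root):
    """Return (image, text) pairs, one per <pb/> milestone."""
    pages = [[None, []]]  # text before the first <pb/> has no image

    def walk(elem):
        if elem.tag == "pb":
            pages.append([elem.get("facs"), []])  # start collecting a new page
        else:
            pages[-1][1].append(elem.text or "")
            for child in elem:
                walk(child)
                pages[-1][1].append(child.tail or "")  # tail follows the child, possibly on a new page

    walk(root)
    result = []
    for image, parts in pages:
        text = " ".join("".join(parts).split())  # normalize whitespace
        if image is not None or text:
            result.append((image, text))
    return result


for image, text in split_pages(ET.fromstring(TRANSCRIPTION)):
    print(image, "|", text)
```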
As a means of preserving the scholarship of letterpress editions and making it accessible in electronic form, a transitional model is suggested. In that model, scanned images of the printed pages would be combined with live indices which allow the user to retrieve the page images. Although this type of edition would not provide the full range of features found in a live text edition, it would make the material accessible via the Internet. Page image editions would be particularly useful in editions like those of the First Congress and Ratification projects where documents within the same time period are presently organized in overlapping series. A well-designed electronic edition would allow readers studying a particular topic to move quickly from volume to volume.
This model could also serve as a vehicle for integrating page images from the letterpress volumes with supplements of unpublished documents. The Laurens Papers and the Greene Papers have long had plans to publish supplements. Uniting the published page images with transcriptions of unpublished documents would provide a level of comprehensiveness simply not available in the traditional book-microform model. Making both the page images and transcriptions accessible through comprehensive indices would create an even more useful edition.
Although providing searchable texts of documents in previously published letterpress editions would be preferable to page images, converting printed pages to live text is expensive. The National Digital Library team at the Library of Congress has experimented with a variety of methods for creating live text, including optical character recognition and keyboard transcription. The team currently estimates the cost of creating and encoding text at between five and six dollars per page. Conversion of the John Dewey volumes, which was subsidized in part by an outside vendor, cost between one and two dollars per page, but the text was encoded in a proprietary markup for CD-ROM distribution. As optical character recognition advances, it may become a more feasible and cost-effective means of converting page images to live text.
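To put these per-page estimates in perspective, assume a hypothetical backlog of 10,000 printed pages (a figure chosen only for illustration). With total cost $C$, page count $n$, and per-page rate $r$:

```latex
\begin{aligned}
C &= n \times r \\
C_{\text{LC}} &= 10{,}000 \times (\$5\ \text{to}\ \$6) = \$50{,}000\ \text{to}\ \$60{,}000 \\
C_{\text{Dewey}} &= 10{,}000 \times (\$1\ \text{to}\ \$2) = \$10{,}000\ \text{to}\ \$20{,}000
\end{aligned}
```

The comparison is only one of scale. The lower rate reflected a vendor subsidy and produced proprietary markup, so the two lines are not like-for-like.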
The preceding sections present a set of generalized intellectual principles as well as a set of generalized models for electronic historical editions. Neither is meant to be constrictive. Rather, both are designed to provide broad umbrellas. The set of principles recognizes the tension which exists between the scholar's objectives and the constraints of current technology. The first principle, that electronic editions must accommodate current editorial practice, is not simply "first among equals." It is meant to be the guiding principle to ensure that editorial scholarship will not be fettered by technological considerations. The last three principles relate to the design issues for electronic editions. One could view them as a series of questions: Is the design of the edition flexible enough to accommodate supplements? Will the design allow an editor to publish in one format and then another? Does the design conform to relevant standards which ensure longevity and reliability?
The set of models is designed to give us a common frame of reference and make it easier to discuss the nature of electronic editions. Image editions are similar to microform editions; live text editions, to letterpress editions; combined editions, to selected letterpress editions based on comprehensive microform editions. But the operative word is "similar." The similarity is in how the text is prepared. In image editions and microforms we have representations of the physical text. In live text editions and letterpress editions, we have transcriptions of documents. But, as we argue above, the electronic environment will give us the chance to create new forms which are not possible (or at least, not practical) in microform or print editions.
The underlying assumption of the Partnership is that the SGML architecture offers the best option for creating electronic editions which will stand the test of time. To gain a thorough understanding of the editorial features of interest to each project, the coordinators visited the editorial office of each project in the summer of 1995. When the Steering Committee met in Columbia, S.C., in the fall of 1995, it also began the process of identifying textual features which need to be marked up in an electronic edition. During the next phase of the project, the coordinators will draft a markup scheme using a subset of the TEI Guidelines plus additional markup tags required for historical editions. Those "Guidelines for Markup in Historical Editions" will also be circulated to the editorial community and to others in the electronic text community for comment.
The choice of building on the TEI Guidelines to create markup for historical editions rests on several factors. The Text Encoding Initiative brought together scholars from many countries and many disciplines, all experienced leaders in computer applications in the humanities. This international effort drew support from the National Endowment for the Humanities, the European Union, the Social Science and Humanities Research Council of Canada, and the Andrew W. Mellon Foundation plus a broad array of support from universities. Out of that effort came a two-volume, 1300-page reference work, which allows most features of historical texts to be identified with markup. But because scholars look at texts in different ways, the editors of the Guidelines realized that no set of markup would meet every need. Given that, the Guidelines were designed to provide a basic set of markup which can be extended and modified to meet scholarly demands. The Guidelines do provide most, but not all, of the basic markup needed for historical editions. Building on a solid foundation, however, is better than reinventing the wheel.
Our decision to use TEI-SGML markup was also influenced by recent developments in the electronic environment itself. HTML, the markup language of the World Wide Web (a portion of the Internet that is growing exponentially), is itself an application of SGML. In the scholarly world, almost every major project targeting the Internet is creating SGML text, and many are using the TEI markup or are heavily influenced by it. This includes the Library of Congress and its allies in the American Heritage Digital Library, the pioneering electronic text centers at universities like Rutgers, Princeton, Virginia, Michigan, and California, and individual projects like the Perseus Project at Harvard and the Women Writers Project at Brown. If historical editions are to be part of tomorrow's digital libraries, extending the TEI markup is the logical path. But before we begin that journey, we need to agree on fundamental principles.
This page was last updated 9 May 2004