A controlled vocabulary is an organized arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching. It typically includes preferred and variant terms and has a defined scope or describes a specific domain. The Library of Congress Subject Headings and the Getty vocabularies are probably the best known, but there are a variety of thesauri with which one could do subject analysis.
Organizing and labeling the world for the sake of information description and retrieval is not, on its face, a terrible thing. It seems advisable to be able to find all of the books on a particular topic by doing a single search, and controlled vocabulary allows you to do that.
But controlled vocabulary often falls short. An innocuous example: Until June 2010, the subject heading you needed to know to find a cookbook was "cookery." This post does a great job of showing how the subject headings were structured when the change from "cookery" to "cookbooks" took place. Before that time, if you typed in "cookbooks," the catalog would redirect your search to "cookery" and you would be dropped into a results screen. It was annoying for catalogers and for users.
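For readers who like to see the mechanics, here is a minimal sketch of that kind of redirect, assuming a vocabulary stored as a simple mapping from each preferred term to its variants. The names `VOCABULARY`, `build_lookup`, and `resolve` are made up for illustration; this is not how any particular catalog implements it.

```python
# Hypothetical one-entry vocabulary reflecting the pre-2010 situation:
# "cookery" is the preferred term and "cookbooks" is a variant of it.
VOCABULARY = {
    "cookery": ["cookbooks"],
}


def build_lookup(vocabulary):
    """Map every term, preferred or variant, to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def resolve(term, lookup):
    """Return the preferred term for a search, or None if it is not in the vocabulary."""
    return lookup.get(term.strip().lower())


LOOKUP = build_lookup(VOCABULARY)
redirected = resolve("Cookbooks", LOOKUP)  # the search is sent to the preferred heading
```

The user never has to know the preferred term, but they do land on a heading they did not type, which is part of why the experience felt awkward.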
One of the more damaging examples of controlled vocabulary falling short happens when we attempt to describe people. One of my favorite articles is from Amber Billey, Emily Drabinski, and K.R. Roberto. In "What's gender got to do with it: a critique of RDA 9.7," they argue against coding gender in authority records. The argument is best summed up this way by Billey et al.:
When RDA requires catalogers to select from only two gender categories—male or female—the rules affirm ideas about gender as a binary and innate characteristic, something it is always possible to know, and know completely, about an individual. Indeed, “unknown” is listed as the only possible third option. The rule fails to account for those who know their gender, but experience it as outside the bounds of simply male or female (417).
Basically, it's problematic to ask someone to make assumptions about someone else's humanity. And it's even more problematic to ask someone to codify these assumptions in authoritative sources.
Let's take a brief detour to talk about faceted search. Facets allow users to narrow down a set of search results based on a set of criteria extracted from elements in the records. Among other things, users can narrow by date of publication, format type, and subjects. Let's go back to our cookbook example. If you search for cookbooks, you could use the "publication date" facet to narrow down your search results to cookbooks published last year.
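To make the idea concrete, here is a small sketch of faceting, assuming each record is a plain dictionary. The records, titles, field names, and the functions `facet_counts` and `narrow` are all invented for illustration; real discovery systems do this inside a search index rather than in a loop.

```python
from collections import Counter

# Invented sample records; the fields stand in for elements extracted from catalog records.
RECORDS = [
    {"title": "Weeknight Suppers", "year": 2014, "format": "book", "subjects": ["Cookbooks"]},
    {"title": "Baking at Home", "year": 2014, "format": "ebook", "subjects": ["Cookbooks", "Baking"]},
    {"title": "Classic Sauces", "year": 2009, "format": "book", "subjects": ["Cookbooks"]},
]


def _values(record, field):
    """Return a field's values as a list, whether it holds one value or several."""
    value = record[field]
    return value if isinstance(value, list) else [value]


def facet_counts(records, field):
    """Count how many records carry each value of a field (the numbers shown beside a facet)."""
    counts = Counter()
    for record in records:
        counts.update(_values(record, field))
    return counts


def narrow(records, field, value):
    """Keep only the records whose field includes the chosen facet value."""
    return [record for record in records if value in _values(record, field)]


year_facet = facet_counts(RECORDS, "year")
cookbooks_2014 = narrow(narrow(RECORDS, "subjects", "Cookbooks"), "year", 2014)
```

The thing to notice is that a facet can only offer values that someone has already recorded in the records, which is why the choice of what to record matters so much.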
The Library of Congress is piloting a project called Library of Congress Demographic Group Terms, or LCDGT. The rationale for the project is that including these demographic terms in records allows users to find material for, or by, a particular group of people. The examples they use in the description of the program are "novels by lawyers" and "handbooks for nurses."
Seen through the lens of the Billey, Drabinski, and Roberto article, one can see how this project might be problematic, especially when it comes to assigning demographic information like gender or sexual orientation.
The pilot documentation states that:
In order for a demographic group to be proposed, a creator or contributor of a resource must self-identify with the group, or a resource must clearly indicate that the group is the intended audience. Furthermore, research in standard reference sources should be carried out and documented.
I think that the self-identification piece embedded in the LCDGT project is the place where it has the potential to diverge from the problematic practice of recording gender in authority records. If a person self-identifies with a group, catalogers are not foisting an identity upon someone based on contextual clues. And multiple terms can be put together to describe demographic groups. But I do think that a set of controlled terms, no matter how broad, will not be able to keep pace with the spectrum of terms a group of people uses to identify itself. And, while the terms in this particular thesaurus "should reflect current usage, and pejorative terminology should be avoided," LCSH has been notoriously bad at keeping up with the current usage of terminology.
I think it's important to weigh the value of being able to find material by, and for, certain demographic groups against the dangerous act of describing demographic groups based on contextual clues. Would it be great for users to find information related to the demographic groups to which they belong? Probably. Is doing so worth the possible mis-identification of people and resources? Probably not.
The arguments for, and against, LCDGT are nuanced. And despite my best efforts, I am sure I've fallen short. The comment period for the pilot of LCDGT is open through June 5th. I will be curious to hear the reaction of the cataloging community and to see the progress of the pilot.