Emily Dickinson Collocation Browsers

Page history last edited by Alan Liu 2 years, 2 months ago


 By Angus Forbes


“Did you ever read one of her Poems backward, because the plunge from the front overturned you?” –Emily Dickinson


Lisa Samuels and Jerome McGann, in their article "Deformance and Interpretation," take a seemingly whimsical fragment written by Emily Dickinson and read it at face value, asking how a reader can "release or expose the poem's possibilities of meaning" and show language as "an interactive medium." Poems, they claim, citing Shelley, lose their "vital force when they succumb to familiarization." They imagine instead unfamiliar deformations of literature that offer "a highly regulated method for disordering the sense of a text." In so doing, they posit that the practice of "performative" critical models provides an important "anti-theoretical" interpretation not available to traditional interpretative criticism.




My project investigates the 1955 edition of Emily Dickinson's complete poems through various interactive, animated navigations of collocated words. These navigations perform what Samuels and McGann term "experimental analyses." Each of the visualizations displays a different presentation of her work. The first, Line Browser, allows a user to move the mouse pointer over a particular word in a line of a poem. Doing so brings up every other line that contains the same word. The user can then move the mouse pointer over new words in those lines to explore further lines. The second visualization, Poetry Chains, begins with two words and attempts to find a chain of words, within a specified number of lines, that connects them, displaying the lines as it succeeds. The third visualization, Dispersion, allows the user to select a word from a list in which words are sorted by their overall frequency in the poems. Selecting a word brings up a dispersion plot of where the word appears within the full body of her poems. Multiple words can be selected, which provides a comparison of where the words appear in relation to each other. The fourth visualization, Collocation Net, begins with a single word centered in the middle of the screen. When the user selects the word, a random selection of its collocations pops out in a surrounding ring. Any of those words can be selected in turn, which makes the collocations of that word appear. A user can toggle into an ambient mode of this visualization, which automatically cycles through the words, eventually reaching all of them, and runs indefinitely.
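The bookkeeping behind a dispersion plot can be sketched in a few lines of Python. This is only an illustration, not the project's own code: the function names `tokenize` and `dispersion`, the simple regular-expression tokenizer, and the four-line sample are all assumptions made for the example. The sketch treats the corpus as one stream of words and reports where a chosen word falls in that stream as a fraction between 0 and 1.

```python
import re

def tokenize(line):
    """Lowercase a line and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", line.lower())

def dispersion(lines, word):
    """Return the relative positions (0.0 to 1.0) of `word` in the token stream.

    Assumption: the corpus is flattened into a single sequence of tokens,
    and a position is the token's index divided by the total token count.
    """
    tokens = [tok for line in lines for tok in tokenize(line)]
    if not tokens:
        return []
    return [i / len(tokens) for i, tok in enumerate(tokens) if tok == word]

# Hypothetical four-line sample standing in for the full corpus.
SAMPLE_LINES = [
    "Because I could not stop for Death",
    "He kindly stopped for me",
    "The Carriage held but just Ourselves",
    "And Immortality",
]

print(dispersion(SAMPLE_LINES, "for"))  # [0.25, 0.5]
```

Each returned fraction could be drawn as a tick along a horizontal axis; plotting several words on parallel axes would give the kind of comparison described above.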




This set of visualizations offers a continuously dynamic remapping of Dickinson's work. The deformations present new opportunities for interpretation, some of which may lend themselves to successful insights, and others of which might be ludicrous, or merely bland. Each of the visualizations performs this remapping in a different way. The Poetry Chain effectively runs a kind of smoothing operation, an averaging filter, by treating her entire corpus as a single poem. Additionally, it uses a depth-first search algorithm to get between two points within the corpus, performing a non-linear "hopscotch" (with a poetic rather than narrative destabilization). The Collocation Net completely disassembles the corpus into individual words and links them together, not grammatically, but by a frequency metric that correlates words by the likelihood of their appearing together within the same line. While it is unclear exactly what interpretive value these remappings offer, it is interesting to think of them in relation to, or perhaps as a differentiation from, visualization projects that use the methods of information visualization or visual analytics. In those fields, it is assumed that the "raw data" is inherently atomic, and that the goal of the project is to enable users to recombine the data in different ways in order to facilitate new revelations and new interpretations, or what Stuart Card calls "knowledge crystallization." That is, they allow the user to create models by the synthesis and analysis of data, through which hypotheses may be generated and then either validated or falsified. A recent article by Ben Shneiderman reframes the products of information visualization projects as creativity support tools, where the goal of such a tool is to facilitate creativity: novel ideas and new perspectives. In many ways, then, the concerns of digital humanities computing projects and information visualization projects overlap: both aim to provide and evaluate new interpretations. A major difference, of course, and one of the central points of Samuels and McGann's article, is that poems themselves are not composed of irreducible raw data. Instead, the meaning is the raw data. But this meaning lives in the interaction between the text and the reader, and cannot be extracted, simplified, summarized, or evaluated in any direct way.



A primary task in traditional information visualization projects is to provide different kinds of overviews so that the user can, at a glance, get a general sense of the data before investigating aspects of it in more detail. For textual data, which is unstructured and which has more nuanced, complex meaning, it may be too reductive to attempt to provide this kind of overview. Instead, overviews of textual works are provided in different ways: through biographical information, through a delineation of themes or topics, through summarization, or, most generally, by having the reader skim through the text. In lyric poetry, however, where the themes are not easily explicable and no obvious narrative lends itself to summarization, overviews are necessarily interpretive acts. Through a series of ambient and interactive visual sketches, this project aims to provide the user with a loose overview of the language using a "skimming" metaphor, offering perhaps ad hoc interpretations and allowing the user to investigate further. A second primary task of information visualization projects is to give the user the opportunity to find more detail about a particular element or set of elements. A future task for these visualizations would be to give users the ability to view the poems themselves when desired, and to bring up further information about a particular word. That is, it is important for a user to be able to contextualize the investigations offered by the visualizations, and to have more control over the interpretive process.


In developing these sketches, the first step common to all of them was to parse the data into a format that enabled the visualizations. Each line of each poem is connected to all of the words in the line, and vice versa. Each word also has a pointer to every word with which it is collocated, along with a count of how many times the two are collocated. Additionally, the total frequency of each word is recorded, as well as the word's rank by commonness. These data structures, stored in memory, were sufficient for three of the visualizations. The Poetry Chain visualization required slightly more involved processing. In order to find a connection between two words, the software performs a depth-first search, recursively scanning lines connected by collocations until a line containing the target word is found. Because this can be a time-consuming process, I limited the search by removing every word and every line that it encountered, both to reduce the number of possible pathways and to prevent infinite loops. Because of this fairly crude filtering process, it is possible that a path will not be found. In this case I simply re-run the search. In general, a full path between the source word and the target word is found relatively quickly, usually on the first attempt. I also improve the chances of generating a successful path by swapping the source and target words if the target occurs less frequently than the source (and reversing the path when it is complete). Occasionally, when both the source word and the target word are infrequent, it may take more than fifty attempts to find a path, in which case the search simply gives up and the user is asked to try a different source and target.
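The structures and the search described above might look roughly like the following Python sketch. It is an illustration under stated assumptions rather than the project's actual implementation: the names `build_index`, `find_chain`, `max_lines`, and `SAMPLE_LINES` are hypothetical, the tokenizer is a simple regular expression, and a collocation is counted once per line in which two distinct words co-occur. The search here visits lines and words in sorted order, so it is deterministic; a version that is re-run after a failure, as described above, would presumably vary the visiting order between attempts.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

def tokenize(line):
    """Lowercase a line and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", line.lower())

def build_index(lines):
    """Build in-memory maps linking lines, words, and collocations."""
    line_words = []                  # line id -> set of words in that line
    word_lines = defaultdict(set)    # word -> ids of the lines containing it
    colloc = defaultdict(Counter)    # word -> counts of same-line partners
    freq = Counter()                 # word -> total number of occurrences
    for line_id, line in enumerate(lines):
        tokens = tokenize(line)
        freq.update(tokens)
        words = set(tokens)
        line_words.append(words)
        for w in words:
            word_lines[w].add(line_id)
        # Assumption: one collocation per line for each pair of distinct words.
        for a, b in combinations(sorted(words), 2):
            colloc[a][b] += 1
            colloc[b][a] += 1
    # Rank 1 is the most frequent word.
    rank = {w: r for r, (w, _) in enumerate(freq.most_common(), start=1)}
    return line_words, word_lines, colloc, freq, rank

def find_chain(source, target, line_words, word_lines, max_lines=6):
    """Depth-first search for a list of line ids leading from source to target.

    Every line and word encountered is marked as seen and never revisited,
    which prunes the search and rules out infinite loops, but also means a
    path can be missed. Returns None if no chain of at most `max_lines`
    lines is found.
    """
    seen_words, seen_lines = {source}, set()

    def visit(word, depth):
        if depth == max_lines:
            return None
        for line_id in sorted(word_lines.get(word, ())):
            if line_id in seen_lines:
                continue
            seen_lines.add(line_id)
            if target in line_words[line_id]:
                return [line_id]
            for nxt in sorted(line_words[line_id]):
                if nxt in seen_words:
                    continue
                seen_words.add(nxt)
                rest = visit(nxt, depth + 1)
                if rest is not None:
                    return [line_id] + rest
        return None

    return visit(source, 0)

# Hypothetical four-line sample standing in for the full corpus.
SAMPLE_LINES = [
    "Because I could not stop for Death",
    "He kindly stopped for me",
    "The Carriage held but just Ourselves",
    "And Immortality",
]

line_words, word_lines, colloc, freq, rank = build_index(SAMPLE_LINES)
chain = find_chain("death", "me", line_words, word_lines)
print([SAMPLE_LINES[i] for i in chain])
```

In this sample, "death" and "me" never share a line, but both are collocated with "for", so the chain passes through the first two lines. A search for a word with no connecting collocations, such as "immortality" here, returns None, which corresponds to the failure case in which the user is asked to try a different pair.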


It is, of course, not yet known how useful it is to incorporate statistical techniques common in linguistics and information visualization into literary study, or how effective interactive visual presentations are at illustrating textual interpretations. These four visualizations aim to be not operational methods, but rather incomplete prototype sketches of what aspects of such methods might involve.




Card, S. K. Information visualization. In The Human-Computer Interaction Handbook. CRC Press, 2007.


Dickinson, E. The Letters of Emily Dickinson. 3 vols. Eds. Thomas H. Johnson and Theodora Ward. Cambridge: The Belknap Press of Harvard UP, 1958.


Samuels, L., and McGann, J. Deformance and Interpretation. New Literary History 30, 1 (1999), 25–56.


Shneiderman, B. Creativity support tools: accelerating discovery and innovation. Communications of the ACM 50, 12 (2007), 32.








