Emily Meyers

A Digital History Portfolio

Class ExperiencesClio

Module 2: Digitization and Data

This week jumped right into the process of challenging what digitization is and what to do with the data collected. Opening a program called OpenRefine, I realized that I never liked the look of spreadsheets and learned how to organize data. All jokes aside, this program is great to find information that is available in the data set. In addition, historians can use the holes in the information to uncover more about that event or topic based on what is missing.

data set
This is a screenshot of the data set used for this project in OpenRefine

This is all interesting, but prompts many questions for these historians. What types of sources are there TO digitize? Who digitizes sources? Why? How does that shape what’s available online? How does a digitized source relate to its original? The answer to all these questions is… it depends!
Ok so historians can be hoarders so we want to keep everything (physical or digital) because we believe that it could be useful later. But this has incited another discussion on the data side as in Big? Smart? Clean? Messy? Data in the Humanities by Christof Schoch. Reading this, I realized how much I did not know about data in general and understanding its use in the development of the humanities and some of it went a little over my head the first time I read it. When I went back through to understand what Smart, Clean, etc data is, I was able to piece together how it benefits research in the humanities.

We also looked into the impact of digital sources in research. A Culture of non-citation by Jonathan Blaney and Judith Siefring, begins to ask why there are less cited sources in digital spaces. I think some of that has to do with the fact that we still view the internet smaller than it is. “Just Google it” tends to be a commonly used phase when we think something is easy to find and common knowledge. I came to the conclusion that, when addressing digital work citations, they should be similar to typed/paper work but simply use hyper links instead of footnotes. Or use footnotes. Just cite it so I can find it later Mean Girls joke that says "oh my god Karen, you can't just repost on the internet and not cite your sources"

Share this post

6 comments

  1. I think it’s really interesting that you bring up looking at what is missing in a data set to figure out different events. I had never thought about using data like that much before. Also an excellent point about citations, above everything else, you just want to be able to find it again.

  2. Emily, I also like your point of thinking about what is not in the data. That can probably tell you a lot and brings up other questions to ask.

    About citations, I think people may realize the unreliability of digital sources so they may not want to cite them because they could disappear, while a print source will always be available. Also, I wonder if there is still a stigma with digital sources because we have been taught how untrustworthy the internet is. I think this view has changed with all the scholarly online resources but we still have to take care to cite really trustworthy digital sources. It seems like in the past it was frowned upon to cite anything from Wikipedia but now it seems to be a little bit more accepted – not necessarily in a scholarly way but as a general resource or for an image. Maybe I am wrong about that…

    1. The stigma is a great point! I was taught that you cannot use Wikipedia under any circumstance in high school. Looking back on this, it was a little extreme and the expectation when I got to college was to use Wiki as a starting point of research and as a collection of sources. Not the best way to teach the issue that the internet can be unreliable at times but we got to a happy medium…. I think? haha

  3. I was also someone that was blown away by the differences between different kinds of data! To have this data be cleaned and sorted thoroughly, historians looking for sources are going to have a much better time in their research. There have been many times where my transposition of an old source hadn’t been completely correct that I wished I had the input of others! Excellent post and great use of images!

  4. I really enjoyed A Culture of Non-Citation, especially because it forced me to reflect on how exactly I was citing things. I am 10/10, 100% guilty of citing print versions of sources I found digitally, and honestly I can only pinpoint maybe one or two reasons why. One, definitely, was that it felt like there was more prestige in citing the print source over the digital (again, no idea why). Maybe another is that I was in the generation of kids that grew up with access to Wikipedia in high school. I’m convinced I still have a hang-up about citing digital sources from hearing the whole “what website counts as a source” spiel and, probably, internalizing too much of it.

  5. Great post. This week I also really thought about what could be data because sometimes I still automatically think about numbers, charts resulting form big data and quickly conclude that data is something I do not want to seek out when doing most work. As we discussed this week and as you mentioned in this post there are many types of data that can and sometimes should be used to answer some questions. But as you mentioned we all must think about what is missing from data and even brainstorm on possibly why it is missing. Could it be purposeful omission or human error not only when the source is scanned but when it is created and preserved. How does that affect the data we use.

Leave a Reply

Your email address will not be published. Required fields are marked *