Digital Data in Biodiversity Research

Last week I attended the Inaugural Digital Data in Biodiversity Research Conference sponsored by iDigBio, the University of Michigan Museum of Zoology, the University of Michigan Herbarium, and the University of Michigan Museum of Paleontology. The conference brought together biodiversity researchers, data providers, data aggregators, collection managers, and librarians. We talked about creating digital biodiversity data, sharing this data and using it in research.

The presentations, posters and workshops highlighted research trends in biodiversity and projects that have open access missions similar to BHL’s. I was able to give a talk about using statistical analysis to calculate the size of biodiversity literature and present a poster about visually representing the collection at BHL. 


The Role of Librarians in Wikidata and WikiCite

 

The other week I participated in WikiCite 2017, a conference, summit, and hackathon event organized for members of the Wikimedia community to discuss ideas and projects surrounding the concept of adding structured bibliographic metadata to Wikidata to improve the quality of references in the Wikimedia universe. As a Wikidata editor and a librarian, I was pumped to be included in the functional and organizational conversations for WikiCite and learn more about how librarians and GLAMs can contribute.

The Basics (briefly and criminally simplified)

Galleries, Libraries, Archives, and Museums (GLAMs) are institutions that collect, preserve, and make available information artefacts and cultural heritage items for use by the public. Before databases, librarians managed card catalogs to facilitate access; these were later translated into digital records in the MAchine-Readable Cataloging (MARC) format to create online catalogs (ca. 1970s-2000s). As items in collections are digitized, librarians et al. add descriptive, administrative, and technical/structural metadata to records and provide access to digital surrogates via a digital library or repository, depending on copyright. Metadata, however, is generally not subject to copyright and is often published by GLAMs for analysis and use via direct download, APIs, and, in more and more cases, as Linked Open Data. As a field, we’re still at the beginning of this transformation to Linked Open Data and have significant questions still to answer and thorny issues still to resolve.


Diagram of a Wikidata item  https://www.wikidata.org/wiki/Wikidata:Introduction

Wikidata is a source of machine-readable, multilingual, structured data collected to support Wikimedia projects and CC0 licensed under the theory that simple statements of fact are not subject to copyright. Wikidata items are composed of statements that have properties and values. In the Linked Open Data world these items are graphs with statements expressed in triples. As Wikimedians and Wikidata editors add more of this supporting structured data to Wikipedia, the idea of adding bibliographic metadata to Wikidata started coming up. Essentially: “Here are some great structured data that are incredibly important to the functionality of Wikipedia; how can we add them to this repository that we’re creating in a usable way?” As many librarians (and really anyone that’s written a substantial research paper) are aware, citations are complicated.
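To make the "statements with properties and values" idea concrete, here is a minimal sketch in Python of how an item might be flattened into subject-property-value triples. The `Item` and `Statement` classes are hypothetical teaching aids, not part of any Wikidata library. The identifiers Q42 (Douglas Adams), P31 (instance of), and Q5 (human) are real Wikidata IDs; the bibliographic item ID and its values are placeholders made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Statement:
    """One claim about an item: a property ID and its value."""
    prop: str   # Wikidata property ID, e.g. "P31" (instance of)
    value: str  # another item ID or a literal value


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a Wikidata item."""
    qid: str
    label: str
    statements: List[Statement] = field(default_factory=list)

    def triples(self) -> List[Tuple[str, str, str]]:
        # Each statement becomes a (subject, property, value) triple.
        return [(self.qid, s.prop, s.value) for s in self.statements]


# Real IDs: Q42 = Douglas Adams, P31 = instance of, Q5 = human.
douglas_adams = Item("Q42", "Douglas Adams", [Statement("P31", "Q5")])

# Placeholder bibliographic item (the ID and values are invented).
# P1476 = title, P50 = author, P577 = publication date.
article = Item(
    "Q_EXAMPLE",
    "An example article",
    [
        Statement("P1476", "An example article"),
        Statement("P50", "Q42"),
        Statement("P577", "2017-01-01"),
    ],
)

for triple in douglas_adams.triples() + article.triples():
    print(triple)
```

Even this toy version hints at why citations are complicated: an author can be either a linked item or a plain name string, and a real model also needs qualifiers and references that this sketch leaves out.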

Getting to Know the BHL Users

For my NDSR project at Smithsonian Libraries, I’ll be gathering feedback from the users of BHL to help inform the next version of the digital library. I’ve had the opportunity to meet with several partners of BHL and sit in on BHL Member, Collection, and Tech Team meetings. Through these interactions, I’ve been able to identify three main groups to solicit feedback from for my research: (1) Consortium Users; (2) System Users; and (3) Individual Users.

1. Consortium Users:  Contributors to BHL, including Members, Affiliates, Partners, staff, and volunteers

2. System Users:  Organizations or individuals who interact with BHL for the purpose of enriching another system, either via APIs (Application Programming Interfaces) or manually

3. Individual Users:  Anyone visiting the BHL website to search for information to answer their research needs, such as scientists, collection managers, librarians, etc.

Consortium Users

As a consortium of natural history and botanical libraries, BHL is made up of Members and Affiliates. Each of these consortium users is committed to the mission, vision, and key values of BHL, which center on free and open access to biodiversity literature.

Example of digitized content found in BHL. Catesby, Mark. The Natural History of Carolina, Florida, and the Bahama Islands. v. 1, ed. 1. pl. 16. Digitized by Smithsonian Libraries. http://biodiversitylibrary.org/page/40753165

Staff at these partner institutions participate in various ways, including scanning their biodiversity resources to be added to the BHL content and taking part in BHL working groups and committees as needed. As I’ve sat in on the meetings for some of these committees, I’ve been able to learn more about the BHL Members and Affiliates and their needs. The majority of these members are museums or libraries serving their own users as well. Members are looking for ways to streamline the process of digitizing their materials into BHL and to promote and access their content through BHL.

Another important group is volunteers, whether they work through Member institutions or with BHL as a whole. They assist with scanning and uploading content, managing social media, tagging illustrations with taxon names, and participating in crowdsourced transcription efforts, among other activities. These endeavors increase the visibility of and enhance the content in BHL. Check out a couple of BHL’s most active volunteers on social media: Siobhan Leachman, and Michelle Marshall and her Historical SciArt.


DPLAfest and NDSR Symposium

This post is brought to you by the BHL NDSR Cohort. I, Alicia, introduce our conference-packed month of April. Next, Ariadne recaps our DPLAFest presentation, followed by Pam’s overview of our NDSR Symposium panel discussion. Lastly, Marissa and Katie offer some feedback and reflections from our first round of presentations.

April was a busy month for all of us residents! We attended and presented at two conferences in two different cities: first at the 4th annual DPLAFest in Chicago, and then at the NDSR Symposium in Washington, D.C. the following week.


L to R: Ariadne, Pam, Marissa, Katie and Alicia at DPLAFest 2017.

DPLAFest is organized by DPLA, the Digital Public Library of America, which provides free, digital materials from America’s libraries, archives, museums and cultural heritage institutions. The network of DPLA is established on a “hub” model which brings together digitized and born-digital content from across the country to a single access point. BHL serves as one of the content hubs for DPLA which means BHL content gets passed along to DPLA. Our work with BHL connected mainly to the DPLAFest themes of digital libraries and open access content and collaboration across types of institutions.


Data Management at a Botanic Garden

The 385 acres of the Chicago Botanic Garden (CBG) could not be maintained without the work of dedicated staff, hundreds of volunteers, and careful data management. During my residency at CBG, my mentor, Leora Siegel, arranged an introductory meeting with the head of Living Plant Documentation, Boyce Tankersley, to help me understand how the management of over 2.6 million plants is possible.

One of the few botanic gardens with AAM (American Alliance of Museums) accreditation, the Chicago Botanic Garden maintains records much like museums do; however, the collection items at CBG happen to be living (and thus can die, move, create new items, etc.). Each plant that enters the collection is given an accession number and either deemed a member of the permanent collection or given “seasonal” status as part of a temporary collection (like the orchids that were on view in the orchid show that closed at the end of March). This data is all managed through an internal database.


Why Transcribe?

Digitization is not a new activity for libraries and cultural heritage institutions, and indeed has become a critical tool for preserving and providing access to archival collections including rare books, manuscripts, and photographs. The potential research value of digitized collections is also not a new phenomenon. However, translating images of content into machine-readable data that can be searched, sorted, and otherwise manipulated did not receive much attention until crowdsourcing, citizen science, and other types of community collaboration models and platforms were constructed. A definition of transcription helps clarify some of the competing elements when considering whether and how to transcribe digitized items. Huitfeldt and Sperberg-McQueen distinguish between transcription as an act, as a product, and as a relationship between documents.1 Cultural heritage institutions need to explicitly facilitate the creation and dissemination of each in order to host a successful transcription program. While crowdsourcing methods directly address the act of transcription, libraries are often better suited to produce viable representations of transcription products and relationships in digital repositories. Crowdsourcing thus becomes one of several methods or tools for libraries to develop successful transcription workflows.


Image from William Brewster’s Diary from 1865 that identifies several birds by their common species names (http://biodiversitylibrary.org/page/40222552).

Transcription helps bridge the gap between digitization and use by enhancing access through full-text search, enriching metadata collection, and opening collections to digital textual analysis. Digitized natural history manuscript items are largely hidden due to the lack of item-level description for most archival collections. While minimal processing is certainly the better option compared to maintaining an extensive backlog of unprocessed material, digitized handwritten documents are not discoverable based on their unique content without a machine-readable facsimile. Indexing transcriptions facilitates discovery of historical records and improves catalog search results. By offering full-text transcriptions, the digital collections are opened up to new types of searching, sorting, categorizing, and pattern finding. Research derived from these new data sets can illustrate changes over time at a much larger scale, across collections and types of information resources.
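As a rough illustration of why indexing transcriptions makes handwritten pages discoverable, the sketch below builds a tiny inverted index in Python: a mapping from each word to the pages whose transcription contains it. This is a toy example under stated assumptions, not how BHL or any particular catalog implements search. The page IDs and transcription text are invented for the example and are not taken from any real diary.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(transcriptions):
    """Map each word to the set of page IDs whose transcription contains it."""
    index = defaultdict(set)
    for page_id, text in transcriptions.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the page IDs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented transcriptions keyed by hypothetical page IDs.
transcriptions = {
    "page-1": "Saw a robin and two bluebirds near the pond.",
    "page-2": "Heard a robin singing this morning.",
}

index = build_index(transcriptions)
print(search(index, "robin"))       # both pages mention a robin
print(search(index, "robin pond"))  # only page-1 contains both words
```

Without the transcription text, the only searchable data for these pages would be collection-level metadata, so a query for a bird name would find nothing.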

Reflecting on Open Access and Code4Lib 2017

In considering how to consolidate my thoughts from Code4Lib 2017, I spent some time reviewing the pre-conference workshops and the interesting and directly relevant talks from last week. Ultimately, as I am sure many other attendees discovered, I found that the framework of the conference and a lot of our work as library technologists was best examined by Christina Harlow in her keynote “Resistance is Fertile.”1 There were many (many) other presentations and discussions throughout the conference that were inspiring, enlightening, and compelling, but Harlow synthesized the meaning behind what we all do and applied to it a language and a methodology for doing it better.

And it was remarkable. I think people even cried a bit. We all stood up at the end and clapped a lot.


And over the next few hours and days I thought about how BHL and my position as an NDSR resident fit into this framework and how I can be an agent who advocates for not just Open Access to content but also its ethical and operational background. Harlow keenly argues for investigating the transparency of library policies, if not to resolve inherent biases in programming, systems architecture, and design, then to encourage further democratizing the “means of production” (of datasets, of metadata, of documentation) in pursuit of accessibility and true openness.2

Workshop Roundup

In the past month, I’ve been able to participate in several workshops relevant to my project and to BHL in general. On February 15th I attended a training session on the Expanding Access to Biodiversity Literature (EABL) project presented by Mariah Lewis of the New York Botanical Garden. The workshop was held at the Los Angeles County Arboretum and Botanic Garden, one of BHL’s newest affiliates.


Arboretum grounds

Mariah gave an overview of the EABL project and of BHL’s collection development policy, digitization process, and copyright and permissions workflows. This was a great refresher from the BHL Bootcamp at the Smithsonian Libraries, and it was good to confirm that I have a solid grasp on how BHL works.

“BHL Bootcamp”


BHL NDSR Mentors, Residents, and Secretariat staff breaking the ice at the start of the “Bootcamp.”

About the “BHL Bootcamp”

Ariadne Rehbein

From February 1-3, the BHL NDSR Mentors and Residents converged on the Smithsonian Libraries for “BHL Bootcamp.” In addition to the technology, administration, and mission of BHL, Residents were introduced first-hand to the culture of BHL, NDSR, and leading research institutions. Immersion workshops are a time-honored tradition among NDSR programs, though their curricula and timeframes vary. Our workshop had the most compact timeframe to date, but we took advantage of the ample opportunities to get to know one another and ground ourselves in the history, practices, mission, and aspirations of this collaborative digital library. The workshop consisted of lecture-style instruction by BHL Secretariat staff, tours, and a networking event. Continue reading for a gauntlet of notes and commentary from each of the Residents! Carolyn Sheffield took a leading role in organizing the workshop, and has also outlined its events for the BHL blog.
