Presentation Roundup

Hello again from the BHL NDSR cohort! It’s hard to believe, but we only have about six weeks left in our residency. We’ve been busy reviewing our research and data to finalize our projects and recommendations for BHL this fall, as well as presenting our work. Here are some updates on our recent presentations:

DLF Forum

Marissa and Pam here! We had the opportunity to attend the DLF Forum in Pittsburgh, PA at the end of October. Our proposal, “An Evolving Portal: Planning for the Future of the Biodiversity Heritage Library” was accepted into the User Experience Panel session.





Marissa (left) and Pam (right) presenting “An Evolving Portal: Planning for the Future of the Biodiversity Heritage Library” at the DLF Forum 2017.


User Survey Update

First, I have to give a big THANK YOU to everyone who participated in the recent user surveys for BHL. I really appreciate everyone’s willingness to tell us more about your experience using the site. This feedback will be incredibly useful for BHL as they look to move to their next version.

For my project, I categorized BHL users into three groups, as I described in my earlier blog post: Consortium Users, System Users, and Individual Users. I then created a customized survey targeted to each user group. I used the SurveyMonkey platform for all of the surveys, and each one was live for three weeks.



NDSR at iPRES 2017

Last month I had the privilege of representing NDSR and BHL on a global platform. I attended and presented a poster at iPRES, the major international conference on the preservation and long-term management of digital materials, held at Kyoto University in Kyoto, Japan.

The official theme of iPRES 2017 was “Keeping Cultural Diversity for the Future in the Digital Space–from Pop Culture to Scholarly Information,” and presentations covered everything from strategies for preserving ancient Chinese caves to the challenges of preserving augmented reality games such as Pokémon GO. The unofficial mantra of the conference could have been “digitization is not digital preservation.” While digitization is an important step in preserving culturally and historically important artifacts, it is not the end of the preservation lifecycle. Sustaining digital objects over the long term remains a challenge for professionals in this field, and iPRES gives us the opportunity to share lessons and ideas with each other so that we can be better stewards of digital items.


Kyoto University. 


Preliminary Recommendations from NDSR Residents

Hello! We’ve been focusing on transforming our research into recommendation outlines that we presented to the BHL Tech Team last week. As we head into the final quarter of our residencies, we’ll be focusing on tweaking these ideas, developing workflows and proofs of concept, and finalizing our recommendations in a Best Practices White Paper by December. For this update, we wanted to give a preview of what some of these recommendations will look like and invite some preliminary feedback from the BHL NDSR Blog-o-sphere that we can consider as we move into these final months.


NDSR Mentors and Residents at Missouri Botanical Garden for BHL Tech Team meeting. Photo by Martin Kalfatovic. 


Digital Directions 2017

Last week, the Northeast Document Conservation Center (NEDCC) hosted its annual Digital Directions conference in Seattle, WA. The conference focuses on the creation and management of digital collections. Since one of my goals during my time as a Resident at the Natural History Museum of Los Angeles County (NHMLAC) is to create a project plan for digitizing materials, this seemed like a great place to get a foundation in the process. It also happened that Seattle would see the solar eclipse at about 92% coverage, which was an added bonus!


Digital Directions attendees ready for eclipse-viewing! Photo by the NEDCC via Twitter.


Wikidata and BHL Update: Part 1

This is a fairly incomplete post about the work that’s going on regarding adding BHL bibliography metadata to Wikidata. I hope to have several more of these posts before the end of the year! 

Following some productive conversations about donating BHL bibliographic metadata to Wikidata, it became clear almost immediately that BHL’s data is not terribly useful without some serious munging. One of the biggest problems with BHL bibliographic metadata is that it comes from many different libraries and museums, legacy cataloging systems, and various types of authority work. For example, BHL attaches Creator IDs to Author names, which is useful for identifying Authors and connecting titles and items to them, but these IDs are assigned automatically according to the character strings imported from specific fields in a library catalog’s MARC record. Despite (and perhaps because of) the use of varying authority files to control Author name strings in institutional catalog records, different libraries have contributed items by the same author whose names are spelled, punctuated, and identified differently. BHL does not conduct authority control on its metadata, choosing instead to focus on improving access to items based on content rather than metadata. Fortunately, there are several ways to go about reconciling and disambiguating data, and one of them is crowdsourcing.
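To see why string-based identifiers multiply, here is a minimal Python sketch. It assumes, purely for illustration, an importer that mints one ID per distinct name string (BHL's actual import code is not described here), and it shows that even a naive cleanup step (lowercasing, stripping punctuation and life dates) leaves all five of the Packard name strings listed below distinct. The helpers `assign_creator_ids` and `normalize` are hypothetical.

```python
import re

def assign_creator_ids(name_strings):
    """Hypothetical importer: mint one new ID for each distinct raw name string.
    The IDs are sequential stand-ins, not real BHL Creator IDs."""
    ids = {}
    for name in name_strings:
        if name not in ids:
            ids[name] = len(ids) + 1
    return ids

def normalize(name):
    """Naive cleanup: lowercase, drop life dates, turn punctuation into spaces."""
    name = name.lower()
    name = re.sub(r"\d{4}-\d{4}", "", name)  # e.g. "1839-1905"
    name = re.sub(r"[^\w\s]", " ", name)     # commas, periods, parentheses
    return " ".join(name.split())            # collapse runs of whitespace

# The five name strings for Alpheus Spring Packard quoted in this post.
variants = [
    "Packard, Alpheus S",
    "Packard, A",
    "Packard, A S",
    "Packard, A. S. (Alpheus Spring), 1839-1905",
    "Packard, Alpheus Spring",
]

raw_ids = assign_creator_ids(variants)
cleaned = {normalize(v) for v in variants}

print(len(raw_ids))   # 5: one ID per string
print(len(cleaned))   # still 5: cleanup alone does not merge them
```

String cleanup can only merge variants that differ in punctuation or case; "Packard, A" and "Packard, Alpheus Spring" need outside knowledge (or a human) to be recognized as the same person.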

BHL can use Wikidata to tell its users that “Packard, Alpheus S” (Creator ID: 82636), “Packard, A” (Creator ID: 59850), “Packard, A S” (Creator ID: 48286), “Packard, A. S. (Alpheus Spring), 1839-1905” (Creator ID: 1592), and “Packard, Alpheus Spring” (Creator ID: 56087) are all the same person without editing the spelling or legacy metadata from the catalog record.


Dr. Packard’s Wikidata Item viewed in Reasonator

One approach is to use Wikidata as an identifier hub: add a property for BHL Creator IDs in Wikidata (P4081) and add a table in BHL for Wikidata identifiers that can be associated with those same Creator IDs. Adding these identifiers makes Wikidata a more robust knowledge base, which will improve the discoverability of BHL’s content by enriching its metadata externally while also solving some metadata problems internally. While some of the reconciling can be done computationally using (still more) authority files, that approach often misidentifies strings and isn’t very helpful when an author is not in that particular database. These errors are best caught by humans, whom Wikidata invites to edit mistakes and add identifiers directly. By adding Creator IDs to Wikidata and, in turn, adding Wikidata IDs to BHL, BHL can leverage the wisdom of the crowd to reconcile its author metadata.
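On the BHL side, a table of Wikidata identifiers could be as simple as a join table. The sketch below uses Python's built-in `sqlite3` with a hypothetical schema (`creator`, `creator_wikidata`) and a placeholder QID, `Q_PACKARD`, standing in for Dr. Packard's real Wikidata item ID. It shows how one crowd-verified QID lets BHL gather every name string for the same person without editing any catalog data.

```python
import sqlite3

# Hypothetical schema; BHL's real database layout is not described in this post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE creator (
    creator_id  INTEGER PRIMARY KEY,
    name_string TEXT NOT NULL            -- kept exactly as imported
);
CREATE TABLE creator_wikidata (
    creator_id   INTEGER NOT NULL REFERENCES creator(creator_id),
    wikidata_qid TEXT NOT NULL,
    PRIMARY KEY (creator_id, wikidata_qid)
);
""")

# The five Creator IDs and name strings quoted in this post.
packard = [
    (82636, "Packard, Alpheus S"),
    (59850, "Packard, A"),
    (48286, "Packard, A S"),
    (1592, "Packard, A. S. (Alpheus Spring), 1839-1905"),
    (56087, "Packard, Alpheus Spring"),
]
conn.executemany("INSERT INTO creator VALUES (?, ?)", packard)

# "Q_PACKARD" is a placeholder for Dr. Packard's real Wikidata item ID.
conn.executemany(
    "INSERT INTO creator_wikidata VALUES (?, ?)",
    [(creator_id, "Q_PACKARD") for creator_id, _ in packard],
)

def same_person(conn, creator_id):
    """Every (Creator ID, name string) that shares a QID with creator_id.
    Returns an empty list if creator_id has not been linked to any QID."""
    return conn.execute(
        """
        SELECT DISTINCT c.creator_id, c.name_string
        FROM creator_wikidata AS w1
        JOIN creator_wikidata AS w2 ON w2.wikidata_qid = w1.wikidata_qid
        JOIN creator AS c ON c.creator_id = w2.creator_id
        WHERE w1.creator_id = ?
        ORDER BY c.creator_id
        """,
        (creator_id,),
    ).fetchall()

for row in same_person(conn, 59850):
    print(row)
```

Keeping the QIDs in a separate table means the legacy name strings never change; the link lives alongside them and can be corrected as Wikidata editors fix mistakes.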

In order to test this idea and start down a path that will hopefully lead to more BHL data in Wikidata, I worked with Andy Mabbett (User:PigsOnTheWing) to add a representative set of 1000 BHL CreatorIDs to Wikidata. The first step was to disambiguate these authors. To procure a sample of 1000 representative authors, I used the rbhl R package to interface with the BHL API and pull a random sample of authors with associated DOIs.1 The rbhl package is an rOpenSci tool and can be found on their GitHub. The R script I used can also be found on GitHub at: . Once I had generated a table of Author Strings, CreatorIDs, an associated Title, and its DOI, I headed over to OpenRefine to start reconciling BHL CreatorIDs. As you’ll remember from a few paragraphs ago, BHL doesn’t conduct authority control and relies instead on the work of partner institutions. This means that there are no external identifiers for authors in BHL. We chose to reconcile against VIAF IDs because VIAF has the most identifiers in Wikidata (for library resources, at least). Once there were VIAF IDs, the CreatorIDs could be added as P4081 property statements to author QIDs. The tool Mix n’ Match makes the part of this process that requires some human thought pretty simple and somewhat fun!2
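The workflow above used R (rbhl), OpenRefine, and Mix n’ Match. As a rough Python analogue, assuming the author table has already been exported with a DOI column and a VIAF column filled in by reconciliation, the sketch below draws a reproducible random sample of authors that have DOIs and turns VIAF-matched rows into (QID, P4081, Creator ID) statements. Everything here is hypothetical: the function names, the rows, and all IDs are placeholders. Rows without a Wikidata match are returned separately, since those are the ones that need human review.

```python
import random

def sample_with_doi(authors, k, seed=2017):
    """Reproducible random sample of up to k author rows that have a DOI."""
    with_doi = [a for a in authors if a.get("doi")]
    return random.Random(seed).sample(with_doi, min(k, len(with_doi)))

def p4081_statements(authors, viaf_to_qid):
    """Split rows into (QID, "P4081", Creator ID) statements and a list of
    Creator IDs whose VIAF ID has no matching Wikidata item yet."""
    statements, needs_review = [], []
    for a in authors:
        qid = viaf_to_qid.get(a.get("viaf"))
        if qid:
            statements.append((qid, "P4081", str(a["creator_id"])))
        else:
            needs_review.append(a["creator_id"])
    return statements, needs_review

# Placeholder rows: every ID and DOI here is made up for illustration.
authors = [
    {"creator_id": 1, "name": "Author A", "doi": "placeholder-doi-1", "viaf": "VIAF_A"},
    {"creator_id": 2, "name": "Author B", "doi": None, "viaf": "VIAF_B"},
    {"creator_id": 3, "name": "Author C", "doi": "placeholder-doi-3", "viaf": None},
]
# Assumption: this crosswalk could be built from VIAF IDs (P214) already in Wikidata.
viaf_to_qid = {"VIAF_A": "Q_AUTHOR_A"}

sample = sample_with_doi(authors, k=1000)
statements, needs_review = p4081_statements(sample, viaf_to_qid)
print(statements)     # [('Q_AUTHOR_A', 'P4081', '1')]
print(needs_review)   # [3]
```

Fixing the random seed makes the "random" sample repeatable, which helps when the same batch has to be re-pulled after fixing reconciliation errors.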

Now, my next step is figuring out what the next steps are. There is some interest in adding New York Botanical Garden’s herbarium type specimens to Wikidata, along with protologue literature from BHL and perhaps field notebooks and other relevant collecting-event items. BHL also has quite a long list of taxon names (3,732,986 names) with metadata for the pages they come from. I don’t think it’s appropriate to push all of this data to Wikidata, but it is a significant dataset that could be useful in many ways. Another issue is that resolving author strings to VIAF IDs takes a significant amount of work. Gerard Meijssen has suggested using Open Library IDs, which are already resolved to VIAF and often to Wikidata, as a possible solution. BHL hosts its content on the Internet Archive, which created Open Library. One would imagine that it is a simple hop, skip, and a jump from BHL CreatorIDs to Open Library IDs, but I’m still investigating whether that is, in fact, the case.
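If the Open Library route works out, each lookup becomes a chain of crosswalks: Creator ID to Open Library ID to VIAF ID to Wikidata QID. This small, hypothetical sketch follows a key through any number of mappings and reports where the chain breaks, which is exactly the question still open for real BHL data. The function name and every mapping value are placeholders, not real identifiers.

```python
def follow_crosswalks(key, *crosswalks):
    """Follow key through each mapping in turn.

    Returns (path, complete): the IDs visited, and whether every hop resolved.
    """
    path = [key]
    for crosswalk in crosswalks:
        key = crosswalk.get(key)
        if key is None:
            return path, False  # the chain breaks at this hop
        path.append(key)
    return path, True

# Hypothetical crosswalks; none of these values are real identifiers.
creator_to_openlibrary = {1592: "OL_PLACEHOLDER"}
openlibrary_to_viaf = {"OL_PLACEHOLDER": "VIAF_PLACEHOLDER"}
viaf_to_qid = {"VIAF_PLACEHOLDER": "Q_PACKARD"}

print(follow_crosswalks(1592, creator_to_openlibrary, openlibrary_to_viaf, viaf_to_qid))
# ([1592, 'OL_PLACEHOLDER', 'VIAF_PLACEHOLDER', 'Q_PACKARD'], True)
print(follow_crosswalks(59850, creator_to_openlibrary, openlibrary_to_viaf, viaf_to_qid))
# ([59850], False)
```

Returning the partial path, rather than just a failure flag, shows which hop is missing and therefore where human effort would be best spent.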

Please jump in with any thoughts about Wikidata + BHL or what I’ve described above. I know that WordPress is not terribly conducive to discussion, but that’s how we’re set up for now. I do not claim to have an expert-level grasp of Wikidata yet (or BHL, for that matter), but this collaboration seems to be a constructive Open Data pursuit!

1. During this step I incorrectly assumed that BHL minted DOIs for all its content including individual articles. BHL does mint DOIs for monographs, and worked with BioStor to add 12,000 DOIs for articles.

2. The manual for using Mix n’ Match can be found at:


Last week kicked off the six-day library extravaganza known as ALA Annual. The conference, hosted by the American Library Association, was held in Chicago, IL, where attendees gathered to discuss, learn, and exchange ideas about libraries under the theme “Transforming Our Libraries, Ourselves.” With 25,000 attendees, masses of sessions and talks, and a mountain of freebies, ALA can be an overwhelming experience, but we managed to find our way and wanted to share what we did and learned there.

One of our main goals was to present our “Halfway Remarks” poster on behalf of all of the BHL NDSR Residents. Alicia Esquivel of the Chicago Botanic Garden and Ariadne Rehbein of the Missouri Botanical Garden attended and presented it.


Digital Data in Biodiversity Research

Last week I attended the Inaugural Digital Data in Biodiversity Research Conference sponsored by iDigBio, the University of Michigan Museum of Zoology, the University of Michigan Herbarium, and the University of Michigan Museum of Paleontology. The conference brought together biodiversity researchers, data providers, data aggregators, collection managers, and librarians. We talked about creating digital biodiversity data, sharing this data and using it in research.

The presentations, posters and workshops highlighted research trends in biodiversity and projects that have open access missions similar to BHL’s. I was able to give a talk about using statistical analysis to calculate the size of biodiversity literature and present a poster about visually representing the collection at BHL. 


The Role of Librarians in Wikidata and WikiCite


The other week I participated in WikiCite 2017, a conference, summit, and hackathon event organized for members of the Wikimedia community to discuss ideas and projects surrounding the concept of adding structured bibliographic metadata to Wikidata to improve the quality of references in the Wikimedia universe. As a Wikidata editor and a librarian, I was pumped to be included in the functional and organizational conversations for WikiCite and learn more about how librarians and GLAMs can contribute.

The Basics (briefly and criminally simplified)

Galleries, Libraries, Archives, and Museums (GLAMs) are institutions that collect, preserve, and make available information artefacts and cultural heritage items for use by the public. Before databases, librarians managed card catalogs to facilitate access; these were later converted into digital records in the MAchine-Readable Cataloging (MARC) format to create online catalogs (ca. 1970s-2000s). As items in collections are digitized, librarians et al. add descriptive, administrative, and technical/structural metadata to records and, depending on copyright, provide access to digital surrogates via a digital library or repository. Metadata, however, is generally not subject to copyright and is often published by GLAMs for analysis and use via direct download, APIs, and, in more and more cases, as Linked Open Data. As a field, we’re still at the beginning of this transformation to Linked Open Data and have significant questions to answer and thorny issues to resolve.


Diagram of a Wikidata item

Wikidata is a source of machine-readable, multilingual, structured data collected to support Wikimedia projects. It is released under a CC0 license, on the theory that simple statements of fact are not subject to copyright. Wikidata items consist of statements, each made up of a property and a value. In the Linked Open Data world, these items form graphs whose statements are expressed as triples. As Wikimedians and Wikidata editors add more of this supporting structured data for Wikipedia, the idea of adding bibliographic metadata to Wikidata has started coming up. Essentially: “Here are some great structured data that are incredibly important to the functionality of Wikipedia; how can we add them to this repository that we’re creating in a usable way?” As many librarians (and really anyone who has written a substantial research paper) know, citations are complicated.
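To make the triple idea concrete, here is a simplified Python sketch of a citation-style item stored as (subject, property, value) triples. The item IDs `Q_ARTICLE`, `Q_AUTHOR_1`, and `Q_AUTHOR_2` are placeholders; P31 ("instance of"), P50 ("author"), Q13442814 ("scholarly article"), and Q5 ("human") are real Wikidata identifiers. Real Wikidata statements also carry qualifiers, ranks, and references, which this sketch leaves out.

```python
# Placeholder item IDs; the P/Q identifiers noted in comments are real Wikidata IDs.
triples = [
    ("Q_ARTICLE", "P31", "Q13442814"),   # article: instance of -> scholarly article
    ("Q_ARTICLE", "P50", "Q_AUTHOR_1"),  # article: author -> an author item
    ("Q_ARTICLE", "P50", "Q_AUTHOR_2"),  # a property may repeat on one item
    ("Q_AUTHOR_1", "P31", "Q5"),         # author: instance of -> human
    ("Q_AUTHOR_2", "P31", "Q5"),
]

def objects(graph, subject, prop):
    """All values of prop on subject, in the order the triples were stored."""
    return [o for s, p, o in graph if s == subject and p == prop]

# Walking the graph: from the article to its authors, then to facts about them.
authors = objects(triples, "Q_ARTICLE", "P50")
human_authors = [a for a in authors if "Q5" in objects(triples, a, "P31")]
print(authors)        # ['Q_AUTHOR_1', 'Q_AUTHOR_2']
print(human_authors)  # ['Q_AUTHOR_1', 'Q_AUTHOR_2']
```

Because an author is itself an item rather than a text string, a citation can link to everything else known about that person, which is where much of the complexity (and value) of structured citations comes from.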

Getting to Know the BHL Users

For my NDSR project at Smithsonian Libraries, I’ll be gathering feedback from the users of BHL to help inform the next version of the digital library. I’ve had the opportunity to meet with several partners of BHL and sit in on BHL Member, Collection, and Tech Team meetings. Through these interactions, I’ve identified three main groups from which to solicit feedback for my research: (1) Consortium Users; (2) System Users; and (3) Individual Users.

1. Consortium Users: Contributors to BHL, including Members, Affiliates, Partners, staff, and volunteers

2. System Users: Organizations or individuals who interact with BHL to enrich another system, either via APIs (Application Programming Interfaces) or manually

3. Individual Users: Anyone visiting the BHL website to search for information to answer their research needs, such as scientists, collection managers, librarians, etc.

Consortium Users

As a consortium of natural history and botanical libraries, BHL is made up of Members and Affiliates. Each of these consortium users is committed to the mission, vision, and key values of BHL, centered around free and open access to biodiversity literature.

Example of digitized content found in BHL. Catesby, Mark. The Natural History of Carolina, Florida, and the Bahama Islands. v. 1, ed. 1. pl. 16. Digitized by Smithsonian Libraries.

Staff at these partner institutions participate in various ways, including scanning their biodiversity resources for inclusion in BHL and taking part in BHL working groups and committees as needed. By sitting in on some of these committees’ meetings, I’ve been able to learn more about the BHL Members and Affiliates and their needs. Most of these members are museums or libraries that also serve their own users. Members are looking for ways to streamline the process of digitizing their materials for BHL and of promoting and accessing their content through BHL.

Another important group is volunteers, whether they work through Member institutions or with BHL as a whole. They assist with scanning and uploading content, managing social media, tagging illustrations with taxon names, and participating in crowdsourced transcription efforts, among other activities. These endeavors increase the visibility of BHL and enhance its content. Check out a couple of BHL’s most active volunteers on social media: Siobhan Leachman, and Michelle Marshall with her Historical SciArt.
