Notes from the Service Providers Workshop

September 10, 2009

Media Vault Program gathers providers of access, preservation, and digital curation services

Who’s protecting our data? Can institutions such as UC Berkeley ensure “the cumulative record of the past and the well-tended, authentic, and readily accessible data of the present” on which scholarship is built?[1] What is at risk if we do not?

On Thursday, September 3, six organizations working at the heart of the preservation, access and digital curation issues that face university scholars agreed to coordinate efforts to develop ways to keep research data safe and easy to share.

At the half-day meeting, directors and staff representing the UC Berkeley Library, the California Digital Library (CDL), CDL’s eScholarship program and the Berkeley campus’s Educational Technology Services (ETS), Informatics group and Media Vault Program convened to explore strategies for weaving their offerings into a rich fabric of support for campus researchers, instructors and students.

Scope of the problem

Drawing upon the Media Vault Program’s first phase of research, and its promising pilot service that provides access and backup to nearly a dozen groups with holdings of more than 500,000 objects, Michael Ashley, the program’s Digital Conservation Architect, outlined the scope of the campus’s needs:

  • The problem is large, but finding solutions is essential
  • Some needs are basic
  • Some needs are complex
  • Common solutions are possible
  • There must be incentives
  • WE are the platform.

“We’ve heard a common thread of feedback,” said Patrick McGrath, Associate Director of Data Repository Management for the campus’s Information Services and Technology (IST) division.  “People have needs, but there’s too wide a range of (disjointed) options for them to make sense of.  We want to be able to point people in the direction of help.”  With phase one under its belt, the Media Vault Program (MVP) sees success coming through the concerted efforts of a network of providers, experts and researchers on campus, across the UC system, and in other domains.

Overview of services and roadmaps

The morning began with presentations from each of the service providers.

Chris Hoffman, Manager of Informatics for IST, described the breadth of the reorganized service, which supports the Berkeley Natural History Museums, individual museums, grant partners (including consortia of higher education institutions) and individual faculty.  Central to these efforts is the development of CollectionSpace, a Mellon Foundation-funded effort to create an open framework for collections management.  CollectionSpace, for which UC Berkeley’s Data Services department is designing and developing the underlying services, plans to release its initial product in May of next year.

Mara Hancock, Director of ETS, presented an overview of the unit’s programs and services.  Speaking of ETS’s mission to develop, promote and support the effective integration of collaboration, learning and communication technologies for the campus community and beyond, Hancock noted, “It drives us every day.”  She closed with a progress report on the development of Sakai 3, the next release of the application that powers bSpace, and on the Opencast Matterhorn project – an open-source platform that will support the scheduling, capture, encoding and delivery of educational video and audio content.  Sakai 3, managed by a consortium of higher education institutions including UC Berkeley, should hit campus in about two years; Matterhorn, currently in development by an international team led by ETS, is expected to be up and running by next summer.  Reflecting on the volume of data produced through the use of bSpace and in the course of webcasting, Hancock posed the questions, “How do we manage that mass of data?  How do we help faculty manage the environment?”

Noah Wittman, Manager of the Media Vault Program, pointed to the MV DAM (digital asset management), MV Archive, MV Publish and MV Consult services that have grown from the first-generation offering and presented a roadmap towards a future platform that ties together and builds upon the services offered by those in the room.

Bernie Hurley, Director of Library Technologies and Preservation and head of the Library Systems Office, discussed the Library’s WebGenDL asset management and archive service, and highlighted the benefits it could provide MVP participants in need of digital asset management, integration with the CDL preservation services, support for persistent identifiers, subject specialists who can provide contact with researchers and access to legal counsel on matters of intellectual property.

Catherine Mitchell, Director of the eScholarship Publishing Program, focused on her program’s newly expanded and re-envisioned open-access publishing infrastructure for the University of California.  Mitchell described eScholarship’s new identity as a place to publish (no longer simply a place to put things) and spoke of its new venture, the UC Publishing Service (UCPubS), in conjunction with UC Press.  She also demonstrated a few of the redesigned tools, such as the KWIC Pics PDF-generator, available to authors, researchers and librarians system-wide.

Stephen Abrams, Senior Manager for Digital Preservation Technology at the CDL, closed the round of presentations by introducing the CDL’s new set of micro-services for digital curation.  Micro-services – enabling tasks such as replication, cataloging, transformation and annotation – represent a move from “preservation as a place,” Abrams said, to preservation “as a set of policies and practices focused on maintaining and adding value to trusted digital content.”  “Not all content needs to come to us,” Abrams added.  “We want to push out services to where content lives most naturally.”  The first of these micro-services, supporting identity and storage of digital objects, will be available in January 2010.

Brainstorming at the whiteboard

Comments by Eric Kansa, Executive Director of the Information & Service Design program at the School of Information, provided a frame for the morning’s presentations.  “Topic one,” Kansa stressed, “is ‘How do we make a business case?’”  “What are the ongoing losses?” he asked.  “What risks are we placing ourselves under by not addressing these issues?”

A round of conversation ensued, revolving around questions such as, “What does it mean to be ‘all together’?”  “When do we act individually?  When do we act together?”  “How do we step forward on our own in this current budget environment?”  “What’s the killer app?”  At the end, conversation focused back on “What can we do together?”  In anticipation of the upcoming Media Vault small community meeting the following week, and the larger community meeting to be held at the end of October, a new question formed: “What do we present to our communities?”


The group considered a pledge to “partner in the Media Vault Program to help make research data safe and easy to share.”  Questions of “brand,” of balancing group and individual initiatives, and of working together effectively led to a series of agreements among participants:

  • The participants agreed to look for ways to communicate their diverse offerings to the campus community in a coherent way.
  • The Library offered the MVP use of WebGenDL, its content management service that catalogs collections of research data and sends the data to CDL for preservation.  The Library also generously volunteered to provide 16 TB of storage to the program at no fee, within limits.
  • The CDL expressed its interest in providing its micro-services to the campus, and receiving feedback on them from users.
  • The group agreed to meet regularly to share plans and explore synergistic activities (recognizing that everyone’s time is stretched thin).

First initiatives and other next steps

The group also listed and prioritized a set of initiatives to take on jointly, as a starting point for addressing scholars’ needs and as a way to see how group members can collaborate effectively.  Projects receiving the most votes were:

    1. Easy uploader: a means of getting assets from the desktop to cloud-based or other shared storage.
    2. Secure, accountable storage: provides an inventory of one’s content and single- and bulk-asset recovery.
    3. Data citation service: provides permanent identifiers for research datasets through a registration authority (under the DataCite initiative), allowing the datasets to be linked to publications.
    4. Publishing/access project:  nice ways to present materials via the Web, and tools that facilitate use of stored and shared materials in presentations, etc.  Access is important!  Social tagging fits nicely here.
    5. Use case analysis: refinement of the various use cases identified by the different groups.  This could include the question of how to get services to users “where they are,” and the question of incentives – analysis we will have to continue in any case.
    6. Joint fundraising, and a focus on supportive funders: identifying grantors and programs that a) support preservation and access initiatives and b) favor research proposals that include a strong commitment to preservation, access and digital curation.

A draft of each of these project definitions will be put on the Media Vault Program wiki.  Workshop participants, especially those whose comments helped shape the projects during the discussion, will refine and flesh out the summaries.

The group agreed to meet again soon to build out the collective vision, to define the high-level requirements for moving forward and to cement the common understanding that has taken form.  Proposed time for this next meeting: mid-October.

[1] Abby Smith, “Academic Amnesia: Who is Preserving Our Data?” Center for Studies in Higher Education, UC Berkeley, November 28, 2006.