Media Vault Program Holds Community Workshop

September 22, 2009

The Media Vault Program brought together users of its first-generation services on Friday afternoon, September 11, to share updates, gather feedback on functional requirements for a “generation 2” service and plan for a larger community workshop.  Following closely on the heels of last week’s workshop of access, preservation and digital curation service providers, Friday’s meeting furthered MVP’s push to provide campus with tools to keep research data safe and easy to share.

Media Vault users heard about a number of preservation, access and digital curation services currently available to campus or soon to come online.  Bernie Hurley of the Library Systems Office spoke about the Library’s WebGenDL digital asset management service. “The Library has been using this system to manage its digital assets for about five years,” Hurley said.  In addition to helping researchers catalogue their data and manage the related metadata, the Library can also help Media Vault users and program staff:

  • Create persistent identifiers for their materials
  • Integrate with the California Digital Library’s Digital Preservation Repository
  • Surface collections for discovery
  • Make contact with other researchers, and
  • Starting in November, access legal counsel regarding intellectual property issues.

Hurley repeated the Library’s generous offer to make 16TB of storage available to the Media Vault Program.

John Kunze of the California Digital Library’s Digital Preservation department followed Hurley.  Kunze articulated CDL’s vision to be “recognized as the hub of digital preservation and curation activities for University of California.”   He described the types of materials that the CDL handles, including its web site archive and tools for web site harvesting.  Then, he discussed the CDL’s new digital curation initiative.  “Preservation is not a place,” Kunze said.  “It comes to the user.”  Rather than relying on “monolithic, single-culture systems” to maintain digital objects, the CDL is developing a set of independent but interoperable “micro-services” to handle all aspects of curation, which can be applied throughout the object’s lifecycle.  The first of these, to be available starting in January 2010, will pertain to identity and storage.

Noah Wittman presented on the Media Vault Service’s “Gen2” platform selection process and roadmap.  Building upon its experience with Extensis Portfolio and NetPublish, and keeping a keen eye on the entire ecosystem of access, preservation and curation services available to the campus community, the Media Vault Service is working to develop a recommendation for its future platform within the next six weeks.  The new platform should take advantage of existing services and address the gaps where existing services don’t fill user needs.

Following the round of presentations from service providers, the workshop participants turned their attention to assessing a list of functional requirements for a new platform.  (See the Functional Requirements page of the Media Vault wiki.)  This exercise provided an opportunity for community members to share experiences in a group setting, and for program staff to benefit from the collective expertise of Media Vault users.

The remainder of the workshop focused on planning for the larger community workshop scheduled for the end of October.   Conversations revolved around how to attract campus members to that event, especially given the increased stress and workload caused by the budget crisis.  More generally, how can the Media Vault Program motivate campus scholars to try its services?  Finally, what would it look like if MVP could ramp up its service from 2% of campus to 15%?


Notes from the Service Providers Workshop

September 10, 2009

Media Vault Program gathers providers of access, preservation, and digital curation services

Who’s protecting our data? Can institutions such as UC Berkeley ensure “the cumulative record of the past and the well-tended, authentic, and readily accessible data of the present” on which scholarship is built? [1] What is at risk if we do not?

On Thursday, September 3, six organizations working at the heart of the preservation, access and digital curation issues that face university scholars agreed to coordinate efforts to develop ways to keep research data safe and easy to share.

At the half-day meeting, directors and staff representing the UC Berkeley library, the California Digital Library (CDL), CDL’s eScholarship program and the Berkeley campus’s Educational Technology Services (ETS), Informatics group and Media Vault Program convened to explore strategies for weaving their offerings into a rich fabric of support for campus researchers, instructors and students.

Scope of the problem

Drawing upon the Media Vault Program’s first phase of research, and on its promising pilot service, which provides access and back-up to nearly a dozen groups with holdings of more than 500,000 objects, Michael Ashley, the program’s Digital Conservation Architect, outlined the scope of the campus’s needs:

  • The problem is large, but finding solutions is essential
  • Some needs are basic
  • Some needs are complex
  • Common solutions are possible
  • There must be incentives
  • WE are the platform.

“We’ve heard a common thread of feedback,” said Patrick McGrath, Associate Director of Data Repository Management for the campus’s Information Services and Technology (IST) division.  “People have needs, but there’s too wide a range of (disjointed) options for them to make sense of.  We want to be able to point people in the direction of help.” With phase one under its belt, the MVP sees success coming through the concerted efforts of a network of providers, experts and researchers on campus, across the UC system, and in other domains.

Overview of services and roadmaps

The morning began with presentations from each of the service providers.

Chris Hoffman, Manager of Informatics for IST, described the breadth of the reorganized service, which supports the Berkeley Natural History Museums, individual museums, grant partners (including consortia of higher education institutions) and individual faculty.  Central to these efforts is the development of CollectionSpace, a Mellon Foundation-funded effort to create an open framework for collections management.  CollectionSpace, for which UC Berkeley’s Data Services department is designing and developing the underlying services, plans to release its initial product in May of next year.

Mara Hancock, Director of ETS, presented an overview of the unit’s programs and services.  Speaking of ETS’s mission to develop, promote and support the effective integration of collaboration, learning and communication technologies for the campus community and beyond, Hancock noted, “It drives us every day.”  She closed with a progress report on the development of Sakai 3, the next release of the application that powers bSpace, and on the Opencast Matterhorn project – an open-source platform that will support the scheduling, capture, encoding and delivery of educational video and audio content.  Sakai 3, managed by a consortium of higher education institutions including UC Berkeley, should hit campus in about two years; Matterhorn, currently in development by an international team led by ETS, is expected to be up and running by next summer.  Reflecting on the volume of data produced through the use of bSpace and in the course of webcasting, Hancock posed the questions, “How do we manage that mass of data?  How do we help faculty manage the environment?”

Noah Wittman, Manager of the Media Vault Program, pointed to the MV DAM (digital asset management), MV Archive, MV Publish and MV Consult services that have grown from the first-generation offering and presented a roadmap towards a future platform that ties together and builds upon the services offered by those in the room.

Bernie Hurley, Director of Library Technologies and Preservation and head of the Library Systems Office, discussed the Library’s WebGenDL asset management and archive service, and highlighted the benefits it could provide MVP participants in need of digital asset management, integration with the CDL preservation services, support for persistent identifiers, subject specialists who can provide contact with researchers and access to legal counsel on matters of intellectual property.

Catherine Mitchell, Director of the eScholarship Publishing Program, focused on her program’s newly expanded and re-envisioned open-access publishing infrastructure for the University of California.  Mitchell described eScholarship’s new identity as a place to publish (no longer simply a place to put things) and spoke of its new venture, the UC Publishing Service (UCPubS), in conjunction with UC Press.  She also demonstrated a few of the redesigned tools, such as the KWIC Pics PDF-generator, available to authors, researchers and librarians system-wide.

Stephen Abrams, Senior Manager for Digital Preservation Technology at the CDL, closed the round of presentations by introducing the CDL’s new set of micro-services for digital curation.  Micro-services – enabling tasks such as replication, cataloging, transformation and annotation – represent a move from “preservation as a place,” Abrams said, to preservation “as a set of policies and practices focused on maintaining and adding value to trusted digital content.”  “Not all content needs to come to us,” Abrams added.  “We want to push out services to where content lives most naturally.”  The first of these micro-services, supporting identity and storage of digital objects, will be available in January 2010.

Brainstorming at the whiteboard

Comments by Eric Kansa, Executive Director of the Information & Service Design program at the School of Information, provided a frame for the morning’s presentations.  “Topic one,” Kansa stressed, “is ‘How do we make a business case?’”  “What are the ongoing losses?” he asked.  “What risks are we placing ourselves under by not addressing these issues?”

A round of conversation ensued, revolving around questions such as, “What does it mean to be ‘all together’?”  “When do we act individually?  When do we act together?” “How do we step forward on our own in this current budget environment?”  “What’s the killer app?”  At the end, conversation focused back on “What can we do together?”  In anticipation of the upcoming Media Vault small community meeting the following week, and the larger community meeting to be held at the end of October, a new question formed: “What do we present to our communities?”


The group considered a pledge to “partner in the Media Vault Program to help make research data safe and easy to share.”  Questions of “brand,” of balancing group and individual initiatives, and of working together effectively led to a series of agreements among participants:

  • The participants agreed to look for ways to communicate their diverse offerings to the campus community in a coherent way.
  • The Library offered use of WebGenDL, its content management service that catalogs collections of research data and sends the data to CDL for preservation, to the MVP.  The Library very generously volunteered to provide 16TB of storage to the program, at no fee.  Within limits, of course.
  • The CDL expressed its interest in providing its micro-services to the campus, and receiving feedback on them from users.
  • The group agreed to meet regularly to share plans and explore synergistic activities (recognizing that everyone’s time is stretched thin).

First initiatives and other next steps

The group also listed and prioritized a set of initiatives to take on jointly, as a starting point for addressing scholars’ needs and as a way to see how group members can collaborate effectively.  Projects receiving the most votes were:

    1. Easy uploader: a means of getting assets from the desktop to cloud-based or other shared storage.
    2. Secure, accountable storage: provides an inventory of one’s content and single- and bulk-asset recovery.
    3. Data citation service: provides permanent identifiers through a registration authority for research datasets (under the DataCite initiative) that allows the sets to be linked to publications.
    4. Publishing/access project:  nice ways to present materials via the Web, and tools that facilitate use of stored and shared materials in presentations, etc.  Access is important!  Social tagging fits nicely here.
    5. Use case analysis: refinement of the various use cases identified by the different groups.  This could include the question of how to get services to users “where they are,” and the question of incentives – analysis we will have to continue in any case.
    6. Joint-fundraising, and a focus on supportive funders: identifying grantors and programs that a) support preservation and access initiatives and b) favor research proposals that include a strong commitment to preservation, access and digital curation.
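
As an illustration of what the inventory in item 2 could involve, here is a minimal sketch. It is not a design the group adopted: it assumes the inventory is a simple checksum manifest, and the function names (`sha256_of`, `build_inventory`, `assets_to_recover`) are hypothetical. It records a SHA-256 checksum for every file under a folder, then reports which inventoried files are later missing or altered and would therefore need single- or bulk-asset recovery.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> dict[str, str]:
    """Map each file under root (by relative POSIX path) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def assets_to_recover(root: Path, inventory: dict[str, str]) -> list[str]:
    """List inventoried files that are now missing or whose content changed."""
    current = build_inventory(root)
    return sorted(
        name
        for name, checksum in inventory.items()
        if current.get(name) != checksum
    )
```

A researcher might run `build_inventory` when depositing a collection and `assets_to_recover` on a schedule afterwards. Any real service would also need to store the inventory somewhere separate from the content it describes.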

A draft of each of these project definitions will be put on the Media Vault Program wiki.  Workshop participants, especially those whose comments helped shape the projects during the discussion, will refine and flesh out the summaries.

The group agreed to meet again soon to build out the collective vision, to define the high-level requirements for moving forward and to cement the common understanding that has taken form.  Proposed time for this next meeting: mid-October.

[1] Abby Smith, “Academic Amnesia: Who is Preserving Our Data?” Center for Studies in Higher Education, UC Berkeley, November 28, 2006.

MVP Interim Report

September 10, 2009

We are pleased to share our latest findings from our Interim Report.  Below is the Executive Summary. To read the report in its entirety, please visit the wiki or download the pdf here.

Executive Summary of Findings

“Scholarship is built on the cumulative record of the past and the well-tended, authentic, and readily accessible data of the present. Current federal efforts to build a digital information preservation infrastructure at the Library of Congress and the National Archives assume that research institutions responsible for producing large quantities of research data, such as the University of California, will take responsibility for ensuring its long-term access. Is that a reasonable expectation? What is at risk if they do not?”

Abby Smith, “Academic Amnesia: Who is Preserving Our Data?” Center for Studies in Higher Education, UC Berkeley, November 28, 2006.

The principal finding of the Media Vault Program is that it is essential to have services that make research data safe and easy to share for our campus. What was true in 2006 (when we began the Media Vault) remains true today, although the texture of the challenge is now understood at a much finer grain. Our findings show that obstacles to the development, adoption and sustainability of services can be described in economic, technical, political/organizational and social terms, as corroborated by the excellent work from several leading reports, including:

Use and Users of Digital Resources: A Focus on Undergraduate Education in the Humanities and Social Sciences – Harley et al.

Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation – Berman et al. [BRTF]

Sustaining Digital Resources: An On-the-Ground View of Projects Today: Ithaka Case Studies in Sustainability – Maron et al.

A Multi-Dimensional Framework for Academic Support: A Final Report – Lougee et al.

Scholarly Communication: Academic Values and Sustainable Models – King et al.

A Report on the Range of Policies Required For and Related To Digital Curation – Jones

Before delving into the obstacles, let’s take a look at several findings that make this an opportune moment to launch a campus-wide program like the Media Vault:

  • The problem is large, but solutions are essential – Our findings indicate that we need to own the problem coherently. We need to work together (the service providers and technical experts) and harmonize efforts to the greatest degree possible.
  • The problem is manageable – It is possible to make progress incrementally. There are pragmatic and relatively inexpensive measures that we can put in place, which will provide excellent benefits. See functions and requirements below.
  • Some needs are basic – A safe place to put things, an easy way to share things. The principal need for most users is a safe place to put their research data, and the peace of mind this brings. Easy access to primary content is an essential requirement.
  • Some needs are complex – Long-term digital preservation and permanent access are tricky. The shift of responsibilities from creator to curator brings with it incredible complexities due to the requirements that are typically introduced in order to affirm this transition. We need to be patient and accommodating with our user community and realize that the complexities of this domain are impediments to adoption.
  • There are few incentives to do the right thing – We need to encourage good thinking and best practices. “In many environments, there are few incentives to develop the persistent collaborations and uniform approaches needed to support access and preservation efforts over the long-term.” Incentives need not be financial; they can take the form of convenience, competitive advantage, ease of use or novelty.
  • There is a desire to learn and share – Participants are engaged, interested, willing to learn. One of the key strengths of working in an academic environment is the general desire to try things and experiment, and a tolerance for imperfection.
  • WE are the platform – Consulting and problem solving are needed as desperately as technical services, and go a long way. Our participants are innovative and motivated. The Media Vault Program is potentially a remarkable resource of support for the research endeavor.
  • Media Vault is a good brand – Especially if co-owned and operated by our selected partners. For some of us, the brand may seem too constraining, limited to media – data supporting the research endeavor. Our findings indicate that the majority of the research enterprise is dependent on binary files, defined in the simple terms of Office documents, PDF, images, and video. If we can make progress on making these types of media safe and easy to share, we will have made significant gains.
  • Common solutions are possible – By focusing on workflow and lifecycle, common pain points are revealed for most users – collections, researchers, departments. There are individual researchers with tens or thousands of images, and departments with the need to share fewer files but broadly. Scale is relative.
  • We need enterprise solutions in order to support an enterprise like Berkeley – We need services that scale. We cannot and need not own every service, but we need to own the service catalog. We need to position ourselves to make recommendations, have opinions, make assertions, and be helpful.
  • Full service to self-service – Different users have different needs and different abilities to pay or contribute. There is not a sliding scale between the haves who can afford the full services and the have-nots who cannot. In fact, self-service, meaning self-empowerment, should be a goal. As much as possible, the research enterprise should be both self-reliant and fully supported. Self-service is a key to human scalability issues for the suppliers, which translates to lower costs and greater responsiveness.

Obstacles

All major studies and reports on the sustainability of digital resources point to a multitude of barriers that can be clustered into four factors:

    Economic: Who owns the problem, and who benefits from the solutions? Who pays for the services, long-term preservation, development, and curation? From the [BRTF]: While there is “general agreement that digital information is fundamental to the conduct of modern research, education, business, commerce, and government,” there is “no general agreement, however, about who is responsible and who should pay for the access to, and preservation of, valuable present and future digital information.”

    Technical: Simple services are needed, but they are not simple to build, implement, integrate and support in our complex environment. Successful structures that can support digital scholarship must account for user needs, emerging technologies/file formats, adverse working contexts (fieldwork, offline, multi-platform), and should be supported at the enterprise scale. Commercial/proprietary offerings can provide a lot of functionality out of the box, but with potentially high licensing costs. Open source solutions are prevalent and freely available, but often require significant financial, development and support investment.

    Political/Organizational:  We think the Media Vault Program community approach to making research data safe and easy to share puts a spotlight on both the urgency of the problem, and the challenges that must be overcome structurally in order to make progress on solutions. For example, there are good reasons for the various service provider organizations to innovate on their own, but there is much to gain from working together on common goals and milestones. In fact, where communities have succeeded in softening the boundaries between content producers and consumers, supporters and beneficiaries, significant successes have been achieved. Conversely, where misalignment around roles, goals and responsibilities persists, so do the barriers to sustainable stewardship.

    Social:  We live in interesting times, where disruptive technologies such as Facebook and Google are transforming how we communicate culturally, and the prevalence of cheap/stolen media has produced an expectation that things should be always available, conveniently packaged, and free. Where some organizations, such as the Long Now Foundation, are hoping to “provide counterpoint to today’s ‘faster/cheaper’ mind set and promote ‘slower/better’ thinking,” it may be up to those of us who care deeply about the persistence of research data to step up as the seas continue to change.

    Sometimes simple is good enough, as is evidenced by many technologies that have solved complex problems adequately: MP3, RSS, PGP, Skype, Twitter, TinyURL, WordPress blogs and Gmail. What all of these technologies have in common is that their developers took on a problem and tried to solve an essential part that would have maximal benefits for most, but not all, users. If we can devise solutions that will help 80% of our research community, will that be a reasonable and desirable outcome? Will it be a good enough start?

Next Steps: Where Do We Go From Here?

The Media Vault Program represents an opportunity to overcome the barriers to development, adoption and sustainability of services through its community-driven approach. Our community understands the urgency of the problem and faces the challenges posed by these barriers in their everyday work. Furthermore, we foresee that “access to data tomorrow requires decisions concerning preservation today.” Our campus needs a thriving, well-governed, effective program to address what is recognized as one of the “most urgent” and essential problems facing research organizations today.

    We believe that in order to make major progress for the community we need three things:

    1. Program: A supported, sustainable community of participants, providers and sponsors.
    2. Platform: A next-generation Media Vault platform that is enterprise strength in terms of reliability and scalability.
    3. Pledge: A statement of support from the campus executive.


    MVP Spotlight- September 2009

    September 1, 2009

    Each month, we highlight news relating to digital scholarship, access and preservation at Berkeley and around the world. To contribute, email Lizzy.

    On Campus
    5 Major Research Universities Endorse Open-Access Journals
    By Ben Terris
    UC Berkeley, along with Cornell University, Dartmouth College, Harvard University, and MIT, ‘signed a compact agreeing to the “timely establishment” of mechanisms for providing financial support for free open-access journals.’ This is in response to the high costs of purchasing journals, as well as the growing Open Access movement.

    CollectionSpace .02 Release
    The CollectionSpace team is expected to release .02 at the end of the month. The new release will have a slightly different design, as well as “four new user screens…: login, create new landing page, find and edit landing page, and intake.” The CollectionSpace team is made up of a variety of institutions, including UC Berkeley, “with the common goal of providing a platform for a collections management system.”

    New Batch Download Feature in ARTstor
    The History of Art Visual Resource Center (HAVRC) recently created a four-minute tutorial demonstrating ARTstor’s latest feature: Batch Download. Users are also able to batch download items straight into PowerPoint. Currently, ARTstor limits the number of files downloaded: users are only able to download 1,000 images per semester.

    Around the world
    The Sixth International Conference on Preservation of Digital Objects
    October 5-6, 2009
    Mission Bay Conference Center, San Francisco, CA
    The California Digital Library (CDL) will be hosting the sixth International Conference on Preservation of Digital Objects (iPres) in San Francisco on October 5-6, 2009. The conference will “bring together researchers and practitioners from around the world to explore the latest trends, innovations, and practices in preserving our scientific and cultural digital heritage,” as well as “continue the discussion of creating our digital future.”

    Sun PASIG Fall Meeting
    San Francisco, CA
    October 7-9, 2009
    The Sun Preservation and Archiving Special Interest Group (PASIG) will be hosting a two-day conference in October. The conference will focus on a variety of topics, ranging from storage technology and repositories to sustainability. Presenters and current attendees come from institutions all over the world. Co-sponsored by Stanford, Sun PASIG “is focused on sharing open computing solutions and best practices.”

    Data Sharing
    This week’s Nature features a special section devoted to data sharing. Topics include researchers’ hesitation to share, pre- and post-publication data sharing, and the importance of preserving and sharing data.

    Library of Congress and DuraCloud Launch Pilot Program Using Cloud Technologies
    The Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) and DuraSpace are collaborating on a one-year pilot program. The pilot program will “test the use of cloud technologies to enable perpetual access to…digital content.” Recently developed by DuraSpace, DuraCloud is the new cloud-based service that will be tested. Other partners include the New York Public Library and the Biodiversity Heritage Library.

    Sun in Education Web Seminar Series
    “All About Repositories” series
    Part of Sun’s “Technology that Bridges the Digital Divide” seminars, the “All About Repositories” series will begin in September. Along with Sun, DuraSpace and SPARC International will “provide overviews of best practices, technology updates, and key trend analyses for academic resources directors, IT managers, digital librarians, repository managers and developers, and curators.”

    UNESCO Digital Library Majaliss opens up classical Arabic literature to public
    UNESCO recently launched the Digital Library Majaliss project, which aims to ‘provide free access to hundreds of thousands of pages of classical Arabic literature and to demonstrate, at the same time, the innovative use of information and communication technologies (ICT) for reading, teaching and learning.’ The project is accessible online and on CD-ROMs.