Media Vault Program Update

October 22, 2010

Greetings MVP Stakeholders,

This month brings the completion of the Media Vault Program, as well as the beginning of new initiatives. It also brings a brief extension, through winter break, for the Extensis Portfolio/NetPublish service.

The three-year, grant-funded Media Vault Program comes to a close this fall.  Supported by UC Berkeley’s IT Bank, the MVP has been instrumental in raising issues related to digital asset management and preservation.  Its efforts have addressed fundamental scholarly needs on campus.

The Media Vault Program brought several new services to campus.  Working closely with research and teaching collections, the MVP put in place an innovative digital asset management and archive solution, coupling the notion of long-term preservation with commercially available cataloging and publishing software.  This exploratory offering helped campus collections such as the Phoebe A. Hearst Museum of Anthropology and the History of Art Visual Resources Collection manage their images, recordings and other electronic holdings, and make these objects available on the Web.  Over the last year, the MVP team took the lessons learned from this work and applied them to the creation of a sustainable content management and collaboration service appropriate for use by the entire campus.  (See ‘Media Hub becoming ‘Research Hub’’, below.)

Equally important, the MVP created a unique partnership among campus and UC system-wide programs, bringing together the UC Berkeley Library, the campus’s Educational Technology Services and Information Services & Technology organizations, the California Digital Library and others in discussions and workshops about this vital area of academic technology.  Program partners have spawned new initiatives such as the forthcoming Research Hub service, the California Digital Library’s UC Curation Center (see ‘New Services for the UC Community from the CDL/UC3’, below) and significant updates by the campus Library to its GenDB service.  Through these and other initiatives, MVP partners continue to provide help to scholars, even as funding for the Media Vault Program comes to an end.

Increasingly, digital materials form the heart of scholarship.  For the past three years, the Media Vault Program has focused its attention – and the University’s – on the proper management of these resources.  It is a complex and expensive endeavor; developing the means to support scholars will take the concerted efforts of many parties.  The vision of the MVP, and the spirit of collaboration that guided it, must live on!

The Media Hub, the content management and collaboration service piloted this spring by the Media Vault team, has gained a new name on its way to launch.  Rechristened Research Hub, this new service from Information Services & Technology’s Data Services Department is designed to support the needs of campus researchers.  Its URL will be easy to remember:

The Research Hub team is developing terms of service and pricing models in preparation for a limited release this fall.  Look for an announcement soon.

While the final decisions are being made, the underlying hardware and software have been installed, localized, tested and prepared for campus use.  Authentication has been tied to the campus’s CalNet identity management service, so users won’t need a separate password.  Research Hub is in the queue of services awaiting integration with the CalNet Guest Access program; by the end of the semester, partners and colleagues from outside the campus community should be able to work collaboratively with campus scholars, students and staff.

Meanwhile, the Research Hub has been selected as one of the workspace engines behind the Project Bamboo technology proposal.  Over the next 18 months, the Research Hub will be used to prototype workspace features and to test the online workspace requirements of scholars in the Humanities, Arts and interpretive Social Sciences.

The Media Vault Program’s Extensis Portfolio/NetPublish-based service, originally slated to close in October of 2010, has been extended through the end of January 2011, giving MVP pilot participants time to move their collections and catalogs to other platforms.  Thanks to closeout funding from the campus’s IT Bank and the continued generosity of the Library, collection owners will now have until the beginning of spring semester to retrieve their materials.

The Media Vault team will contact participants in the upcoming weeks to help define migration plans.  If you have questions in the meantime, please contact the Media Vault team by email at

The California Digital Library (CDL) has announced two new services for the UC community: Merritt and EZID.  Merritt, the next-generation repository service from CDL’s UC Curation Center (UC3), will allow UC3 to extend the reach of its services to new constituencies such as museums, archives, research groups, academic departments and data centers.

Significant features include:
• permanent storage
• access via persistent URLs
• tools for long-term management
• an easy-to-use interface for deposit and updates

EZID (ee-zee-eye-dee) enables persistent identification of and access to a scholar’s research, which is critical to the long-term distribution and availability of the work.  Currently, EZID allows users to acquire DataCite Digital Object Identifiers (DOIs) or Archival Resource Keys (ARKs).  CDL plans to add other identifier schemes going forward.  EZID is available via a machine-to-machine programming interface (an API) and as a web user interface.

The Research Hub team is already working with the UC3 team to automate the transfer of content from UC Berkeley to the UC3 Merritt platform.  For more information about Merritt or EZID, please contact UC3 (see contact link, below).

So, the program ends, but the effort continues. It has been a pleasure working with each of you – and will continue to be, in new and different forms of endeavor.
The Media Vault team

Useful Links:
Research Hub (UC Berkeley): (coming)
About: (coming)
Contact: (coming)
Project Bamboo:

Media Vault Team email address:

California Digital Library / University of California Curation Center (UC3):
UC3 contact page:
Merritt webinar:


CollectionSpace Project Webinars

October 20, 2009

CollectionSpace, an open-source application to support museums and collections management, will be hosting a series of webinars over the next couple of weeks. The first webinar will be this Thursday, October 22 at 10 am PST. For more information, please go here.

Current Schedule:
CollectionSpace for Technology Service Providers and Developers, Thursday, October 22, at 10am PST.
CollectionSpace for Museum and Academic Technology Professionals, October 29, 2009
CollectionSpace for Museum and Cultural Heritage Professionals, November 5, 2009

CollectionSpace is funded by the Mellon Foundation and is made up of a variety of institutions, including the Museum of the Moving Image (NYC), UC Berkeley, University of Toronto and the University of Cambridge.

Notes from the Service Providers Workshop

September 10, 2009

Media Vault Program gathers providers of access, preservation, and digital curation services

Who’s protecting our data? Can institutions such as UC Berkeley ensure “the cumulative record of the past and the well-tended, authentic, and readily accessible data of the present” on which scholarship is built? [1] What is at risk if we do not?

On Thursday, September 3, six organizations working at the heart of the preservation, access and digital curation issues that face university scholars agreed to coordinate efforts to develop ways to keep research data safe and easy to share.

At the half-day meeting, directors and staff representing the UC Berkeley Library, the California Digital Library (CDL), CDL’s eScholarship program and the Berkeley campus’s Educational Technology Services (ETS), Informatics group and Media Vault Program convened to explore strategies for weaving their offerings into a rich fabric of support for campus researchers, instructors and students.

Scope of the problem

Drawing upon the Media Vault Program’s first phase of research – and its promising pilot service, which provides access and back-up to nearly a dozen groups with holdings of more than 500,000 objects – Michael Ashley, the program’s Digital Conservation Architect, outlined the scope of the campus’s needs:

  • The problem is large, but finding solutions is essential
  • Some needs are basic
  • Some needs are complex
  • Common solutions are possible
  • There must be incentives
  • WE are the platform.

“We’ve heard a common thread of feedback,” said Patrick McGrath, Associate Director of Data Repository Management for the campus’s Information Services and Technology (IST) division.  “People have needs, but there’s too wide a range of (disjointed) options for them to make sense of.  We want to be able to point people in the direction of help.” With phase one under its belt, the MVP sees success coming through the concerted efforts of a network of providers, experts and researchers on campus, across the UC system, and in other domains.

Overview of services and roadmaps

The morning began with presentations from each of the service providers.

Chris Hoffman, Manager of Informatics for IST, described the breadth of the reorganized service, which supports the Berkeley Natural History Museums, individual museums, grant partners (including consortia of higher education institutions) and individual faculty.  Central to these efforts is the development of CollectionSpace, a Mellon Foundation-funded effort to create an open framework for collections management.  CollectionSpace, for which UC Berkeley’s Data Services department is designing and developing the underlying services, plans to release its initial product in May of next year.

Mara Hancock, Director of ETS, presented an overview of the unit’s programs and services.  Speaking of ETS’s mission to develop, promote and support the effective integration of collaboration, learning and communication technologies for the campus community and beyond, Hancock noted, “It drives us every day.”  She closed with a progress report on the development of Sakai 3, the next release of the application that powers bSpace, and on the Opencast Matterhorn project – an open-source platform that will support the scheduling, capture, encoding and delivery of educational video and audio content.  Sakai 3, managed by a consortium of higher education institutions including UC Berkeley, should hit campus in about two years; Matterhorn, currently in development by an international team led by ETS, is expected to be up and running by next summer.  Reflecting on the volume of data produced through the use of bSpace and in the course of webcasting, Hancock posed the questions, “How do we manage that mass of data?  How do we help faculty manage the environment?”

Noah Wittman, Manager of the Media Vault Program, pointed to the MV DAM (digital asset management), MV Archive, MV Publish and MV Consult services that have grown from the first-generation offering and presented a roadmap towards a future platform that ties together and builds upon the services offered by those in the room.

Bernie Hurley, Director of Library Technologies and Preservation and head of the Library Systems Office, discussed the Library’s WebGenDL asset management and archive service, and highlighted the benefits it could provide MVP participants in need of digital asset management, integration with the CDL preservation services, support for persistent identifiers, subject specialists who can provide contact with researchers and access to legal counsel on matters of intellectual property.

Catherine Mitchell, Director of the eScholarship Publishing Program, focused on her program’s newly expanded and re-envisioned open-access publishing infrastructure for the University of California.  Mitchell described eScholarship’s new identity as a place to publish (no longer simply a place to put things) and spoke of its new venture, the UC Publishing Service (UCPubS), in conjunction with UC Press.  She also demonstrated a few of the redesigned tools, such as the KWIC Pics PDF-generator, available to authors, researchers and librarians system-wide.

Stephen Abrams, Senior Manager for Digital Preservation Technology at the CDL, closed the round of presentations by introducing the CDL’s new set of micro-services for digital curation.  Micro-services – enabling tasks such as replication, cataloging, transformation and annotation – represent a move from “preservation as a place,” Abrams said, to preservation “as a set of policies and practices focused on maintaining and adding value to trusted digital content.”  “Not all content needs to come to us,” Abrams added.  “We want to push out services to where content lives most naturally.”  The first of these micro-services, supporting identity and storage of digital objects, will be available in January 2010.

Brainstorming at the whiteboard

Comments by Eric Kansa, Executive Director of the Information & Service Design program at the School of Information, provided a frame for the morning’s presentations.  “Topic one,” Kansa stressed, “is ‘How do we make a business case?’”  “What are the ongoing losses?” he asked.  “What risks are we placing ourselves under by not addressing these issues?”

A round of conversation ensued, revolving around questions such as, “What does it mean to be ‘all together’?”  “When do we act individually?  When do we act together?”  “How do we step forward on our own in this current budget environment?”  “What’s the killer app?”  At the end, conversation focused back on “What can we do together?”  In anticipation of the upcoming Media Vault small community meeting the following week, and the larger community meeting to be held at the end of October, a new question formed: “What do we present to our communities?”


The group considered a pledge to “partner in the Media Vault Program to help make research data safe and easy to share.”  Questions of “brand,” of balancing group and individual initiatives, and of working together effectively led to a series of agreements among participants:

  • The participants agreed to look for ways to communicate their diverse offerings to the campus community in a coherent way.
  • The Library offered use of WebGenDL, its content management service that catalogs collections of research data and sends the data to CDL for preservation, to the MVP.  The Library very generously volunteered to provide 16TB of storage to the program, at no fee.  Within limits, of course.
  • The CDL expressed its interest in providing its micro-services to the campus, and receiving feedback on them from users.
  • The group agreed to meet regularly to share plans and explore synergistic activities (recognizing that everyone’s time is stretched thin).

First initiatives and other next steps

The group also listed and prioritized a set of initiatives to take on jointly, as a starting point for addressing scholars’ needs and as a way to see how group members can collaborate effectively.  Projects receiving the most votes were:

    1. Easy uploader: a means of getting assets from the desktop to cloud-based or other shared storage.
    2. Secure, accountable storage: provides an inventory of one’s content and single- and bulk-asset recovery.
    3. Data citation service: provides permanent identifiers through a registration authority for research datasets (under the DataCite initiative) that allows the sets to be linked to publications.
    4. Publishing/access project:  nice ways to present materials via the Web, and tools that facilitate use of stored and shared materials in presentations, etc.  Access is important!  Social tagging fits nicely here.
    5. Use case analysis: refinement of the various use cases identified by the different groups.  This could include the question of how to get services to users “where they are,” and the question of incentives – analysis we will have to continue in any case.
    6. Joint fundraising, and a focus on supportive funders: identifying grantors and programs that a) support preservation and access initiatives and b) favor research proposals that include a strong commitment to preservation, access and digital curation.

A draft of each of these project definitions will be put on the Media Vault Program wiki.  Workshop participants, especially those whose comments helped shape the projects during the discussion, will refine and flesh out the summaries.

The group agreed to meet again soon to build out the collective vision, to define the high-level requirements for moving forward and to cement the common understanding that has taken form.  Proposed time for this next meeting: mid-October.

[1] Abby Smith, “Academic Amnesia: Who is Preserving Our Data?” Center for Studies in Higher Education, UC Berkeley, November 28, 2006,

MVP Interim Report

September 10, 2009

We are pleased to share our latest findings from our Interim Report.  Below is the Executive Summary. To read the report in its entirety, please visit the wiki or download the pdf here.

Executive Summary of Findings

“Scholarship is built on the cumulative record of the past and the well-tended, authentic, and readily accessible data of the present. Current federal efforts to build a digital information preservation infrastructure at the Library of Congress and the National Archives assume that research institutions responsible for producing large quantities of research data, such as the University of California, will take responsibility for ensuring its long-term access. Is that a reasonable expectation? What is at risk if they do not?”

Abby Smith, “Academic Amnesia: Who is Preserving Our Data?” Center for Studies in Higher Education, UC Berkeley, November 28, 2006

Executive Summary
The principal finding of the Media Vault Program is that it is essential to have services that make research data safe and easy to share for our campus. What was true in 2006 (when we began the Media Vault) remains true today, although the texture of the challenge is now understood at a much finer grain. Our findings show that obstacles to the development, adoption and sustainability of services can be described in economic, technical, political/organizational and social terms, as corroborated by the excellent work from several leading reports, including:

Use and Users of Digital Resources: A Focus on Undergraduate Education in the Humanities and Social Sciences – Harley et al.

Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation – Berman et al. [BRTF]

Sustaining Digital Resources: An On-the-Ground View of Projects Today: Ithaka Case Studies in Sustainability – Maron et al.

A Multi-Dimensional Framework for Academic Support: A Final Report – Lougee et al.

Scholarly Communication: Academic Values and Sustainable Models – King et al.

A Report on the Range of Policies Required For and Related To Digital Curation – Jones

Before delving into the obstacles, let’s take a look at several findings that make this an opportune moment to launch a campus-wide program like the Media Vault:

  • The problem is large, but solutions are essential – Our findings indicate that we need to own the problem coherently. We need to work together (the service providers and technical experts) and harmonize efforts to the greatest degree possible.
  • The problem is manageable – It is possible to make progress incrementally. There are pragmatic and relatively inexpensive measures that we can put in place, which will provide excellent benefits. See functions and requirements below.
  • Some needs are basic – A safe place to put things, an easy way to share things. The principal need for most users is a safe place to put their research data, and the peace of mind this brings. Easy access to primary content is an essential requirement.
  • Some needs are complex – Long-term digital preservation and permanent access are tricky. The shift of responsibilities from creator to curator brings with it incredible complexities due to the requirements that are typically introduced in order to affirm this transition. We need to be patient and accommodating with our user community and realize the complexities of this domain are impediments to adoption.
  • There are few incentives to do the right thing – We need to encourage good thinking and best practices. “In many environments, there are few incentives to develop the persistent collaborations and uniform approaches needed to support access and preservation efforts over the long-term.” Incentives need not be financial; they can include convenience, competitive advantage, ease of use and novelty.
  • There is a desire to learn and share – Participants are engaged, interested, willing to learn. One of the key strengths of working in an academic environment is the general desire to try things and experiment, and a tolerance for imperfection.
  • WE are the platform – Consulting and problem solving are as desperately needed as technical services, and they go a long way. Our participants are innovative and motivated. The Media Vault Program is potentially a remarkable resource of support for the research endeavor.
  • Media Vault is a good brand – Especially if co-owned and operated by our selected partners. For some of us, the brand may seem too constraining, limited to media – data supporting the research endeavor. Our findings indicate that the majority of the research enterprise is dependent on binary files, defined in the simple terms of Office documents, PDF, images, and video. If we can make progress on making these types of media safe and easy to share, we will have made significant gains.
  • Common solutions are possible – By focusing on workflow and lifecycle, common pain points are revealed for most users – collections, researchers, departments. There are individual researchers with tens or thousands of images, and departments with the need to share fewer files but broadly. Scale is relative.
  • We need enterprise solutions in order to support an enterprise like Berkeley – We need services that scale. We cannot and need not own every service, but we need to own the service catalog. We need to position ourselves to make recommendations, have opinions, make assertions, and be helpful.
  • Full service to self-service – Different users have different needs and different abilities to pay or contribute. There is not a sliding scale between the haves who can afford the full services and the have-nots who cannot. In fact, self-service, meaning self-empowerment, should be a goal. As much as possible, the research enterprise should be both self-reliant and fully supported. Self-service is a key to human scalability issues for the suppliers, which translates to lower costs and greater responsiveness.

Obstacles

All major studies and reports on the sustainability of digital resources point to a multitude of barriers that can be clustered into four factors:

    Economic: Who owns the problem, and who benefits from the solutions? Who pays for the services, long-term preservation, development, and curation? From the [BRTF]: While there is “general agreement that digital information is fundamental to the conduct of modern research, education, business, commerce, and government,” there is “no general agreement, however, about who is responsible and who should pay for the access to, and preservation of, valuable present and future digital information.”

    Technical: Simple services are needed, but they are not simple to build, implement, integrate and support in our complex environment. Successful structures that can support digital scholarship must account for user needs, emerging technologies/file formats, adverse working contexts (fieldwork, offline, multi-platform), and should be supported at the enterprise scale. Commercial/proprietary offerings can provide a lot of functionality out of the box, but with potentially high licensing costs. Open source solutions are prevalent and freely available, but often require significant financial, development and support investment.

    Political/Organizational:  We think the Media Vault Program community approach to making research data safe and easy to share puts a spotlight on both the urgency of the problem, and the challenges that must be overcome structurally in order to make progress on solutions. For example, there are good reasons for the various service provider organizations to innovate on their own, but there is much to gain from working together on common goals and milestones. In fact, where communities have succeeded in softening the boundaries between content producers and consumers, supporters and beneficiaries, significant successes have been achieved. Conversely, where misalignment around roles, goals and responsibilities persist, so do the barriers to sustainable stewardship.

    Social:  We live in interesting times, where disruptive technologies such as Facebook and Google are transforming how we communicate culturally, and the prevalence of cheap/stolen media has produced an expectation that things should be always available, conveniently packaged, and free. Where some organizations, such as the Long Now Foundation, are hoping to “provide counterpoint to today’s ‘faster/cheaper’ mind set and promote ‘slower/better’ thinking,” it may be up to those of us who care deeply about the persistence of research data to step up as the seas continue to change.

    Sometimes simple is good enough, as is evidenced by many technologies that have solved complex problems adequately: MP3, RSS, PGP, Skype, Twitter, TinyURL, WordPress blogs and Gmail. What all of these technologies have in common is that their developers took on a problem and tried to solve an essential part that would have maximal benefits for most, but not all, users. If we can devise solutions that will help 80% of our research community, will that be a reasonable and desirable outcome? Will it be a good enough start?

Next Steps: Where Do We Go From Here?

The Media Vault Program represents an opportunity to overcome the barriers to development, adoption and sustainability of services through its community-driven approach. Our community understands the urgency of the problem and faces the challenges posed by these barriers in their everyday work. Furthermore, we foresee that “access to data tomorrow requires decisions concerning preservation today.” Our campus needs a thriving, well-governed, effective program to address what is recognized as one of the “most urgent” and essential problems facing research organizations today.

    We believe that in order to make major progress for the community we need three things:

    1. Program: A supported, sustainable community of participants, providers and sponsors.
    2. Platform: A next-generation Media Vault platform that is enterprise strength in terms of reliability and scalability.
    3. Pledge: A statement of support from the campus executive.