The Dataverse - Data Commons Project

The Dataverse Project is a major international open-source software and standards project that represents the state of the art in generalist, open-source data repositories explicitly designed for research data. It has about 100 installations around the world, including the most prominent and best known, the Harvard Dataverse Repository, and an extensive international community of users and code contributors. The Harvard Dataverse Repository is a free data repository open to all researchers from any discipline, both inside and outside the Harvard community, where you can share, archive, cite, access, and explore research data. To date, the Dataverse network counts more than 380K datasets.

Google Data Commons is a platform and open-source technology that aims to make publicly available data more accessible and useful through semantic linkage and visualization tools.

With this initiative, the Dataverse Project aims to leverage Google’s technology to empower the exploration and augmentation of existing and future data in Harvard Dataverse.

The Dataverse - Data Commons Project will provide a new way to access the hundreds of thousands of variables stored in most of the 380K Dataverse research datasets. These variables will be semantically linked, in time and space, across multiple datasets within the Harvard Dataverse Repository, and with the open data already available in Google Data Commons.
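
As a rough illustration of what linking variables "in time and space" can mean, the sketch below groups observations from two datasets under a shared (place, date) key. It is a minimal sketch under stated assumptions, not the project's implementation: the record layout, the place identifiers, the variable names, and the values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical observations: the field names, place identifiers, variable
# names, and values below are invented and do not come from any real dataset.
dataset_a = [
    {"place": "place/A", "date": "2020", "variable": "median_income", "value": 1.0},
    {"place": "place/B", "date": "2020", "variable": "median_income", "value": 3.0},
]
dataset_b = [
    {"place": "place/A", "date": "2020", "variable": "asthma_rate", "value": 2.0},
    {"place": "place/A", "date": "2021", "variable": "asthma_rate", "value": 4.0},
]


def link_observations(*datasets):
    """Group observations from several datasets by a shared (place, date) key."""
    linked = defaultdict(dict)
    for observations in datasets:
        for obs in observations:
            key = (obs["place"], obs["date"])
            # Variables that share a place and a date end up side by side.
            linked[key][obs["variable"]] = obs["value"]
    return dict(linked)


linked = link_observations(dataset_a, dataset_b)
for key in sorted(linked):
    print(key, linked[key])
```

Once variables from different datasets share a key, they can be compared or combined directly. This only works if places and dates are expressed with common identifiers, which is the role the semantic linkage plays in the project.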

In this project, no data will be moved outside the control of Dataverse servers, in keeping with the data access rules chosen by data creators and providers. The Google technology we use will run locally on our servers. The semantic linkage will also be brought back to Harvard Dataverse to further improve the usability of the repository.