The proposed “Institute for the Secure Sharing of Online Data” (ISSOD) is a new initiative that aims to establish an institute to:
(a) act as a data repository for large-scale social and digital media data sets
(b) provide a replication archive for large-scale, sensitive social and digital media datasets
(c) establish a new “National Information Survey” that provides regular surveys to monitor trends in digital information consumption.
In this whitepaper, I focus on aims (a) and (b) above. My recommendations and my review of the current landscape on these topics are based mostly on my experience leading the Dataverse project (King 2007; Crosas 2011; The Dataverse Project 2018) for more than a decade.
The ISSOD is taking on a herculean task: acting as a data repository and replication archive for large-scale, sensitive social and digital media data. My main advice is to avoid developing the repository and archive from the ground up. Instead, use existing technologies and establish collaborations that facilitate extending the features of the repository. Then, focus your efforts and resources on acquiring, cleaning, and curating or annotating the data to make them useful research products, and on building a data governance body that can establish policies, make decisions on data use agreements (DUAs), set appropriate access and security levels for the datasets, and review researchers' requests for access.
Altman, Micah, Christine Borgman, Mercè Crosas, and Maryann Martone. 2015. “An Introduction to the Joint Principles for Data Citation.” Bulletin of the American Society for Information Science and Technology 41 (3): 43–45. doi:10.1002/bult.2015.1720410313.
CKAN. 2018. “Open Source Data Web Portal.” Comprehensive Knowledge Archive Network. https://ckan.org/.
Crosas, Mercè. 2011. “The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data.” D-Lib Magazine 17 (1/2). doi:10.1045/january2011-crosas.
Gaboardi, Marco, James Honaker, Gary King, Jack Murtagh, Kobbi Nissim, Jonathan Ullman, and Salil Vadhan. 2016. “PSI (Ψ): A Private Data Sharing Interface.” http://arxiv.org/abs/1609.04340.
Harvard Dataverse. 2018. “A Data Repository for Sharing, Citing, and Archiving.” https://dataverse.harvard.edu/.
King, Gary. 2007. “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing.” Sociological Methods & Research 36 (2): 173–99. doi:10.1177/0049124107306660.
Sotomayor, Borja, and Lisa Childers. 2006. Globus Toolkit 4: Programming Java Services. San Francisco: Morgan Kaufmann.
Sweeney, Latanya, Mercè Crosas, and Michael Bar-Sinai. 2015. “Sharing Sensitive Data with Confidence: The Datatags System.” Technology Science. https://techscience.org/a/2015101601.
The Dataverse Project. 2018. “Open Source Research Data Repository Software.” https://dataverse.org/home.
The Odum Institute. 2018. “Management & Curation.” The Odum Institute. https://odum.unc.edu/archive/managementcuration/.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18.