Classically, bioimaging data have been considered difficult, if not impossible, to manage and share. In this white paper, we share our experiences along with guidelines and other tips for successfully beginning the “image data management marathon”.
The demand to make publicly funded research available and accessible has increased in recent years. Various initiatives (listed below) have aimed to promote or establish open science, open access to research data, or the findability, accessibility, interoperability and reusability of research data (the FAIR principles; Wilkinson et al. 2016). As a result, data producers and managers are required to ensure that data are properly described and can be shared via institutional or disciplinary repositories. Research data management (RDM) has thus become an integral part of the mission of imaging facilities. RDM is often defined as all activities in the research process other than data analysis or processing. More specifically, RDM refers to the organization and handling of research data throughout the research data lifecycle: from data management planning and data creation through storage, backup, retention, archiving, destruction, access, preservation, curation, dissemination (publication, sharing, reuse), and the documentation and description of data, including the complementary algorithms, code, software and workflows that support RDM practices [CODATA, OECD].
There is a growing expectation from funders and the broader scientific community that all research data should, where possible, be made openly available. At the same time, however, the heterogeneity of bioimage data in terms of techniques, acquisition systems, formats, and metadata makes sharing them a challenge for potential data managers (Linkert et al. 2010). Additionally, the size, frequency, and high dimensionality of acquisitions turn even straightforward experiments into true “big-data” endeavors, given the high volume, variety and velocity of the data created (Poger, Yen, and Braet 2023). Finally, the need to integrate image data with other data types, for example in spatial transcriptomics, adds further complexity (https://doi.org/10.1016/j.xgen.2023.100374).
Due to these fundamental issues and the rapid developments in, for example, omics and correlative and multimodal imaging techniques, image data management has often been perceived as daunting, again like a marathon, leading it to be avoided, under-funded, or even considered impossible. Challenging as it is, the marathon of image data management can, with the proper preparation, be valuable to all involved. Exactly those features that make image data difficult to work with (size, dimensionality, multi-modality) also make them valuable. Their spatially correlated information is difficult to summarize or compress precisely because of its richness, and it must be viewed in context. Fast visualization is therefore key.
Luckily for all involved, there is now a growing collection of strategies and efforts to support the different phases of image data management, from initial capture to reuse. This white paper is intended to make core facility staff, with and without an IT background, aware of these advances and to help them plan their rollout locally. Though we touch on issues at scale, especially in the “Next level” section, this paper is primarily intended for beginner and intermediate practitioners. Future white papers will discuss the issues facing advanced users and the solutions available to them. Finally, although others involved in research data management, including core IT services, may find this checklist of interest, we generally assume some familiarity with bioimaging.
The sections that follow outline the top ten suggestions from the Global Bioimaging Image Data working group for successful image data management. The steps build on one another and should give even beginners the confidence to begin managing and sharing their image data. … Our goal is to help you build an image data management strategy from the ground up. For those with existing strategies, we hope there are still a few lessons to be learned.
This project has been made possible in part by a grant from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation.