Data Management might seem like a daunting task. Why bother? What is in it for a facility or individual researcher?
Here, we address the motivation for strategizing your research data management (RDM), the main difficulties of RDM in imaging, and why you should keep reading.
In recent years, it has been increasingly recognized that sharing original data enhances transparency, reproducibility, and scientific impact. This shift has prompted a growing international movement to make publicly funded research data accessible to all. At the same time, turning this goal into reality presents challenges for both researchers and infrastructure providers.

From a facility perspective, having data management plans (DMPs) in place is essential. Facilities often support a large number of users with diverse experimental workflows, instrument requirements, and levels of experience. As a result, consistent data organization becomes critical for effectively navigating large and heterogeneous datasets. Streamlined workflows help reduce confusion, prevent duplication of effort, and optimize storage use, avoiding the accumulation of unnecessary or redundant data.

From a researcher’s perspective, the idea of sharing data often raises concerns, particularly the fear of being scooped or losing control over unpublished results. It is important to clarify: data do not need to be made publicly available the moment they are generated. Instead, the key is to ensure that data are well-organized and FAIR (i.e., Findable, Accessible, Interoperable, and Reusable) by the time of publication. This timing protects the researcher’s scientific lead while aligning with journal, funder, and institutional requirements. Well-organized data provide clear benefits not only for the research community, but also for the individual researcher:
Various initiatives (listed in Table 1) have aimed at establishing or promoting open science, that is, making scientific results from publicly funded research accessible. Notably, the FAIR principles (Wilkinson et al., 2016) introduced a widely applicable framework for data management and data stewardship, including for data that cannot be openly shared for valid reasons. Since then, many initiatives have promoted FAIR data by developing tools and standards to facilitate adherence to these principles across diverse scientific domains, including bioimaging. FAIR data enables reuse beyond the original context, boosting impact (as reflected, for example, by higher citation rates of publications associated with shared data) and thereby fostering the trust and reliability needed for more effective and efficient scientific research. As a result, data producers and managers are expected to ensure that data are described properly and can be shared via institutional or disciplinary repositories.
RDM has thus become an integral mission of imaging facilities (https://doi.org/10.1002/jemt.22648; https://doi.org/10.1002/mrd.22538; https://doi.org/10.7171/3fc1f5fe.97057772; https://doi.org/10.1111/jmi.13317). RDM is often loosely defined as all the activities in the research process other than data analysis or processing. More specifically, it refers to the organization and handling of research data throughout the research data lifecycle: from data management planning and data creation through to storage, backup, retention, archiving, destruction, access, preservation, curation, and dissemination (publication, sharing, reuse), as well as the documentation and description of data, including the complementary algorithms, code, software, and workflows that support these practices.
Compared to other disciplines, bioimaging has only relatively recently joined the "big data" revolution, resulting in an exponential increase in demand for data storage and organization strategies. Bioimage data pose unique challenges. The diversity of techniques and the growing number of instruments, acquisition systems, file formats, and metadata conventions challenge potential data managers (Linkert et al. 2010). Additionally, the size, frequency, and high dimensionality of acquisitions make even straightforward experiments true "big data" endeavors, given the high volume, variety, and velocity of the data created (Poger, Yen, and Braet 2023; Ouyang and Zimmer 2017). Recent advances, such as automated imaging pipelines and AI-driven image analysis, have further increased data throughput while requiring more structured data input. Bioimaging is also increasingly combined with other data modalities, such as in spatial transcriptomics, creating further integration and metadata challenges (https://doi.org/10.1016/j.xgen.2023.100374).
Due to these fundamental issues and the rapid developments in, for example, omics and correlative and multimodal imaging techniques, image data management has often been perceived as daunting, leading it to be avoided, under-funded, or even considered impossible. Though challenging, with proper preparation image data management can be valuable to all involved. The very features that make image data difficult to work with (size, dimensionality, multi-modality) also make them valuable. The spatially correlated information they contain is difficult to summarize or compress precisely because of its richness, and must be examined in context, which makes fast visualization key.
Fortunately, there is now a growing collection of strategies and efforts to support the different phases of image data management, from initial capture to reuse. The following chapters are intended to make core facility staff, with or without an IT background, aware of these advancements and to help them plan their rollout in the local setting. Though we will touch on issues at scale, especially in the section on the "Next level", this guide is primarily intended for beginner and intermediate practitioners. Future white papers will discuss the issues facing, and solutions available to, advanced users. Finally, although others involved in research data management, including core IT services, may find this checklist of interest, some familiarity with bioimaging is generally assumed.
The sections that follow outline the top ten suggestions from the Global Bioimaging Image Data working group, a collaborative international team of image data experts, for successfully implementing image data management. The steps build on one another and should give even novices the confidence to begin building an image data management strategy from the ground up. This community is here to help. Whether you are starting from scratch or refining an existing workflow, we hope this guide offers practical insights, community-driven wisdom, and encouragement.
This project has been made possible in part by a grant from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation.