The amount of data available to researchers has been growing steadily. But while large-scale, openly shared datasets offer scientists across disciplines unique research opportunities, they also pose significant challenges. Researchers at the Human Brain Project have proposed a new framework for the reproducible processing of large-scale data, which aims to tackle these challenges.
These data need to be findable, accessible, interoperable and reusable – the so-called FAIR principles. One of the challenges is that the processing of biomedical datasets is rarely fully transparent: because they contain personal data, their usage and distribution are restricted.
The proposed free and open-source framework aims to reduce the complexity of data processing. It employs a range of software tools for data, code and compute management, applying established practices from software engineering to computational research.
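The core idea behind such reproducible processing – recording exactly which code ran on exactly which data – can be illustrated with a small, self-contained sketch. The toy below is not the framework's actual tooling; all names in it are hypothetical. It records a provenance entry for one processing step by hashing the content of its inputs and outputs, so a result can later be traced back to the data and step that produced it:

```python
"""Toy provenance capture: hash inputs and outputs of a processing
step so results are traceable. Illustrative only, not the paper's
actual tooling."""
import hashlib
import json


def content_hash(data: bytes) -> str:
    # Identify content by its hash, not by file path or timestamp.
    return hashlib.sha256(data).hexdigest()


def run_step(name, func, inputs):
    # Execute one processing step and return a provenance record
    # linking output hashes to input hashes and the step name.
    outputs = func(inputs)
    return {
        "step": name,
        "inputs": {k: content_hash(v) for k, v in inputs.items()},
        "outputs": {k: content_hash(v) for k, v in outputs.items()},
    }


# Hypothetical processing step: normalise line endings in a text file.
def normalise(inputs):
    return {"clean.txt": inputs["raw.txt"].replace(b"\r\n", b"\n")}


record = run_step("normalise", normalise, {"raw.txt": b"a\r\nb\r\n"})
print(json.dumps(record, indent=2))
```

Given such records for every step, anyone can verify that a published result was produced from the stated inputs – the transparency the framework is designed to provide at the scale of datasets like the UK Biobank.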
Led by Michael Hanke, a group leader at the Institute for Brain and Behaviour at Forschungszentrum Juelich, the researchers demonstrated the scalability of the framework with analyses on one of the largest brain imaging datasets, the UK Biobank imaging data. Another showcase, using data from studyforrest.org, highlights the framework's support for data sharing and transparency. The results were published in Scientific Data.
Wagner, A.S., Waite, L.K., Wierzba, M. et al. FAIRly big: A framework for computationally reproducible processing of large-scale data. Sci Data 9, 80 (2022). https://doi.org/10.1038/s41597-022-01163-2