From Raw to Reproducible: Versioning Research Data for the Future

Online

Speaker: Chase Núñez (Lib4RI)

In the fast-paced world of scientific research, where datasets are often large and experiments must move quickly, managing and versioning data effectively can become a major challenge. This talk presents practical, scalable, and sustainable strategies for versioning research data, balancing the need for speed with the imperative of long-term reproducibility and reusability. We will explore the core principles of Research Data Management (RDM), including how to preserve raw data, record transformations, and use tools such as Git Large File Storage (LFS) for large datasets. Through real-world examples, we will show how to integrate version control into your everyday workflows so that your research remains accessible, reproducible, and ready for future discovery.

Key topics include maintaining an immutable record of raw data, documenting data transformations, and the benefits of publishing datasets in institutional and external repositories. By the end of the session, you will leave with a checklist, practical commands, and a clear strategy for improving how you version your own research data.

Join online via Teams.
