Online workshop
Research data management
Data producers
Introductory
Online / any
English
CESSDA
UK Data Service (UKDS)
Train the Trainer Workshop: Facilitating Onwards Sharing of Safe and Clean Microdata
This online, hands-on workshop is aimed at trainers and support staff who want to introduce semi-automated tools into their data management training sessions and materials. It will provide a platform for discussing best practices for introducing researchers to the key principles of assessing data quality and conducting statistical disclosure control (SDC) for quantitative data.
Data quality assurance and disclosure review are integral parts of research, data sharing and preservation. Researchers collecting primary quantitative data can face several problems when preparing their data for onward use, because curating data is a complex process and, when done manually, a time-consuming one.
The workshop will present two open-source tools that aim to ease the curation process for researchers:
- QAMyData, an open-source tool created by the UK Data Service that automatically assesses and reports on elements of data quality. It checks data for issues such as missingness, labelling and duplication, and looks for potentially disclosive information such as outliers and direct identifiers, providing a ‘health check’ for data and creating a data quality report.
- sdcMicro, an open-source R package for disclosure review that includes a locally run graphical user interface (GUI). It is a flexible tool that provides detailed information on disclosure issues and lets users choose among potential solutions, based on a comparison of different statistical disclosure control methods.
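To make the kinds of checks listed above more concrete, here is a minimal Python sketch of three of them: missingness per variable, exact duplicate records, and a simple k-anonymity frequency count (one common measure of disclosure risk). It is not QAMyData's or sdcMicro's code or API. The function names, the toy records, the choice of key variables and the threshold k = 3 are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy dataset: each record maps variable names to values (None = missing).
records = [
    {"id": "1", "age_group": "30-39", "region": "North", "sex": "F"},
    {"id": "2", "age_group": "30-39", "region": "North", "sex": "F"},
    {"id": "3", "age_group": "30-39", "region": "North", "sex": "F"},
    {"id": "4", "age_group": "80+", "region": "South", "sex": "M"},
    {"id": "4", "age_group": "80+", "region": "South", "sex": "M"},  # exact duplicate
    {"id": "5", "age_group": None, "region": "South", "sex": "F"},   # missing value
]


def missingness(rows, columns):
    """Share of records with a missing (None) value, for each listed column."""
    n = len(rows)
    return {c: sum(1 for r in rows if r.get(c) is None) / n for c in columns}


def duplicate_rows(rows):
    """Number of records that exactly repeat an earlier record."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return sum(count - 1 for count in counts.values())


def k_anonymity_violations(rows, key_vars, k=3):
    """Records whose combination of key variables is shared by fewer than k records."""
    def key(r):
        return tuple(r.get(v) for v in key_vars)

    combos = Counter(key(r) for r in rows)
    return [r for r in rows if combos[key(r)] < k]


if __name__ == "__main__":
    print(missingness(records, ["age_group", "region"]))
    print(duplicate_rows(records))
    risky = k_anonymity_violations(records, ["age_group", "region", "sex"], k=3)
    print([r["id"] for r in risky])
```

Here the three "30-39 / North / F" records share their combination of key variables, so they meet the k = 3 threshold, while the rarer combinations are flagged as potentially disclosive. Dedicated tools report such checks far more thoroughly and, in the case of sdcMicro, also offer methods to reduce the risk.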
Practical demonstrations and hands-on exercises will be used throughout the workshop. Participants who would like to use the tools during the workshop are encouraged to download both to their PC or laptop beforehand; however, this is not compulsory. For sdcMicro, both R and RStudio need to be installed.
The workshop's objective is to enable trainers to introduce the key concepts of numeric data quality assessment and disclosure control in their training sessions, alongside hands-on practice with existing open-source tools. The exercises and presentations used will be shared with participants for future reuse and adaptation.
We will finish with a short session to gather your feedback for the day, including how you might wish to integrate these tools into your routine training sessions and materials.
Please note that the number of participants is limited.
Programme
09.30 – 09.45 Welcome
09.45 – 10.15 Presentation: Data Quality Assessment and QAMyData
10.15 – 10.30 Presentation: Lessons Learned - Teaching on Data Quality Assessment and Introducing Semi-Automated Curation Tools
10.30 – 10.45 Break
10.45 – 11.15 Exercise: Assessing Data Quality
11.15 – 12.00 Presentation: Assessing Disclosure Risk in Microdata
12.00 – 12.15 Presentation: Lessons Learned - Teaching on Statistical Disclosure Control and Associated Tools
12.15 – 12.30 Break
12.30 – 12.45 Demo: sdcMicro
12.45 – 13.15 Exercise: Statistical Disclosure Control
13.15 – 13.30 Close and feedback
Trainer: Cristina Magder - Collections Development Manager, UK Data Archive, UK Data Service
Cristina manages the UK Data Service’s Data Collections Development and Research Data Management teams at the UK Data Archive, ensuring that key data are effectively identified, negotiated and appraised for research and teaching. She leads the UK Data Service's portfolio of research data management support and training, with a particular focus on the successful operationalisation of the UKRI Research Data Policy for the Economic and Social Research Council. Her main teaching interests are data management planning; sharing and archiving data, with a specific focus on data quality assurance and disclosure risk assessment; and reproducibility.
The materials from this workshop (slides and exercises) are available for download. The recording is also available on the CESSDA Training YouTube channel.