Publishing with CESSDA archives

For high-quality data with potential for reuse, we recommend that you ensure long-term access by publishing your data with a trusted repository, such as many of the CESSDA archives. CESSDA archives aim to make research data accessible with as few restrictions as possible, while at the same time protecting (sensitive) personal data from inappropriate access.

CESSDA archives per country

In the image map below you can see whether a CESSDA archive is available in your country as a trusted home for your datasets. If you decide to publish your data with one of the CESSDA archives, you will have to invest some time and effort in preparing your data. If research data management is already a vital part of your work, then most of this work will have been done along the way.

Watch this video to discover what a CESSDA data archivist is, what archivists do, and why their work is so important now and for the future. The video is also available on Zenodo.

Please cite this video as: CESSDA Training Team. (2020). First steps towards data curation - A Day in the Life of a Data Archivist (2nd part) [Video]. Zenodo. http://doi.org/10.5281/zenodo.4569366

Added benefits of a CESSDA repository

Compared with self-archiving your dataset, publishing it at a CESSDA archive has the great advantage that expert help is within reach. CESSDA research data management experts can help you to increase the comprehensibility, visibility, findability, reusability, longevity and overall quality of your datasets in the following ways:

Metadata

At a CESSDA archive, you can deposit your data with the help of a data expert. This expert will advise you on what information is needed to understand your data. Making your metadata as rich and complete as possible helps ensure that your data meet the F (Findability) and I (Interoperability) principles of FAIR data management.

In general, you will have to provide the following metadata and data documentation when publishing your data with a CESSDA data archive:*

  • Title (Original)
  • Subtitle (Original)
  • Responsibility / Authors (All authors of the study)
  • Other co-workers and their roles (Persons, research groups or organizations that participated in the study, with a description of their roles)
  • Funding agency/Sponsor (Name of the funding institution)

Short abstract (max. 200-300 words): starting point, purpose, and objectives of the research and the main problems addressed in the research. If the data file is part of an international project, please give some information about it.

The time period of collecting data in the field, e.g. October 1999 - November 1999.

Target time period to which the data relate, e.g. Second World War.

Specify the country or countries covered in the data file, e.g. Slovenia.

Unit of analysis: Individual, Household, Organization / Company, or Other (please specify).

Description of the target population covered in the data file, e.g. Included: The adult residents of Slovenia, older than 18 years, living at a permanent address. Excluded: People living in a household without a telephone and institutionalised people.

General data format:

  • Numeric
  • Text
  • Still image
  • Geospatial
  • Audio
  • Video
  • Software
  • Interactive resource
  • 3D
  • Other

Person or institution responsible for collecting data.

Time method:

  • Longitudinal
  • Longitudinal: Cohort/Event-based
  • Longitudinal: Trend/Repeated cross-section
  • Longitudinal: Panel
  • Longitudinal: Panel: Continuous
  • Longitudinal: Panel: Interval
  • Time series
  • Time series: Continuous
  • Time series: Discrete
  • Cross-section
  • Cross-section ad-hoc follow-up
  • Other

Sampling procedure:

  • Total universe/Complete enumeration
  • Probability
  • Probability: Simple random
  • Probability: Systematic random
  • Probability: Stratified
  • Probability: Stratified: Proportional
  • Probability: Stratified: Disproportional
  • Probability: Cluster
  • Probability: Cluster: Simple random
  • Probability: Cluster: Stratified random
  • Probability: Multistage
  • Non-probability
  • Non-probability: Availability
  • Non-probability: Purposive
  • Non-probability: Quota
  • Non-probability: Respondent-assisted
  • Mixed probability and non-probability
  • Other

Mode of collection:

  • Interview
  • Face-to-face interview
  • Face-to-face interview: CAPI/CAMI
  • Face-to-face interview: PAPI
  • Telephone interview
  • Telephone interview: CATI
  • E-mail interview
  • Web-based interview
  • Self-administered questionnaire
  • Self-administered questionnaire: E-mail
  • Self-administered questionnaire: Paper
  • Self-administered questionnaire: SMS/MMS
  • Self-administered questionnaire: Web-based
  • Self-administered questionnaire: Computer-assisted (CASI)
  • Focus group
  • Face-to-face focus group
  • Telephone focus group
  • Online focus group
  • Self-administered writings and/or diaries
  • Self-administered writings and/or diaries: E-mail
  • Self-administered writings and/or diaries: Paper
  • Self-administered writings and/or diaries: Web-based
  • Observation
  • Field observation
  • Participant field observation
  • Non-participant field observation
  • Laboratory observation
  • Participant laboratory observation
  • Non-participant laboratory observation
  • Computer-based observation
  • Experiment
  • Laboratory experiment
  • Field/Intervention experiment
  • Web-based experiment
  • Recording
  • Content coding
  • Transcription
  • Compilation/Synthesis
  • Summary
  • Aggregation
  • Simulation
  • Measurements and tests
  • Educational measurements and tests
  • Physical measurements and tests
  • Psychological measurements and tests
  • Other

Type of research instrument:

  • Questionnaire
  • Structured questionnaire
  • Semi-structured questionnaire
  • Unstructured questionnaire
  • Interview scheme and/or themes
  • Data collection guidelines
  • Data collection guidelines: Observation guide
  • Data collection guidelines: Discussion guide
  • Data collection guidelines: Self-administered writings guide
  • Data collection guidelines: Secondary data collection guide
  • Participant tasks
  • Technical instrument(s)
  • Programming script
  • Other

What was the response rate in the survey?

  • Number of units in the sample
  • Number of completed interviews (realized sample) (IP)
  • Number of refused interviews (R)
  • Number of not contacted respondents (NC)
  • Other (O, unknown or inadequate units)

For more about disposition codes and the calculation of response rates, see the American Association for Public Opinion Research (AAPOR, n.d.).
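As a rough illustration of how the counts listed above combine, the sketch below computes a simple response rate as the share of completed interviews among all sampled units. It assumes the four categories are mutually exclusive and together make up the whole sample. The function name and the example figures are hypothetical, and AAPOR defines several response rates that treat partial interviews and units of unknown eligibility differently, so check which definition your archive expects.

```python
def response_rate(completed, refused, not_contacted, other):
    """Share of sampled units that yielded a completed interview.

    Simplified illustration: assumes the four counts are mutually
    exclusive and add up to the number of units in the sample.
    """
    counts = (completed, refused, not_contacted, other)
    if any(count < 0 for count in counts):
        raise ValueError("counts must not be negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("the sample must contain at least one unit")
    return completed / total


# Hypothetical figures for a sample of 1000 units.
rate = response_rate(completed=600, refused=250, not_contacted=100, other=50)
print(f"Response rate: {rate:.1%}")  # Response rate: 60.0%
```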

Documentation (copies of):

  • Questionnaire
  • Informed consent
  • Protocols
  • Vignettes
  • Codebook
  • Frequency statistics for all variables
  • Other

Publications (copies of):

  • Report
  • Scientific article
  • Other

Data protection and access:

  • Confidentiality and data protection assurance (e.g. anonymisation)
  • Copyright
  • Access conditions to assure confidentiality and data protection

Data file description:

  • File name
  • File format
  • Number of variables
  • Number of units
  • Total size of the file

* To ensure findability and interoperability, CESSDA archives work towards standardisation of metadata. DDI and CESSDA Controlled Vocabularies are used for many of the fields mentioned above. Before you start filling in the fields above, contact your data archive because they might have slightly different requirements.
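Before contacting the archive, it can help to check your own records for gaps. The sketch below is a minimal illustration, not an archive's actual schema: the field names in `REQUIRED_FIELDS` are hypothetical labels loosely based on the list above, and the word limit follows the 200-300 word guidance for the abstract.

```python
# Hypothetical field names, loosely based on the metadata list above.
# A real archive defines its own (often DDI-based) schema.
REQUIRED_FIELDS = [
    "title",
    "authors",
    "funding_agency",
    "abstract",
    "collection_period",
    "geographic_coverage",
    "unit_of_analysis",
    "population",
    "data_collector",
]

MAX_ABSTRACT_WORDS = 300  # upper end of the 200-300 word guidance


def check_metadata(record):
    """Return a list of problems found in a metadata record (a dict)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat absent fields, empty strings and empty lists as missing.
        if value is None or (isinstance(value, (str, list)) and len(value) == 0):
            problems.append(f"missing field: {field}")
    word_count = len((record.get("abstract") or "").split())
    if word_count > MAX_ABSTRACT_WORDS:
        problems.append(f"abstract too long: {word_count} words")
    return problems


example = {
    "title": "Example survey",
    "authors": ["A. Researcher"],
    "funding_agency": "",
    "abstract": "Purpose and objectives of the study.",
    "collection_period": "October 1999 - November 1999",
    "geographic_coverage": ["Slovenia"],
    "unit_of_analysis": "Individual",
    "population": "Adult residents older than 18 years",
}

for problem in check_metadata(example):
    print(problem)
```

For the example record, the check reports the empty funding agency and the absent data collector.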

Recognition

When you publish your data in a CESSDA archive, your data become more visible in several ways:

Because a persistent identifier is applied to your datasets, your data can always be found and cited (see 'Data citation').

Scientific credits
You may get scientific credits for a data publication. For example, in Slovenia, publishing research data in a data archive approved by the Research Funding Agency may earn the data publication the status of a scientific publication. To reach this status, the research study and data in question should meet the following conditions (see 'Study Classification in the ADP' (ADP, 2017a)):

  • The study should have scientific and methodological excellence;
  • Relevance is shown for reuse for a wide range of practical and theoretical problems;
  • The data collection has to be a result of a concluded study;
  • The data collection must fulfil high criteria of quality that are ascertained on the basis of extensive accompanying documentation;
  • The data collection needs to be publicly accessible in a national or international scientific data archive like the Slovenian Social Science Data Archives (ADP, 2017a);
  • The data collection needs to be documented and accessible in a form that enables repetition of scientifically published findings, conducted on the basis of the data collection.

CESSDA archives are likely to promote your datasets once they are deposited. See 'Promoting reuse' for examples.

Access and sharing

With a combination of data licensing (see 'Data licenses') and access categories (see 'Access categories') CESSDA data archives can control the exact level of access and permitted reuse. In this way, you can make the optimal choice to enhance the re-use potential of your research data whilst simultaneously protecting your participants' identities.

File formats

Experts at CESSDA archives add to the longevity of your datasets in the following ways:

  • They give advice on the best file formats for long-term preservation;
  • They offer expertise and services to convert data to new formats (see 'File formats');
  • They add value to the data, for instance by adding new functionality to query the data.

Quality check

In several CESSDA archives, an expert will review the quality of your data by assessing, for example, the content of the study, methodology, relevance, legal consistency and documentation of materials.

The overview below shows how these quality checks differ across European CESSDA archives.

CESSDA Archive

Quality check

Notes

Austria (AUSSDA)

YES

  • We compare data and documentation (including checking whether value labels and variable names match in the dataset and the documentation);
  • We check for completeness of data and labels and we conduct plausibility checks.

Czech Republic (CSDA)

YES

Only a basic quality check (CSDA, 2016) is performed:

  • Check that variables in the dataset correspond to the questionnaire;
  • Basic control of frequencies of variables and values of variables (to find values that are outside the given range);
  • A check of variable and value labels to see whether they correspond to labels in the questionnaire.

No elaborate analysis of data quality (like checking construct validity, record check studies etc.) is done.

Greece (So.Da.Net)

YES

Basic quality checks to ensure the completeness and the understanding of any deposited data, as follows:

  • Dataset dimension checks: the number of cases and variables are checked against the documentation;
  • Metadata checks: all variables should have variable labels and all categorical variables should have value labels / the dataset must be comprehensible in association with the documentation given to users;
  • Data validity checks:
    • All categorical variables must be checked for out-of-range values/wild codes;
    • Possible interval variables must be checked for improbable or impossible values.

Netherlands (DANS)

YES

DANS performs data quality checks of the deposited data and metadata. DANS provides specific instructions for depositing social science data on their website, including an overview of the data requirements which will be checked during the process. See the 'Depositing information' (DANS, n.d.b) and a detailed list of the data quality checks (DANS, n.d.c).

Norway (NSD)

YES

All data that are deposited are reviewed and processed; for instance, all data are checked with respect to anonymity. See the information on the website of the Norwegian Centre for Research Data (NSD, n.d.a).

Slovenia (ADP)

YES

Detailed quality checks to ensure the completeness and the understanding of any deposited data and documentation, as follows:

  • Dataset dimension checks: the number of cases and variables are checked against the documentation;
  • Metadata checks: all variables should have variable labels and all categorical variables should have value labels / the dataset must be comprehensible in association with the documentation given to users;
  • Data validity checks:
    • All categorical variables must be checked for out-of-range values/wild codes;
    • Possible interval variables must be checked for improbable or impossible values;
  • The dataset is checked for consistency against published results (e.g. report, journal article…);
  • Research material is checked for legal consistency (e.g. GDPR, IPR).

Sweden (SND)

NO

SND checks that the data are readable and that all data have been deposited. No quality control is performed on the content apart from what is provided as metadata.

Switzerland (FORS)

YES

FORS performs routine quality assurance checks for completeness, integrity, comprehensibility, and validation of the data files:

  • Ensuring completeness and comprehensibility of documentation and data, e.g. number of cases and variables, variable names, question-variable links;
  • Checking variable and value labels;
  • Controls for adequate anonymisation of the data;
  • Verifying treatment of missing values;
  • Checking and enhancing the metadata if necessary;
  • Conversion into suitable archival storage and dissemination formats.

FORS also provides guidelines for documenting and preparing quantitative and qualitative data (FORS, n.d.) for deposit.

United Kingdom (UK Data Service)

YES

UK Data Service performs quality checks of the deposited data and metadata. For self-deposited research (ReShare repository, UK Data Service, 2017a) we check (UK Data Service, 2017b) for disclosure risk, copyright breaches, the validity of file formats and the level of documentation. For curated data (large surveys) we review and process the data (UK Data Service, 2017c).
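Several of the basic checks described in the overview above (dataset dimensions, variable labels, wild codes and out-of-range values) are easy to run yourself before depositing. The sketch below is a minimal illustration under stated assumptions, not any archive's actual procedure: the dataset is a list of dictionaries, and the codebook layout (`label`, `codes`, `range`) is a hypothetical structure chosen for this example.

```python
def check_dataset(rows, codebook, expected_cases):
    """Run basic pre-deposit checks and return a list of problems found."""
    problems = []

    # Dimension check: number of cases against the documentation.
    if len(rows) != expected_cases:
        problems.append(f"expected {expected_cases} cases, found {len(rows)}")

    # Dimension check: variables in the data against the codebook.
    data_vars = set()
    for row in rows:
        data_vars.update(row)
    for name in sorted(data_vars - set(codebook)):
        problems.append(f"{name}: not documented in codebook")
    for name in sorted(set(codebook) - data_vars):
        problems.append(f"{name}: documented but absent from data")

    for name, spec in codebook.items():
        # Metadata check: every variable needs a variable label.
        if not spec.get("label"):
            problems.append(f"{name}: missing variable label")
        # Missing values (None) are skipped in this sketch.
        values = [row[name] for row in rows if row.get(name) is not None]
        if "codes" in spec:
            # Categorical variable: flag codes without a value label.
            wild = sorted({v for v in values if v not in spec["codes"]})
            if wild:
                problems.append(f"{name}: wild codes {wild}")
        elif "range" in spec:
            # Interval variable: flag impossible or improbable values.
            low, high = spec["range"]
            bad = sorted({v for v in values if not low <= v <= high})
            if bad:
                problems.append(f"{name}: out-of-range values {bad}")
    return problems


# Hypothetical codebook and data for illustration only.
codebook = {
    "sex": {"label": "Sex of respondent", "codes": {1: "Male", 2: "Female", 9: "No answer"}},
    "age": {"label": "Age in years", "range": (18, 110)},
    "region": {"label": "", "codes": {1: "East", 2: "West"}},
}
rows = [
    {"sex": 1, "age": 34, "region": 1},
    {"sex": 2, "age": 17, "region": 2},
    {"sex": 3, "age": 52, "region": 1},
]

for problem in check_dataset(rows, codebook, expected_cases=4):
    print(problem)
```

On this example data, the check reports the case count mismatch, the wild code 3 for `sex`, the out-of-range age 17 and the missing variable label for `region`.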

Do you want to dive in deeper?

We have prepared additional information on data licensing, data citation and data access.