CPA2.2 Data Preservation: storage, curation and planning

Purpose

The main objective of this process area is to provide the services and functions that are necessary for the storage, maintenance and retrieval of preserved information. The functions and processes include transferring and adding the deposited information to permanent storage; managing, curating and planning for long-term storage; and performing routine and special error checking.

CC2.2: Capability completeness of data preservation

  1. Initial: one or more of the specific objectives and required activities are at an initial maturity level.
  2. Partial: only some of the specific objectives or required activities are met at a defined maturity level, or all specific objectives are met at a repeatable maturity level.
  3. Complete: all specific objectives and required activities are met at a defined maturity level or better.
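One possible reading of the three completeness levels can be sketched as a small function. The numeric scale (0 to 5) comes from the maturity levels used for the required activities in this process area. The function name `capability_completeness` and the tie-breaking rule (any activity below level 2 yields Initial, even when others are at Defined) are assumptions made for illustration and are not part of the model.

```python
from typing import Iterable

DEFINED = 3      # "(3) Defined" on the maturity scale
REPEATABLE = 2   # "(2) Repeated/partial" on the maturity scale


def capability_completeness(levels: Iterable[int]) -> str:
    """Map the maturity levels (0-5) of all required activities to a CC level.

    Assumption: an activity below level 2 pulls the whole process area
    down to "Initial", even when other activities are already "Defined".
    """
    levels = list(levels)
    if not levels:
        raise ValueError("at least one maturity level is required")
    if any(not 0 <= lvl <= 5 for lvl in levels):
        raise ValueError("maturity levels must be between 0 and 5")

    lowest = min(levels)
    if lowest >= DEFINED:
        return "Complete"   # everything is Defined or better
    if lowest >= REPEATABLE:
        return "Partial"    # everything is at least Repeatable
    return "Initial"        # at least one activity is at level 0 or 1
```

For example, `capability_completeness([3, 4, 2])` returns `"Partial"`, because one activity has not yet reached the Defined level.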

SO2.2.1: Preservation actions

The main objective is to perform the necessary actions and functions for the storage, maintenance and retrieval of preserved information. Preservation actions include receiving data/metadata from Ingest and adding them to permanent storage, managing the information and metadata that are attached to the preserved data, performing routine and special error checking, and performing reviews and evaluation of the repository preservation actions.

RA2.2.1.1: Transfer to permanent storage

The repository has mechanisms and functions in place for transferring the deposited data into the archive for permanent (long-term) storage (e.g. moving it to separate storage volumes and/or migrating the deposited material to preferred formats).

(0) Not defined:

No transfer mechanisms in place; no functional differentiation between deposit (ingest) and storage.

(1) Initial:

The ingest and preservation/storage processes are distinguished as two separate functions; deposited material is transferred to storage, but there are no written process descriptions or procedures in place; most material is transferred as-is.

(2) Repeated/partial:

Most material is transferred to a separate preservation function; there is a distinction between curation (i.e. what to preserve, maintain and add value to) and storage (i.e. simple bit preservation); prioritised data are migrated to the repository’s own preferred formats, and/or other non-proprietary/portable formats; processes are informal and there are no written procedures in place.

(3) Defined:

There are formalised processes and procedures in place for transferring data to permanent storage; roles and responsibilities are defined and connected to tasks; there is a clear distinction between the different functions and mechanisms; procedures are available to users; explicit choices are made concerning data formats (open, portable, non-proprietary).

(4) Managed:

Tasks and processes are integrated into high-level policies and objectives; there are regular reviews and updates of processes and procedures; staff training in preservation procedures is in place; some automation may be in place.

(5) Optimised:

All tasks, processes and functions are monitored and measured; there are systematic reviews and updates of migration procedures based on technology watch and on communication with, and outreach towards, the Designated Community (e.g. regarding preferred formats).

RA2.2.1.2: Persistent identifiers (PIDs) / locators

The repository has mechanisms in place to ensure that metadata are consistently associated with (connected to) the data over time. This is necessary to ensure that all the archived information can be located and retrieved.
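As an illustration of keeping data and metadata tied to one stable identifier, the following is a minimal sketch of an internal locator table. The names `PidRegistry`, `mint`, `relocate` and `resolve`, and the `repo:` prefix, are hypothetical; a real repository would more likely rely on an external scheme such as DOI, which this sketch does not implement.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class Record:
    data_location: str      # where the data file currently lives
    metadata_location: str  # where its metadata currently lives


class PidRegistry:
    """In-memory PID table: identifiers never change, locations may."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def mint(self, data_location: str, metadata_location: str) -> str:
        """Create a new identifier for a data/metadata pair."""
        pid = f"repo:{uuid.uuid4()}"  # hypothetical internal PID scheme
        self._records[pid] = Record(data_location, metadata_location)
        return pid

    def relocate(self, pid: str, data_location: str) -> None:
        """Update the data location; the PID itself stays the same."""
        self._records[pid].data_location = data_location

    def resolve(self, pid: str) -> Record:
        """Return the current locations; raises KeyError for unknown PIDs."""
        return self._records[pid]
```

The point of the design is that moving a file to new storage only calls `relocate`, so the identifier published to users keeps resolving to both the data and its metadata.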

(0) Not defined:

No system for persistent identification.

(1) Initial:

There is some awareness of the need for persistent identifiers and locators, but actions are sporadic and ad hoc; there are no formalised systems, processes or procedures in place.

(2) Repeated/partial:

Mechanisms and systems for identification and location are partly in place (e.g. there may be a certain directory structure or hierarchy to make locating data easier), but they do not comply with formalised DOI systems; mechanisms are used repeatedly, but there is a lack of formalisation and written procedures.

(3) Defined:

There are mechanisms and systems in place to persistently identify and locate data and metadata (either by following external systems like DOI, or by internal PID systems); all processes and procedures are documented and formalised.

(4) Managed:

PID systems and locators are regularly reviewed and updated; mechanisms are aligned to higher level preservation goals and plans.

(5) Optimised:

All mechanisms and functions are monitored and measured; there are systematic reviews and updates of PID systems based on technology watch.

RA2.2.1.3: Backup and version control/change procedures

The repository has mechanisms and strategies for versioning/version control and backups.
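A versioning and backup mechanism of this kind could look like the sketch below, which stores each new version in its own numbered directory and copies it to a second location. The function name `store_version` and the `v1`, `v2`, ... directory layout are assumptions for illustration only; the model does not prescribe any particular layout or tool.

```python
import shutil
from pathlib import Path


def store_version(source: Path, archive_dir: Path, backup_dir: Path) -> Path:
    """Save `source` as the next numbered version and copy it to a backup volume.

    Existing versions are never overwritten. The layout (v1, v2, ...
    under `archive_dir`) is an assumption made for this sketch.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Find the highest existing version number (0 if there is none yet).
    numbers = [
        int(p.name[1:])
        for p in archive_dir.iterdir()
        if p.is_dir() and p.name.startswith("v") and p.name[1:].isdigit()
    ]
    next_number = max(numbers, default=0) + 1

    # Write the new version into its own directory.
    version_dir = archive_dir / f"v{next_number}"
    version_dir.mkdir()
    stored = version_dir / source.name
    shutil.copy2(source, stored)

    # Mirror the stored version to the backup location.
    backup_version_dir = backup_dir / f"v{next_number}"
    backup_version_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(stored, backup_version_dir / source.name)
    return stored
```

In practice `archive_dir` and `backup_dir` would sit on separate volumes or sites; here they are simply two paths.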

(0) Not defined:

No versioning or backups.

(1) Initial:

Most material is stored as it was deposited; some deposited material is processed (through versioning) and backed up in separate volumes/drives; no formalised strategies or procedures in place.

(2) Repeated/partial:

Deposited material is backed up and/or made into several versions (e.g. deposited version, preserved version, corrected version, etc.), but versioning procedures are performed on an ad hoc and individual basis; formalised processes, procedures and documentation are lacking.

(3) Defined:

All deposited material is processed and safely stored through version controls and backups; all processes and procedures are defined and formalised in written documentation.

(4) Managed:

There are regular reviews and updates of backup and version control processes and procedures; procedures are aligned with preservation plans.

(5) Optimised:

All procedures and functions are monitored and measured; there are systematic reviews and updates of backup and version control systems based on technology watch.

RA2.2.1.4: Authentication measures

The repository has authentication mechanisms in place to ensure that personnel cannot change stored data or (unintentionally) delete digital objects, in whole or in part.

(0) Not defined:

No preservation authentication mechanisms in place.

(1) Initial:

Some awareness of the issue; authentication mechanisms are unorganised and there are no formalised processes in place.

(2) Repeated/partial:

Some authentication and safeguard mechanisms are in place (e.g. access/edit rights are limited, roles and responsibilities are defined), but they lack formalisation.

(3) Defined:

Authentication measures and mechanisms are in place, and are formalised and defined in written documents; all roles and responsibilities are defined.

(4) Managed:

Regular reviews and updates of authentication mechanisms.

(5) Optimised:

All authentication and safeguard mechanisms are systematically reviewed and updated based on technology watch and are aligned with wider repository preservation planning.

SO2.2.2: Quality control of the data

RA2.2.2.1: Fixity checks

The repository performs checks to verify that a digital object (data and/or metadata) has not been altered or corrupted (i.e., fixity checks, checksums to confirm that all copies are identical, etc.).
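A fixity check of this kind can be sketched with checksums recorded at storage time and recomputed later. SHA-256 is an assumption here (the text only says "checksums"), and the function names `sha256_of`, `build_manifest` and `check_fixity` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory: Path) -> Dict[str, str]:
    """Record a checksum for every file under `directory`."""
    return {
        str(p.relative_to(directory)): sha256_of(p)
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def check_fixity(directory: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose checksum changed."""
    failures = []
    for relative_path, expected in manifest.items():
        target = directory / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures
```

An empty list from `check_fixity` means every file recorded in the manifest is still present and unchanged. Running the same check against each copy of the data is one way to confirm that all copies are identical.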

(0) Not defined:

No awareness of the issue; no checks performed.

(1) Initial:

Fixity checks are performed ad hoc and on an individual basis; processes and procedures are not formalised or defined.

(2) Repeated/partial:

Fixity checks are repeatedly performed but there is a lack of coordination and documentation of actions and processes.

(3) Defined:

Fixity checks are performed on all data; processes and procedures are documented and formalised; fixity checks are partly or fully automated.

(4) Managed:

Fixity check routines are regularly reviewed and updated.

(5) Optimised:

Fixity checks are systematically reviewed and updated based on technology watch and preservation planning; there is a high level of automation (fully automated).

RA2.2.2.2: Error detection / unwanted changes

The repository has strategies / procedures for dealing with any errors detected during the integrity checks, and situations where unwanted changes to processed or stored data/metadata occur.
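One common way to handle a failed integrity check is to restore the damaged file from a replica that still matches the recorded checksum. The sketch below assumes SHA-256 checksums and a single replica; the names `file_digest` and `repair_from_replica` and the three return values are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file (read in one go, for brevity)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def repair_from_replica(primary: Path, replica: Path, expected_digest: str) -> str:
    """Handle a possible integrity failure on `primary`.

    Returns "ok" if the primary copy is intact, "repaired" if it was
    restored from the replica, or "unrecoverable" if neither copy
    matches the recorded checksum.
    """
    if primary.is_file() and file_digest(primary) == expected_digest:
        return "ok"
    if replica.is_file() and file_digest(replica) == expected_digest:
        # Overwrite the damaged or missing copy with the verified replica.
        shutil.copy2(replica, primary)
        return "repaired"
    # Nothing is overwritten here; the case needs manual follow-up.
    return "unrecoverable"
```

Returning a status rather than raising leaves it to the calling procedure to log the event and escalate unrecoverable cases, in line with whatever documented process the repository has.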

(0) Not defined:

No awareness; no strategies or procedures in place.

(1) Initial:

Procedures for dealing with errors or unwanted changes are performed ad hoc and on an individual basis; processes and procedures are not formalised or defined.

(2) Repeated/partial:

Procedures for dealing with errors or unwanted changes are repeatedly performed but there is a lack of coordination and documentation of actions and processes.

(3) Defined:

Processes and procedures for dealing with error detections and unwanted changes are documented and formalised; processes are partly or fully automated.

(4) Managed:

Processes and procedures for dealing with error detections and unwanted changes are regularly reviewed and updated; some automation may be in place.

(5) Optimised:

Processes and procedures are systematically reviewed and updated based on technology watch and preservation planning; there is a high level of automation (fully automated) where relevant.

RA2.2.2.3: Metadata management

The repository adds preservation metadata (or other administrative metadata) based on official metadata standards. Metadata are handled throughout the data lifecycle based on defined criteria to ensure that the relevance and understandability of data are maintained (for data users / the Designated Community) [maps to: Annex 2, section 1].

(0) Not defined:

No awareness; no metadata added.

(1) Initial:

Preservation metadata are added on an ad hoc basis; no formal metadata standards are being applied; no formal documentation of processes and procedures.

(2) Repeated/partial:

Routines for adding preservation metadata are repeatedly in use; metadata standards may be applied, but there are no official statements or documentation of their use.

(3) Defined:

A written formal specification of routines for adding preservation metadata is explicitly defined (e.g. in a preservation policy); the repository uses controlled vocabularies and metadata standards that are used and can be understood by the Designated Community (e.g. DDI).

(4) Managed:

Usage of preservation metadata and metadata standards is regularly reviewed and updated; documentation and metadata processes and procedures are aligned with policies; there are regular reviews and assessments (of success) of preservation metadata routines.

(5) Optimised:

There are regular reviews and updates of processes and procedures based on feedback from the Designated Community and on monitoring of technology (formats/standards) and of the Designated Community.

RA2.2.2.4: Preservation policy

The repository uses a preservation policy to address and guide the data storage.

(0) Not defined:

Not applicable; no awareness.

(1) Initial:

There is no preservation policy where data storage is addressed; data storage processes are performed on an ad hoc basis. The repository deals with the data storage issues on a case-by-case basis.

(2) Repeated/partial:

The preservation policy is not formally defined, but there are some repeatable procedures in place; data storage processes follow a regular pattern - they have developed to the stage where similar procedures are followed by different people undertaking the same task.

(3) Defined:

A preservation policy is defined and it is connected to specific processes and procedures.

(4) Managed:

The preservation policy is monitored and measured for compliance with processes and procedures; actions are taken where processes appear not to be working effectively or not to be in accordance with the policy.

(5) Optimised:

Processes and procedures are measured and assessed; processes, functions and mechanisms are under constant improvement and continuously integrated into the preservation policy.

SO2.2.3: Preservation planning

The main objective is to have a functional entity that provides the services and functions for monitoring the environment of the archive/repository, and that provides recommendations and preservation plans to ensure that the information stored in the archive/repository remains accessible to, understandable by, and sufficiently usable by the Designated Community over the Long Term.

RA2.2.3.1: Evaluation of content and preservation environment

The repository has mechanisms and functions in place to perform periodic evaluations of the contents and the general preservation environment of the archive. The evaluations can include risk analysis reports and the development of recommendations for archive processes, standards and policies.

(0) Not defined:

No evaluations of content.

(1) Initial:

Low awareness; repository is in the content build-up phase rather than the evaluative phase; any evaluations that are done are performed irregularly by individuals, on an ad hoc basis.

(2) Repeated/partial:

Repository has established its main (types of) content; partial evaluations are performed repeatedly, but there are no formalised evaluation processes or procedures in place.

(3) Defined:

Evaluations are formalised and performed at regular intervals; evaluations are performed in line with formalised processes and procedures (e.g. checkpoint lists, schedules); all processes and procedures refer to and are in line with strategies and policies; recognised tools may be applied regularly (e.g. PLATO and/or DRAMBORA).

(4) Managed:

Evaluation processes and procedures are regularly reviewed and updated, and are coordinated with other preservation planning activities; content and the general preservation environment are subject to measurements and quantifications where relevant; staff training evaluations are in place.

(5) Optimised:

Evaluative processes and procedures are subject to regular reviews and updates based on measurements and assessments; there are mechanisms in place for adopting results of technology watch; staff training routines are regularly reviewed and updated.

RA2.2.3.2: Monitor technology

The repository has functions and mechanisms in place for tracking emerging digital technologies, information standards (data formats, metadata standards) and other relevant changes to software, hardware and best practices. This is to identify technologies and developments which could cause obsolescence in the repository’s preservation environment. A role within the organisation is responsible for monitoring and analysing developments in technology.

(0) Not defined:

No monitoring.

(1) Initial:

Ad hoc or irregular monitoring; no systematic approach, some monitoring performed by individuals; no formalised reporting, some staff communicate what they have found.

(2) Repeated/partial:

Monitoring is performed regularly, but is not formally defined as specific monitoring activities and systems; there is some reporting, but it lacks formalisation.

(3) Defined:

Feedback is gathered through periodic surveys, formal review processes, and/or community workshops or other formalised meeting points; systems for registering feedback are in place; outputs of monitoring are formally reported. A role is responsible for gathering, monitoring and analysing changes in technology and best practices.

(4) Managed:

Coordinated with other preservation planning activities and higher level strategies; regular reviews and updates of monitoring strategies.

(5) Optimised:

There are periodic reviews and evaluations of surveys, review processes and feedback mechanisms; there are mechanisms in place for reporting and integrating the evaluation results into higher-level preservation strategies.

RA2.2.3.3: Preservation strategies

The repository has in place documented preservation strategies that are relevant to its holdings. Preservation strategies describe how the repository will act upon identified risks, as part of the preservation strategic plan. These preservation strategies and the preservation strategic plan will typically address the degradation of storage media, the obsolescence of media drives, the obsolescence or inadequacy of data/metadata (including formats) as the knowledge base of the Designated Community changes, and safeguards against accidental or intentional digital corruption [ISO 16363].

(0) Not defined:

No preservation strategies in place.

(1) Initial:

Informal, ad hoc ‘contingency action points’ are in place, but a full comprehensive strategy is lacking; action points are not formalised or connected to a policy.

(2) Repeated/partial:

Contingency action points have matured into a partial strategy; only partly formalised and documented.

(3) Defined:

Fully formalised and documented preservation strategy in place; connected to preservation policies and repository strategies, and to processes and procedures.

(4) Managed:

Strategy is periodically reviewed and updated.

(5) Optimised:

Usage of strategy is measured and assessed; processes, functions and mechanisms are under constant improvement and continuously integrated into the strategy and the higher level policies.