Selecting data to keep for long-term preservation is subjective, and predicting what information may be required in the future is not a precise process.
It would be impractical to save all data at the end of a project. Before submitting data to a repository or considering long-term archival, identify what is worth keeping and what can be deleted without issue. Bear the following in mind:
- Data must be easily discovered and usable. A large dataset with only a few useful bits of information isn’t as accessible or useful as carefully selected data stored effectively.
- The costs associated with storage and long-term archival are significant. Storing unnecessary data can be a waste of money.
- Any information stored may be subject to Freedom of Information requests, and the data may have to be disclosed.
- What is needed to validate findings in your publications?
- Are you obliged to destroy anything?
Selecting what to keep
The University of Nottingham suggests considering the following questions when deciding what data to keep:
- What are my funder and institutional requirements on what data to keep?
- Who holds the intellectual property and legal rights to this data in relation to storage and re-use? Can I negotiate these rights if it is not me?
- Is there sufficient metadata to enable future users to locate the data effectively?
- If the costs of storing the data are my responsibility, can I afford them?
- Is the data transient or a ‘one off’ that cannot be replicated, e.g. weather records?
The DCC provides a useful guide on how to decide what to keep: Five steps to decide what data to keep.