Digital Preservation


Policy

The Valdosta State University Archives and Special Collections is committed to the responsible and sustainable management of digital assets.

  1. Digital Preservation is a new field where best practices and international standards are still evolving. The Valdosta State University Archives and Special Collections long-term digital preservation policy is based on the Open Archival Information System (OAIS) reference model ISO 14721:2003.
  2. Every effort will be made to preserve digital formats submitted to the Valdosta State University Archives and Special Collections. When submitting, please follow our recommended file formats guide to facilitate long-term preservation.
  3. The Valdosta State University Archives and Special Collections will provide long-term access to submitted works, along with their associated descriptive and administrative metadata, through a strategy combining the following:
  • Secure Backups
  • Storage Media Refreshment (Copying data from one storage medium to another)
  • File Format Migration

Presently the Valdosta State University Archives and Special Collections is committed to preserving the bitstream (binary data). Functionality and appearance will be maintained as resources permit.

  1. Works submitted to the Valdosta State University Archives and Special Collections will be assigned a persistent URL if applicable.
  2. This policy and other preservation-related activities will be reviewed annually to ensure best practices and techniques are adopted as technology and institutional practices evolve.

 

File Naming Conventions

The Valdosta State University Archives and Special Collections recommends the following guidelines for file naming.

Rule 1. Do not use special characters in the file name, including: / : \ * ? " < > | ( ) { } [ ] & , . $ and similar characters.

The characters above are reserved by various electronic environments. For example, a backslash separates folder levels in Microsoft Windows, a forward slash does so on Unix-like systems and in web addresses, and the classic Mac OS used the colon. Periods are used to denote file formats. Avoid these characters, which could result in file loss or errors.
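As a rough illustration of Rule 1, the following Python sketch flags special characters in a file name. The character set is copied from the list above (with both straight and curly quotation marks); the function name find_special_characters is hypothetical, and the sketch assumes that the single period before the extension is allowed while any other period is flagged.

```python
import os

# Characters listed in Rule 1. The period is handled separately because one
# period is still needed to separate the name from its extension.
SPECIAL_CHARACTERS = set('/:\\*?"“”<>|(){}[]&,$')


def find_special_characters(file_name: str) -> list[str]:
    """Return the special characters found in file_name, in order of first appearance."""
    stem, _extension = os.path.splitext(file_name)
    found = []
    for char in stem:
        # Any period left in the stem is an extra one, so it is flagged too.
        if (char in SPECIAL_CHARACTERS or char == ".") and char not in found:
            found.append(char)
    return found


print(find_special_characters("budget(final)&notes.docx"))    # ['(', ')', '&']
print(find_special_characters("vsu_file-naming-guide.docx"))  # []
```

Only the part of the name before the extension is checked here; a fuller check might also validate the extension itself.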

Rule 2. Use underscores and hyphens instead of spaces.

Spaces are displayed in web environments as %20 and can create broken links and errors as a result. Use underscores to separate themes, descriptions, and other fields in a file name. Use hyphens instead of spaces to link words together. Example: VSU File Naming Guide.docx becomes VSU%20File%20Naming%20Guide.docx. Instead try: vsu_file-naming-guide.docx.
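The %20 substitution can be reproduced with Python's standard urllib.parse.quote function, which percent-encodes characters that are not safe in a URL path. This is only a small demonstration of the effect described above, using the two example names from this rule.

```python
from urllib.parse import quote

original = "VSU File Naming Guide.docx"
recommended = "vsu_file-naming-guide.docx"

# quote() percent-encodes unsafe characters; each space becomes %20.
# Letters, digits, underscores, hyphens, and periods are left unchanged.
print(quote(original))     # VSU%20File%20Naming%20Guide.docx
print(quote(recommended))  # vsu_file-naming-guide.docx
```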

Rule 3. Use lower case letters only.

Some operating systems and electronic environments distinguish between upper and lower case letters. As a precaution, use only lower case letters.
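Rules 2 and 3 can be applied together when a name is assembled from its parts. The sketch below is one possible approach, not a required tool: the function name build_file_name and the idea of passing the name as a list of fields are assumptions made for illustration.

```python
def build_file_name(fields: list[str], extension: str) -> str:
    """Join fields with underscores and words within a field with hyphens, in lower case."""
    cleaned_fields = []
    for field in fields:
        # split() with no argument breaks on any run of whitespace,
        # so repeated spaces collapse into a single hyphen.
        words = field.strip().lower().split()
        cleaned_fields.append("-".join(words))
    return "_".join(cleaned_fields) + "." + extension.lower().lstrip(".")


print(build_file_name(["VSU", "File Naming Guide"], "docx"))
# vsu_file-naming-guide.docx
```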

Rule 4. File names should be short. Err on the side of brevity.

Long file names are not compatible with older legacy systems. For example, file names on a standard CD-ROM (ISO 9660 Level 1) are limited to eight characters plus a three-character extension. Additionally, when transferring data into another directory, the entire file path is counted towards the file name. Files with exceptionally long names, deep in a directory, cannot be moved or transferred if they exceed 256 characters, including their file path. Generally, 25 characters should be sufficient. Remember, file names do not have to be overly descriptive; that is what metadata is for.

Note: Always use a three-letter extension when applicable. For example .tiff should be .tif.
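The two length limits mentioned under Rule 4 can be checked before files are transferred. This is a minimal sketch: the function name length_problems is hypothetical, and it assumes the 25-character suggestion counts the extension and that paths use forward slashes.

```python
from pathlib import PurePosixPath

MAX_NAME_LENGTH = 25   # suggested file-name length from Rule 4 (assumed to include the extension)
MAX_PATH_LENGTH = 256  # full-path limit cited in Rule 4


def length_problems(path: str) -> list[str]:
    """Return human-readable warnings about the length of a file name and its full path."""
    problems = []
    name = PurePosixPath(path).name
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"file name is {len(name)} characters (suggested maximum {MAX_NAME_LENGTH})"
        )
    if len(path) > MAX_PATH_LENGTH:
        problems.append(
            f"full path is {len(path)} characters (maximum {MAX_PATH_LENGTH})"
        )
    return problems


print(length_problems("ms899/photos/france/0001.tif"))  # []
```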

Rule 5. File names should be unique to an object*.

When objects are copied from their directory they should be identifiable on their own merits. For example, if the tiff files under ms54/folklife/turpentine/images/0001.tif and ms899/photos/france/0001.tif were pulled from their directories, they would appear identical and could prompt an overwrite. The correct way to name the second file would be ms899_photos_france_0001.tif, or an alphanumeric code representing these fields. [*note: Digital objects can be made up of multiple files. See below.]
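To illustrate Rule 5, the sketch below finds files that would collide if pulled out of their directories and builds a flattened name from the path. Both function names are hypothetical, and joining every folder level with underscores is only one possible scheme; as noted above, a shorter alphanumeric code may be preferable when the result would become too long.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_name_collisions(paths: list[str]) -> dict[str, list[str]]:
    """Map each file name that occurs in more than one directory to the paths that use it."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


def flatten_path(path: str) -> str:
    """Turn a relative path into one file name by joining its parts with underscores."""
    parts = PurePosixPath(path).parts
    return "_".join(part.lower() for part in parts)


paths = [
    "ms54/folklife/turpentine/images/0001.tif",
    "ms899/photos/france/0001.tif",
]
print(find_name_collisions(paths))  # both paths are listed under '0001.tif'
print(flatten_path(paths[1]))       # ms899_photos_france_0001.tif
```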

Records with Multiple Format Versions

Multiple file formats of the same object should have the same name. For example, pc1928_p082.tif, pc1928_p082.pdf, and pc1928_p082.txt would all represent the same digital object. The first file would be the master tif, the second a pdf for access and usage, and the third a plain text document with the OCR data written in UTF-8 encoding.

If a single item has multiple digital objects, for example the front and back of a photograph, use an alphanumeric system to distinguish linked objects. For example, ms899-f101_001a.tif and ms899-f101_001b.tif.
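Because format versions of one object share a base name, they can be grouped automatically. The following sketch shows the idea under the assumption that the part of the name before the extension identifies the object; the function name group_by_object is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_object(file_names: list[str]) -> dict[str, list[str]]:
    """Map each shared base name to the sorted list of extensions found for it."""
    groups = defaultdict(list)
    for file_name in file_names:
        path = PurePosixPath(file_name)
        groups[path.stem].append(path.suffix)
    return {stem: sorted(suffixes) for stem, suffixes in groups.items()}


files = ["pc1928_p082.tif", "pc1928_p082.pdf", "pc1928_p082.txt", "ms899-f101_001a.tif"]
print(group_by_object(files))
# {'pc1928_p082': ['.pdf', '.tif', '.txt'], 'ms899-f101_001a': ['.tif']}
```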

Rule 6.  Include dates and format them consistently.

Use the international standard ISO 8601 to display dates, as either YYYY-MM-DD or YYYYMMDD. This format allows for easy sorting and can distinguish different versions of the same record, for example a draft and a finalized document. You can add the date to the file name itself or only to the metadata. If you are unsure of the specific date, use January 1 as the default.
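A short Python sketch of Rule 6 follows. The function name iso_date is hypothetical, and falling back to the first day when only the day is unknown is an assumption that extends the January 1 default; the "minutes" file names are made-up examples.

```python
from datetime import date
from typing import Optional


def iso_date(year: int, month: Optional[int] = None, day: Optional[int] = None,
             compact: bool = False) -> str:
    """Format a date as YYYY-MM-DD, or as YYYYMMDD when compact is True."""
    # An unknown month or day falls back to 1, so a date known only by
    # year becomes January 1 of that year.
    text = date(year, month or 1, day or 1).isoformat()
    return text.replace("-", "") if compact else text


print(iso_date(2013, 5, 9))                # 2013-05-09
print(iso_date(2013, 5, 9, compact=True))  # 20130509
print(iso_date(1928))                      # 1928-01-01

# ISO 8601 dates sort chronologically with an ordinary text sort.
names = [f"minutes_{iso_date(2013, 11, 2)}.pdf", f"minutes_{iso_date(2013, 2, 15)}.pdf"]
print(sorted(names))  # ['minutes_2013-02-15.pdf', 'minutes_2013-11-02.pdf']
```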

Rule 7.  Be Consistent.

Abbreviations, date formats, and other key information should be agreed upon and used by all digital asset creators within a department. An easy way to keep track of this is to create "readme.txt" documents in folders explaining the naming format used, if it is unclear.

Rule 8.  Metadata

The Valdosta State University Archives and Special Collections highly recommends embedding metadata into your files. This can easily be done in Windows by right-clicking a file, selecting “Properties,” entering information into the fields, and applying the changes. You can select multiple files at once to speed up the process. Additionally, in most file creation software, “Properties” can be found under the “File” menu. At a minimum, author, title, date, and subject are desired.


 

Subjects and Keywords

The Valdosta State University Archives and Special Collections recommends following the Library of Congress Subject Headings (LCSH) when possible. For photographs, the IPTC Metadata Taxonomies for News is recommended in addition to LCSH. For geographical locations, the Getty Thesaurus of Geographic Names (TGN) should be followed. Academic and university-related keywords can be found in the Thesaurus for Use in College and University Archives. Abbreviations can be found at http://www.abbreviations.com/.


 

File Formats

File Format Guidelines

Textual
  Access Copies: .pdf, .docx, .csv, .xml, .html, .epub, .mobi
  Archival Masters: .pdf (pdf/a), .odt, .ods, .odp, .xml, .txt (UTF-8), .oebps
Images
  Access Copies: .jpg, .png, .svg
  Archival Masters: .tif (Revision 6.0) (LZW Lossless Compression)
Audio
  Access Copies: .mp3 (320 kbps, 16-bit)
  Archival Masters: .wav (uncompressed)
Video
  .mp4 (H.264), .avi, .mov, .mpeg2, .mpeg4 (AAC Audio Encoding)

See also: Library of Congress, Sustainability of Digital Formats: Planning for Library of Congress Collections.

Textual

  • PDF/A is the preferred archival format for the long-term preservation of text-based digital objects.
  • CSV (and TSV) is used to import and export metadata to and from databases.
  • Open Office (OpenDocument) formats should be used in addition to PDF/A. Proprietary file formats like Microsoft Word or PowerPoint should be normalized by Archives staff into Open Office formats.
  • Plain-text documents should be used for inventory sheets, finding aids, OCR output, and notes. Text documents are not secure, but they promise better long-term accessibility than any other format.
  • XML documents will be used to manage and export metadata for textual objects with structural markup. XML allows extensive customization so that elements from Dublin Core, PREMIS, MODS, EAD, and METS can all be incorporated into this structure.
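As an illustration of the XML point above, the sketch below writes the four minimum metadata fields as Dublin Core elements using Python's standard xml.etree.ElementTree module. The Dublin Core namespace URI is the standard one; the wrapper element name "record", the function name build_record, and the sample values are placeholders, not part of any Archives schema.

```python
import xml.etree.ElementTree as ET

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"
# Ask ElementTree to write the conventional "dc" prefix for this namespace.
ET.register_namespace("dc", DC_NAMESPACE)


def build_record(title: str, creator: str, date: str, subject: str) -> str:
    """Return a small XML record holding four Dublin Core elements."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in (("title", title), ("creator", creator),
                        ("date", date), ("subject", subject)):
        element = ET.SubElement(record, f"{{{DC_NAMESPACE}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Placeholder values for illustration only.
xml_text = build_record("Sample title", "Sample creator", "1928-01-01", "Sample subject")
print(xml_text)
```

Elements from other schemas such as MODS or PREMIS could be added in the same way under their own namespaces.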

Audio

  • Audio preferences: 24-bit, high sampling rate (kHz), Linear PCM, high data rate (320 kbps). Encode surround sound only if essential to creator/content.
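The bit depth and sampling rate of an uncompressed PCM .wav file can be read with Python's standard wave module. This is a minimal sketch: the function name wav_properties is hypothetical, and the demonstration builds a short silent file in memory (24-bit, 96 kHz, stereo, values chosen only as an example) rather than reading a real recording.

```python
import io
import wave


def wav_properties(source) -> dict:
    """Read channel count, bit depth, and sampling rate from a PCM WAV file.

    source may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "bit_depth": wav_file.getsampwidth() * 8,  # sample width is in bytes
            "sample_rate_hz": wav_file.getframerate(),
        }


# Build a tiny silent WAV in memory so the example needs no external file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(2)
    writer.setsampwidth(3)       # 3 bytes per sample = 24-bit
    writer.setframerate(96000)
    writer.writeframes(b"\x00" * (2 * 3 * 100))  # 100 frames of silence
buffer.seek(0)

properties = wav_properties(buffer)
print(properties)
```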

Video

Bitstream encoding for video (relates to clarity and fidelity)

• Larger picture size preferred over smaller picture size. Picture size is expressed as horizontal lines and number of samples per line, or as horizontal and vertical pixel counts.

• Content from high definition sources preferred over content from standard definition, assuming picture size is equal or greater.

• Higher bit rate (often expressed as kilobits or megabits per second) preferred over lower for same compression scheme.

• Surround sound encoding only necessary if essential to creator's intent. In other cases, stereo or monaural sound is preferred.

Normalization and Migration

The Valdosta State University Archives and Special Collections will normalize submitted digital assets into the formats above if they are not already in them. Originals will be kept and tagged for hash checks. If these formats fall out of use, the Archives will migrate to newer file formats with as little visual impact as possible.
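A hash check compares a checksum recorded when a file is received with one computed later; if the two differ, the bitstream has changed. The sketch below uses SHA-256 from Python's standard hashlib module. The choice of SHA-256 is an assumption, since this policy does not name an algorithm, and the function names and sample file are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: str, recorded_checksum: str) -> bool:
    """Compare the file's current checksum with the one recorded earlier."""
    return sha256_of_file(path) == recorded_checksum


# Demonstration with a temporary file standing in for a submitted original.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "pc1928_p082.tif")
    with open(sample, "wb") as handle:
        handle.write(b"example bitstream")
    recorded = sha256_of_file(sample)              # checksum stored at ingest
    still_intact = is_unchanged(sample, recorded)  # True: nothing has changed
    with open(sample, "ab") as handle:
        handle.write(b"!")                         # simulate corruption
    after_change = is_unchanged(sample, recorded)  # False: the bytes differ

print(still_intact, after_change)  # True False
```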

File Compression

ZIP Files: This format is designed for cross-platform data exchange and efficient storage of a set of related files. ZIP_PK is a de facto industry standard, developed, maintained, and openly documented by PKWARE. The original version of the format was developed by Phil Katz (hence the "PK" in PKWARE). ZIP_PK combines data compression, file management, and data encryption within a portable archive format, and it has been used as a packaging or container format in other format specifications. You can use zip files to:

  • Group files together into a single unit
  • Encrypt (password-protect) sensitive information
  • Validate data with CRC-32 (verify that the data is intact and uncorrupted)

Software: 7-Zip, WinRAR. Windows 7 and 8 have built-in file compression: right-click on a file or folder and select "Send to -> Compressed (zipped) folder."

Note: Use Deflate or LZMA compression, both of which are lossless, when creating a zip. (Windows uses Deflate by default.)
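Grouping files and running the CRC-32 check can also be scripted. The sketch below uses Python's standard zipfile module with Deflate compression and builds the archive in memory; the function names and sample contents are hypothetical.

```python
import io
import zipfile
from typing import Optional


def make_zip(files: dict[str, bytes]) -> bytes:
    """Pack the given name-to-content mapping into a Deflate-compressed ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def first_corrupt_member(zip_bytes: bytes) -> Optional[str]:
    """Run the CRC-32 check on every member; return the first bad name, or None if all pass."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.testzip()


zip_bytes = make_zip({
    "pc1928_p082.txt": b"OCR text for page 82",
    "readme.txt": b"Naming format: collection_page.extension",
})
print(first_corrupt_member(zip_bytes))  # None
```

Python's zipfile module can read password-protected archives but cannot create them, so encryption would still need a tool such as 7-Zip.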

 


Sources

Northeast Document Conservation Center, "NEDCC Digital Preservation Policy Template"

University of Colorado Digital Library Metadata Best Practices, Version 1.0

University of Texas Libraries, "Digital Repository Preservation Policy."

Library of Congress, "Perspectives on Personal Digital Archiving: National Digital Information and Preservation Program," 2013.

The Library of Congress: DPOE - www.digitalpreservation.gov