
ETD Processing Workflow

This LibGuide provides instructions and recommended practices for generating and cleaning downloaded ETD metadata in batches, including the steps and tools involved.

Processing and Creating Metadata for Electronic Theses and Dissertations

The following workflow is designed to facilitate batch metadata creation for Electronic Theses and Dissertations (ETDs). It uses tools such as Oxygen XML Editor and OpenRefine to speed up and automate metadata creation while maintaining accuracy.

The workflow is organized into the following stages:

  1. Start: Download the ETDs and organize the files to set up an efficient workflow;
  2. Extraction & Aggregation: Use XSLT stylesheets to pull relevant metadata from the XML files and collect that metadata in a single XML file in Oxygen XML Editor;
  3. Preliminary Clean (Mass Edits): Import the XML file of aggregated metadata into OpenRefine for mass edits and export the cleaned metadata as a Comma Separated Values (.csv) file;
  4. Final Clean (File Comparison): Compare the .csv file metadata with the original PDF/XML files to verify the accuracy of the metadata and enforce controlled vocabulary practices;
  5. Upload to ShareOK: Submit the finalized file for final review and online publication on DSpace.
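
For readers who want to see the shape of the data flow in stages 2 and 3, the sketch below pulls a few fields from several XML documents and collects them as one CSV. It is only an illustration in Python, not the workflow's actual method, which relies on XSLT stylesheets in Oxygen XML Editor and on OpenRefine. The field names (`title`, `author`, `degree`) and the function names are hypothetical; real ETD XML files will use their own schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical field names for illustration; substitute the element
# names that your ETD XML files actually use.
FIELDS = ["title", "author", "degree"]


def extract_record(xml_text):
    """Pull the text of each field in FIELDS from one ETD XML document."""
    root = ET.fromstring(xml_text)
    record = {}
    for field in FIELDS:
        # ".//" searches all descendants of the root; a missing element
        # yields an empty string instead of raising an error.
        value = root.findtext(f".//{field}", default="")
        # Collapse stray whitespace, a typical mass edit.
        record[field] = " ".join(value.split())
    return record


def aggregate_to_csv(xml_documents):
    """Return CSV text with a header row and one row per XML document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for xml_text in xml_documents:
        writer.writerow(extract_record(xml_text))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "<etd><title>  A Study   of Things </title><author>Doe, Jane</author></etd>"
    print(aggregate_to_csv([sample]))
```

Missing elements are left as empty cells so that gaps remain visible during the final comparison against the original PDF/XML files.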