
ETD Processing Workflow

This LibGuide offers instructions and recommended practices for the steps and tools used to generate and clean downloaded ETD metadata en masse.

Processing and Creating Metadata for Electronic Theses and Dissertations

The following workflow is designed to facilitate batch metadata creation and upload of Electronic Theses and Dissertations (ETDs). It relies on tools such as Oxygen XML Editor, OpenRefine, and PySAF to expedite and automate metadata creation while maintaining accuracy.

The workflow is organized into the following stages:

  1. Start: Download ETDs and organize files to set up an efficient workflow;
  2. Extraction & Aggregation: Use XSLT stylesheets in Oxygen XML Editor to pull relevant metadata from the XML files and collect that metadata in a single XML file;
  3. Preliminary Clean (Mass Edits): Import the aggregated metadata XML into OpenRefine for mass edits and export the cleaned metadata as a Comma Separated Values (.csv) file;
  4. Final Clean (File Comparison): Compare the metadata in the .csv file against the original PDF/XML files to verify its accuracy and enforce controlled vocabulary practices;
  5. Upload to ShareOK: Use PySAF to prepare a Simple Archive Format (SAF) package and upload it to DSpace.
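The extraction and aggregation stage is done with XSLT in Oxygen XML Editor. For readers who want to see the same idea outside Oxygen, the following is a minimal Python sketch that pulls a few fields from every XML file in a folder and writes them as CSV rows, roughly what stages 2 and 3 produce before cleaning. It is a substitute illustration, not the XSLT used in this workflow: the element paths in `FIELDS` and the function names are hypothetical and must be adjusted to the schema of the downloaded ETD XML.

```python
import csv
import io
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical mapping of CSV column -> element path in each ETD XML file.
# Replace these placeholders with the paths used by the real ETD schema.
FIELDS = {
    "title": "title",
    "author": "author",
    "date": "date",
}


def extract_record(root: ET.Element) -> dict:
    """Pull the configured fields from one parsed XML document."""
    record = {}
    for column, path in FIELDS.items():
        node = root.find(path)
        if node is None:
            record[column] = ""  # missing fields become empty cells
            continue
        # itertext() keeps text inside nested markup (e.g. <i> in titles);
        # split/join collapses line breaks and repeated spaces.
        record[column] = " ".join("".join(node.itertext()).split())
    return record


def aggregate(xml_dir: Path) -> list[dict]:
    """Extract one record per .xml file, in filename order."""
    return [extract_record(ET.parse(p).getroot()) for p in sorted(xml_dir.glob("*.xml"))]


def to_csv(records: list[dict]) -> str:
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELDS))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Sorting the file list keeps row order stable between runs, which makes it easier to line rows up with the original PDF/XML files during the final comparison.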
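DSpace's batch importer expects each item as a directory in Simple Archive Format (SAF): a `dublin_core.xml` file of `dcvalue` elements, a `contents` file listing the bitstreams, and the bitstream files themselves. PySAF builds these directories from a CSV, so this step is normally not done by hand. The Python sketch below only illustrates the layout of a single item; `write_saf_item` and its arguments are hypothetical names, not part of PySAF.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(item_dir: Path, metadata: dict, files: list[Path]) -> None:
    """Create one SAF item directory (hypothetical helper, for illustration).

    metadata maps "element.qualifier" (or just "element") to a list of values,
    e.g. {"title": ["..."], "date.issued": ["2020"]}.
    """
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per value; unqualified fields use "none".
    root = ET.Element("dublin_core")
    for key, values in metadata.items():
        element, _, qualifier = key.partition(".")
        for value in values:
            dc = ET.SubElement(root, "dcvalue", {"element": element, "qualifier": qualifier or "none"})
            dc.text = value
    ET.ElementTree(root).write(item_dir / "dublin_core.xml", encoding="utf-8", xml_declaration=True)

    # Copy each bitstream into the item directory and list it in "contents",
    # one filename per line.
    names = []
    for src in files:
        src = Path(src)
        shutil.copy2(src, item_dir / src.name)
        names.append(src.name)
    (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
```

For a batch, each ETD would get its own numbered subdirectory (for example `item_000`, `item_001`) under one archive folder, which is then passed to the DSpace importer.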