HathiTrust Documentation

How to collect and format the data for submission to HathiTrust
Tags: Hathi Trust, statistics

Single Part Monograph Instructions (SPM)

Part 1

  1.  Find and Run the logical Set, "HathiTrust_SPM_2018_ALL" to update the Set to include the current year.
  2.  Run the Job, "HathiTrust SPM Note" on the Set above.
    1. (Uses Drools File, "HathiTrustSPM_Create960")
  3.  Find and Run the logical Set, "HathiTrustSPM_ExportDataSet(ALL) DO NOT DELETE"

Part 2

  1.   In Alma, go to Resources -> Publishing Profiles
  2.  Find "Publish HathiTrustSPM_Data" and Edit
  3.  Make sure the Set name under "Content" is the same as in part 1.
  4.  All publishing parameters should remain the same:
    1.  Status = Active
    2.  Not Scheduled
    3.  Full
  5.  Publishing Profile should be "HathiTrust"
  6. Filename should be
    1.  Updated to the current year
    2.  Remain an .mrc (MARC) file
  7. Switch to the Data Enrichment tab
  8.  Double check that the normalization rule used is:
    1.  "HathiTrustSPM Export Data"
  9.  Save the Publishing Profile
  10. Run the Publishing Profile from the drop-down menu on the right side.

Part 3

  1. Open Ipswitch FTP and log in to Hermes (bulkstill)
  2. Copy the newly made file to T:\GeneralProjects\LABbarcodes\HathiTrust\On Server
  3. Email Ellen that this file is ready for final cleanup in MarcEdit

Serials Instructions (SER)

Part 1

  1.  Find and Run the logical Set, "HathiTrust_SER_2018_ALL" to update the Set to include the current year.
  2.  Run the Job, "HathiTrust SER Note" on the Set above.
    1. (Uses Drools File, "HathiTrustSER_Create960")
  3.  Find and Run the logical Set, "HathiTrustSER_ExportDataSet(ALL) DO NOT DELETE"

Part 2

  1.   In Alma, go to Resources -> Publishing Profiles
  2.  Find "Publish HathiTrustSER_Data" and Edit
  3.  Make sure the Set name under "Content" is the same as in part 1.
  4.  All publishing parameters should remain the same:
    1.  Status = Active
    2.  Not Scheduled
    3.  Full
  5.  Publishing Profile should be "HathiTrust"
  6. Filename should be
    1.  Updated to the current year
    2.  Remain an .mrc (MARC) file
  7. Switch to the Data Enrichment tab
  8.  Double check that the normalization rule used is:
    1.  "HathiTrustSER Export Data"
  9.  Save the Publishing Profile
  10. Run the Publishing Profile from the drop-down menu on the right side.

Part 3

  1. Open Ipswitch FTP and log in to Hermes (bulkstill)
  2. Copy the newly made file to T:\GeneralProjects\LABbarcodes\HathiTrust\On Server
  3. Email Ellen that this file is ready for final cleanup in MarcEdit

Multipart Instructions (MPM)

Part 1

  1.  Go to Alma Analytics
    1. Run report, "HathiTrustMPM_MMSId"
    2.  Export and Save report as .tsv (tab delimited file)
      1. Open Excel
      2. Start New File
      3. Import MPM text file using "import from file" menu item under Data tab
      4. Important! On the third step of the import wizard, set every column to Text instead of General. Otherwise Excel may treat the long numeric IDs as numbers and alter them.
    3. Create an "All Titles" set in Alma using this Excel file saved as a .txt (tab delimited file)
  2.  Run the Job, "HathiTrust MPM Note" on the Set above.
    1. (Uses Drools File, "HathiTrustMPM_Create960")
  3.  Find and Run the logical Set, "HathiTrustMPM_ExportDataSet(ALL) DO NOT DELETE"
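
The "Text instead of General" warning in step 1 exists because a long numeric ID that is read as a number may not come back unchanged. As a minimal sketch of the same idea outside Excel, the Python below reads a tab-delimited export and writes it back with every cell kept as a string. The function names, the file encoding, and the sample ID are assumptions for illustration, not part of the documented workflow.

```python
import csv


def read_tsv_as_text(path, encoding="utf-8"):
    """Read a tab-delimited file; csv.reader returns every cell as a string."""
    # The encoding is an assumption; pass a different one if the export needs it.
    with open(path, newline="", encoding=encoding) as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]


def write_tab_delimited(rows, path, encoding="utf-8"):
    """Write rows back out as a tab-delimited .txt file without changing any cell."""
    with open(path, "w", newline="", encoding=encoding) as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


# Hypothetical 18-digit ID: converting it to a floating-point number loses digits,
# which is the kind of damage the Text column setting is meant to prevent.
sample_id = "991234567890123456"
id_survives_float = int(float(sample_id)) == int(sample_id)  # False for this value
```

Because nothing is ever converted to a number, the IDs written out are character-for-character the ones that were read in.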

Part 2

  1.   In Alma, go to Resources -> Publishing Profiles
  2.  Find "Publish HathiTrustMPM_Data" and Edit
  3.  Make sure the Set name under "Content" is the same as in part 1.
  4.  All publishing parameters should remain the same:
    1.  Status = Active
    2.  Not Scheduled
    3.  Full
  5.  Publishing Profile should be "HathiTrust"
  6. Filename should be
    1.  Updated to the current year
    2.  Remain an .mrc (MARC) file
  7. Switch to the Data Enrichment tab
  8.  Double check that the normalization rule used is:
    1.  "HathiTrustMPM Export Data"
  9.  Save the Publishing Profile
  10. Run the Publishing Profile from the drop-down menu on the right side.

Part 3

  1. Open Ipswitch FTP and log in to Hermes (bulkstill)
  2. Copy the newly made file to T:\GeneralProjects\LABbarcodes\HathiTrust\On Server
  3. Email Ellen that this file is ready for final cleanup in MarcEdit

Creating Upload Files for HathiTrust with MarcEdit


Files to be edited:
  • MPM = multi-part monographs (requires subfields 1-4; 5 and 7 if available)
  • SPM = single-part monographs (requires subfields 1-4; 7 if available)
  • SER = serials (requires subfields 1-2; 6 if available)
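
The subfield requirements above can be written down as a small lookup table, which is handy for checking a record or an exported row by script. This is only a sketch: the dictionary and function names are made up for illustration, and the subfield numbers are taken directly from the list above.

```python
# 960 subfields per file type, as listed above.
REQUIRED = {
    "MPM": ("1", "2", "3", "4"),
    "SPM": ("1", "2", "3", "4"),
    "SER": ("1", "2"),
}
# Subfields to include only if available.
OPTIONAL = {
    "MPM": ("5", "7"),
    "SPM": ("7",),
    "SER": ("6",),
}


def export_order(file_type):
    """Return every subfield for a file type in numerical order."""
    return sorted(REQUIRED[file_type] + OPTIONAL[file_type])


def missing_required(file_type, present):
    """Return the required subfields that are absent from `present`."""
    return [code for code in REQUIRED[file_type] if code not in present]
```

For example, `missing_required("SPM", {"1", "2", "7"})` reports that subfields 3 and 4 still need attention before the file is submitted.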

HathiTrust info:  https://www.hathitrust.org/print_holdings

Note: It’s always a good idea to make copies of your source files before doing any editing.

 

  1. Preparing the SPM file for Processing

Some entries in the SPM file may have duplicate 1, 2, and 7 subfields in the 960 field. These need to be removed before processing the file.

  • Double-click on the SPM mrc file to open the file in MARC Tools.
  • Select MarcBreaker and click the box next to Translate to UTF8, then hit Execute
  • Once complete, click on Edit Records
  • Press F9 (Edit Subfield Utility)
  • In Field enter “960”; in Subfield enter “1”
  • Check the box next to Delete Duplicate Subfield
  • Repeat for subfields 2 & 7
  • From the top pulldown menu, select File=>Compile File into MARC
  • When it’s done, close the file and choose “No” when it asks you if you want to save the changes
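
For anyone who wants to check the result of the steps above, the sketch below shows one reading of "delete duplicate subfield" applied to the text produced by MarcBreaker. It is not MarcEdit's own logic. It assumes each field sits on one line that starts with `=960`, that subfields are introduced by `$`, and that "duplicate" means a second occurrence of the same subfield code within one field; the function names and sample values are hypothetical.

```python
def dedupe_subfields(line, codes=("1", "2", "7"), delimiter="$"):
    """Keep only the first occurrence of each listed subfield code in one field line."""
    # Everything before the first delimiter is the tag and indicators.
    head, *parts = line.split(delimiter)
    seen = set()
    kept = []
    for part in parts:
        code = part[:1]  # the subfield code is the first character after the delimiter
        if code in codes and code in seen:
            continue  # a repeat of a code that was already kept
        seen.add(code)
        kept.append(part)
    return delimiter.join([head] + kept)


def dedupe_960_lines(lines):
    """Apply the de-duplication to 960 field lines only; leave other lines alone."""
    return [
        dedupe_subfields(line) if line.startswith("=960") else line
        for line in lines
    ]
```

Subfields outside the listed codes are never dropped, so a repeated subfield 3, for instance, would pass through untouched.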

You’re now ready to proceed with all files.

 

  2. Set File Paths
  • Open MarcEdit and from the top pulldown menu select Tools=>Export=>Export Tab-delimited Records
  • For the first path, browse and select one of the mrc files
  • For the second path, browse to the save location and name the text file with the filename structure “okstate_TYPE_YEARMODA”, for example: 
    • okstate_multi-part_20180828.txt
    • okstate_single-part_20180828.txt
    • okstate_serial_20180828.txt
  • Leave the Select Field Delimiter as:  Tab (\t) and the In field delimiter as a comma (,)
  • Click on Next
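
The filename structure used for the second path can be generated rather than typed, which avoids date typos. This is a minimal sketch; the function name and the mapping from file type to label are assumptions based on the three example filenames above.

```python
from datetime import date

# Labels taken from the example filenames above.
TYPE_LABELS = {"MPM": "multi-part", "SPM": "single-part", "SER": "serial"}


def upload_filename(file_type, on=None):
    """Build okstate_TYPE_YEARMODA.txt for the given file type and date."""
    day = on or date.today()  # default to today's date when none is given
    return f"okstate_{TYPE_LABELS[file_type]}_{day:%Y%m%d}.txt"
```

Calling `upload_filename("MPM", date(2018, 8, 28))` gives `okstate_multi-part_20180828.txt`, matching the first example.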

 

 

  3. Define Fields/Subfields to Export
  • In Field, type 960
  • In Subfield, type 1
  • Click Add Field
  • Repeat for each subfield, in numerical order. You have to re-enter 960 each time you add a field, but the program keeps the added fields between exports, so you only need to add or delete subfields rather than build the list from scratch.
  • Process the files in this order:
    • MPM: 1-4, then 5 and 7 if available
    • SPM: 1-4, then 7 if available
    • SER: 1-2, then 6 if available
  • Click on Export
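
To make it clearer what the export produces, here is a rough Python equivalent that works on MarcBreaker text instead of the .mrc file. It is a sketch under several assumptions, not a description of MarcEdit internals: each 960 field is one line starting with `=960`, subfields are introduced by `$`, one output row is written per 960 field, and repeated subfields are joined with the in-field delimiter (a comma). The function name and sample values are hypothetical.

```python
def export_960(mrk_text, subfields, delimiter="$", in_field=","):
    """Return one tab-delimited row per 960 field, with columns in `subfields` order."""
    rows = []
    for line in mrk_text.splitlines():
        if not line.startswith("=960"):
            continue  # only the 960 field is exported
        values = {}
        # Skip the tag/indicator text before the first delimiter.
        for part in line.split(delimiter)[1:]:
            values.setdefault(part[:1], []).append(part[1:])
        # A subfield that is absent becomes an empty column.
        columns = [in_field.join(values.get(code, [])) for code in subfields]
        rows.append("\t".join(columns))
    return rows
```

Passing the subfields in numerical order, for example `["1", "2", "6"]` for serials, keeps the columns in the same order as the steps above.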

 

 

  4. Open the files to review. They will look something like this:
    • MPM: [screenshot]
    • SPM: [screenshot]
    • SER: [screenshot]

  5. Delete any initial lines of text with “960” in the resulting file, then save and close.
  6. You can delete the copies you made of the initial files.
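
The header cleanup in the step above can also be done by script. The sketch below assumes that a header line is one in which every non-empty tab-separated cell starts with "960"; that test is an assumption chosen so that a data row whose ID merely contains the digits 960 is not removed. Check the first lines of the real file by eye before relying on it. The function names are hypothetical.

```python
def is_header(line, marker="960"):
    """Treat a line as a header if every non-empty cell starts with the marker."""
    cells = [cell for cell in line.rstrip("\r\n").split("\t") if cell]
    return bool(cells) and all(cell.startswith(marker) for cell in cells)


def strip_header_lines(lines, marker="960"):
    """Drop header lines from the top of the file and keep everything after them."""
    rows = list(lines)
    start = 0
    # Only leading lines are examined; later lines are never removed.
    while start < len(rows) and is_header(rows[start], marker):
        start += 1
    return rows[start:]
```

Only the top of the file is touched, so a later row that happens to match the header test is left in place.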