
Open Journal Systems (OJS)


Restoring backup content

OJS backups are stored as XML exports produced by the OJS Native XML Plugin. Raw XML dumps are of little use to most editors, who will want the article files they uploaded in their original formats (PDF, HTML, DOCX, etc.).

Boundaries

Initiate this process when you need to extract original files from the XML dumps.

Outputs

This process produces a series of directories containing all the cover images and article files found within an OJS XML export. The script searches the XML files for journal issues and creates a directory for each issue, saving that issue's cover images and article files within it. Expect more files than you anticipate: OJS stores a unique copy of each article at each stage of the editorial process, so each article can have multiple files and multiple versions of each file. The script matches files with the appropriate issue but does not attempt to match files with articles. Since the Public Knowledge Project (makers of OJS) changes its XML schemas with OJS updates, I wrote the script to be as simple as possible.
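The general idea of pulling embedded files out of an export, one directory per issue, can be sketched as follows. This is a minimal illustration and not the code of extract.py: the element and attribute names (issue, embed, encoding, filename) and the output naming scheme are assumptions about the export layout, which varies between OJS versions.

```python
import base64
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip an XML namespace, e.g. '{http://pkp.sfu.ca}issue' -> 'issue'."""
    return tag.rsplit("}", 1)[-1]


def extract_embedded_files(xml_path, out_dir):
    """Decode every base64 <embed> under each <issue> into that issue's folder.

    Returns the list of paths written.
    """
    written = []
    root = ET.parse(xml_path).getroot()
    issues = [el for el in root.iter() if local_name(el.tag) == "issue"]
    for index, issue in enumerate(issues, start=1):
        # Hypothetical naming scheme: <xml file name>_issue_<n>
        issue_dir = Path(out_dir) / f"{Path(xml_path).stem}_issue_{index}"
        issue_dir.mkdir(parents=True, exist_ok=True)
        count = 0
        for parent in issue.iter():
            for child in parent:
                if local_name(child.tag) != "embed" or not child.text:
                    continue
                if child.get("encoding") != "base64":
                    continue
                count += 1
                # Assumption: the parent element may carry a "filename"
                # attribute; fall back to a numbered name when it does not.
                name = Path(parent.get("filename") or f"file_{count}.bin").name
                target = issue_dir / f"{count:04d}_{name}"
                target.write_bytes(base64.b64decode(child.text))
                written.append(target)
    return written
```

Like the script described above, this sketch groups files by issue only and makes no attempt to link files to individual articles, which is why the output can contain several versions of the same article.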

Inputs

  • The Python script extract.py located here: https://github.com/okstate-library/OJS/blob/main/OJS/extract.py
  • Script requirements:
    • Python 3.8 or later (script tested in 3.8 and 3.9)
    • The following Python modules, all part of the Python standard library (some installations package tkinter separately): xml, base64, glob, os, time, tkinter (optional), and itertools.
  • OJS XML files located in the shared drive: T:\Digitization\Open Journal Systems\ojs-dumps
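Before running the script, the requirements listed above can be checked with a short snippet like the one below. It is only a sketch: the function name check_environment is hypothetical, and it tests each module by importing it, so tkinter is reported as missing when its underlying library is not installed.

```python
import importlib
import sys

REQUIRED = ["xml", "base64", "glob", "os", "time", "itertools"]
OPTIONAL = ["tkinter"]


def is_importable(module_name):
    """Return True if the module can actually be imported."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def check_environment(min_version=(3, 8)):
    """Return (version_ok, missing_required, missing_optional)."""
    version_ok = sys.version_info >= min_version
    missing_required = [m for m in REQUIRED if not is_importable(m)]
    missing_optional = [m for m in OPTIONAL if not is_importable(m)]
    return version_ok, missing_required, missing_optional


if __name__ == "__main__":
    ok, required, optional = check_environment()
    print("Python version OK:", ok)
    print("Missing required modules:", required or "none")
    print("Missing optional modules:", optional or "none")
```

If tkinter is reported as missing, the script can still be used by typing the folder paths.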

Roles

This process can be completed by any employee within DRDS who needs a specific article file or cover image from an OJS XML backup.

Activities

  1. Download the needed XML file(s) from T:\Digitization\Open Journal Systems\ojs-dumps. XML files are organized within sub-directories by the journal abbreviation listed in the OJS administration panel.
  2. Create a folder to hold the data and place all the XML files within that folder.
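Steps 1 and 2 can also be done from Python rather than by hand. The sketch below copies every XML file from one journal's sub-directory into a local working folder; the function name stage_xml_files and the folder names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def stage_xml_files(source_dir, work_dir):
    """Copy every .xml file in source_dir into work_dir; return copied paths."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    copied = []
    for xml_file in sorted(Path(source_dir).glob("*.xml")):
        # copy2 also preserves timestamps, which helps when comparing backups
        copied.append(Path(shutil.copy2(xml_file, work / xml_file.name)))
    return copied
```

For example, stage_xml_files(r"T:\Digitization\Open Journal Systems\ojs-dumps\ABBR", "ojs-xml") would copy the exports for a journal whose abbreviation is ABBR (a placeholder) into a new folder named ojs-xml.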
Windows
  1. Open the "Microsoft Store" application (Read prompts carefully. You do not need a Microsoft account to download software).
  2. Search for "Python3."
  3. Select Python 3.9 and click "Install."
  4. Once Python is installed, create a directory for output files from the script.
  5. In File Explorer, navigate to the folder containing the extract.py script.
  6. Right-click on the script and select "Open with Python 3.9."
  7. Select/type the path to the directory containing your XML files.
  8. Select/type the path to your newly-created export directory.
  9. The script closes automatically when it finishes; if there is an error message, it waits thirty seconds before closing. Navigate to your export directory to find the extracted files.
Mac OSX
  1. Open a terminal and change directories to the ~/Downloads folder.
  2. Download and install Homebrew using the following commands:
    1. bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
    2. Follow the prompts to install Homebrew.
    3. Add a new line to the end of your ~/.profile configuration file:
      1. nano ~/.profile 
      2. Add this line to the end of the file: export PATH="/usr/local/opt/python/libexec/bin:$PATH"
      3. Press Ctrl + X to exit, Y to confirm saving the changes, and Enter to accept the file name.
    4. Restart the Terminal application.
  3. Install Python 3 using the command: brew install python.
  4. Optional: As of this writing, tkinter (an optional Python package used by the script) is not included with this installation by default. To install it, use the command brew install python-tk@3.9 (replace "3.9" with your Python version if it differs). With tkinter installed, you do not have to type long file paths while using the script, which reduces the chance of errors from typos.
  5. Create a directory for holding output files from the script.
  6. In the terminal, change directories into the directory containing extract.py.
  7. Enter the command python3 extract.py.
  8. Select/type the path to the directory containing your XML files.
  9. Select/type the path to your newly-created export directory.
  10. The script closes automatically when it completes; if there is an error message, it waits thirty seconds before closing.
  11. Navigate to your export directory to find the extracted files.
Linux
  1. Most Linux distributions come with Python 3 installed by default. Run python3 --version to confirm that the installed version is 3.8 or later.
  2. Create a directory for output files from the script.
  3. Open a terminal and change directories into the directory containing extract.py.
  4. Enter the command python3 extract.py.
  5. Select/type the path to the directory containing your XML files.
  6. Select/type the path to your newly-created export directory.
  7. The script closes automatically when it finishes; if there is an error message, it waits thirty seconds before closing.
  8. Navigate to your export directory to find the extracted files.
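The "select/type the path" prompts and the thirty-second pause on errors described in the steps above could be implemented roughly as shown below. This is a sketch of the behavior only, not the code of extract.py; the function names choose_directory and run_with_pause and their parameters are hypothetical.

```python
import os
import time


def choose_directory(prompt, ask=input, use_gui=True):
    """Return a directory path from a folder dialog, or from typed input."""
    if use_gui:
        try:
            import tkinter
            from tkinter import filedialog

            root = tkinter.Tk()
            root.withdraw()  # hide the empty main window
            chosen = filedialog.askdirectory(title=prompt)
            root.destroy()
            if chosen:
                return chosen
        except Exception:
            # tkinter is missing or no display is available: fall back to typing
            pass
    typed = ask(f"{prompt}: ").strip().strip('"')
    return os.path.expanduser(typed)


def run_with_pause(task, pause_seconds=30, sleep=time.sleep):
    """Run task(); on error, print the message and wait before returning."""
    try:
        task()
        return True
    except Exception as error:
        print(f"Error: {error}")
        sleep(pause_seconds)  # keeps the window open long enough to read
        return False
```

The pause matters mostly on Windows, where a script opened from the file explorer runs in a console window that disappears as soon as the script exits.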