Managing Research Data: Past Workshops

Browse past Data Bytes workshop offerings below.

Software & Tools

Workshop Title (click for description) Instructor(s) Availability
Online Tools

Electronic lab notebooks (ELNs) are data management software platforms for organizing research records. At its simplest, an ELN replicates the function of a paper lab notebook, but it has several advantages over traditional paper records, including the ability to make data secure and searchable. This workshop will explore those advantages and suggest two free options for adopting ELNs for your research. 

Dr. Kay Bjornen Last offered Fall 2021

LaTeX is a typesetting system for producing technical documents and is an important document standard in a number of disciplines. LaTeX gives authors more flexibility and control than word processing software, but is less intuitive for first-time users. This workshop will cover basic use of LaTeX editors, as well as an introduction to the markup language of LaTeX. Participants will gain a basic understanding of how to use LaTeX to create documents, how to write and format text, and how to typeset non-standard content such as expressions, equations, and non-Latin scripts. We will use the free Overleaf LaTeX text editor, which does not require installing any software ahead of time. 

Clarke Iakovakis Last offered Spring 2023
See recording here

If you work with large spreadsheets and have been known to spend hours going cell by cell to find errors, we can show you how to save time and aggravation. OpenRefine is a powerful free tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data. OpenRefine keeps your data private: it runs a small server on your own computer, you interact with it through your web browser, and your data never leaves your computer unless you choose to share it or collaborate. Participants will download OpenRefine to their own devices and learn by doing. 

Dr. Kay Bjornen & Clarke Iakovakis Last offered Spring 2021
See recording here

OpenRefine is a free, web-based tool for data cleaning and manipulation. This workshop will cover using OpenRefine with data in spreadsheets to look for and correct errors, relabel, separate or combine columns and export the cleaned data for use. This workshop will be a continuation of "Data Bytes - Data Cleaning with OpenRefine" but registration is open to all and experience is not required.  

Dr. Kay Bjornen & Clarke Iakovakis Last offered Spring 2021
See recording here

This workshop will explore digital tools for building timelines, including finding dates with Named Entity Recognition, visualizing time with a variety of freely available tools, and collaborating with other researchers, including students. 

Megan Macken Last offered Fall 2020

Expand your traditional Humanities or Social Science research using digital tools. Topics covered may include text analysis of archival documents and digital exhibits in Omeka. No prior experience with digital tools required! 

Megan Macken Last offered Spring 2021

An introduction to the basics of using EndNote, creating and adding references to an EndNote library, organizing and managing citations, annotating PDFs, using EndNote with Microsoft Word for in-text citations and bibliographies. 

Victor Baeza Last offered Spring 2023
Coding (R, Python, UNIX)

Tidy data is a data format that is consistent and machine readable. In this workshop we will explore what makes data "tidy" beginning with good practices for setting up spreadsheets using historical weather data from the OSU Agricultural Research Station that is being curated for use by the OSU Library. We will then explain why R is the right tool for data cleaning, visualization and analysis and discuss the advantages of learning how to use this powerful tool rather than be frustrated by the limitations of Excel. 

Dr. Kay Bjornen Last offered Fall 2021
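
As a rough illustration of the tidy format described in the workshop above, the sketch below reshapes a small "wide" table (one column per month) into a "long" table with one observation per row. The workshop itself uses R; Python appears here only for illustration, and the station names, column names, and values are invented rather than taken from the OSU weather data.

```python
# Hypothetical wide-format records: one row per station, one column per month.
wide_rows = [
    {"station": "A", "jan_temp_c": 2.5, "feb_temp_c": 4.0},
    {"station": "B", "jan_temp_c": 1.0, "feb_temp_c": 3.5},
]


def to_tidy(rows, id_field, value_name):
    """Return one record per (id, month) pair, i.e. one observation per row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue
            # The month is encoded in the column name, e.g. "jan_temp_c" -> "jan".
            month = key.split("_")[0]
            tidy.append({id_field: row[id_field], "month": month, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, "station", "temp_c")
for record in tidy_rows:
    print(record)
```

Each variable (station, month, temperature) ends up in its own column, which is the property that makes the table straightforward to filter, group, and plot.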

Cleaning data, whether it is your own or someone else's, is usually the most difficult and painstaking part of data analysis. In spreadsheets it may involve going cell by cell to look for errors and inconsistencies followed by cutting and pasting which can also introduce errors. In this workshop, we will show you how to use R to clean, reformat and subset data quickly and without introducing the errors that Excel can be prone to. We will use real life messy weather data to demonstrate how to use the power of R to arrive at a clean and manageable data set. 

Clarke Iakovakis Last offered Fall 2021

Cleaning data, whether it is your own or someone else's, is usually the most difficult and painstaking part of data analysis. In spreadsheets it may involve going cell by cell to look for errors and inconsistencies followed by cutting and pasting which can also introduce errors. In this workshop, we will show you how to use R to clean, reformat and subset data quickly and without introducing the errors that Excel can be prone to. We will use real life messy weather data to demonstrate how to use the power of R to arrive at a clean and manageable data set. 

Dr. Phil Alderman Last offered Fall 2021

Visualization of data is easy to do in R. Graphs can be built layer by layer with as much detail as necessary using a variety of plot types, some of which are not available in Excel. Once you have created your graphs and made them look professional, it is particularly convenient to be able to save the scripts and reproduce them with new data sets. The last of our data series will again use the OSU weather data to show how easy it is to make terrific-looking visualizations in R. 

Dr. Kay Bjornen Last offered Fall 2021

This workshop will explore the use of R to create effective, professional-looking data visualizations. Participants will explore the Tidyverse and ggplot packages through hands-on exercises. No prior knowledge of R is necessary, but R users are encouraged to bring their own devices with RStudio installed. Library computers will be available for those without devices. This workshop is ideal for those who wish to practice skills learned through Carpentries workshops or who would like to learn more before registering for an upcoming Carpentries workshop. 

Clarke Iakovakis
(formerly taught by Dr. Kay Bjornen)
Last offered Fall 2022

R Markdown provides a way of combining text, equations, and code into documents that are easy to track and maintain, while producing reliable final documents for sharing with students, editors, and colleagues. R Markdown text documents can be used by the knitr R package and tools such as LaTeX and pandoc to produce a variety of beautiful outputs, including research papers, webpages, books, and presentations. This workshop will 1) introduce why R Markdown might be a productive tool for you to adopt, 2) demonstrate the basic capabilities of .Rmd files in the creation of different documents, and 3) share additional resources to learn more about R Markdown and associated tools. 

Dr. Peter Rudloff
(formerly taught by Dr. Kay Bjornen)
Last offered Spring 2023

An introduction to saving time and automating the boring stuff with the command line. Part 1 of a two-part series. 

Kevin Dyke Last offered Spring 2021
See recording here

Part 2 of a series - participants will write scripts to perform simple but repetitive tasks using the Command Line. 

Kevin Dyke Last offered Spring 2021
Other Tools

Data visualization can be a useful tool in the dissemination of your research. This workshop introduces data visualization, why it is useful, and how you can create charts and graphs from your data using tools such as Microsoft Excel and Google Sheets. We'll explore options for creating static and interactive graphics and introduce more advanced approaches. 

Kevin Dyke Last offered Spring 2019
See recording here

Spreadsheets are the most popular tool for organizing data. Learn how to organize spreadsheets so that they are machine readable and so that computation with tools such as R or Python is more efficient. We will learn how to use some of the tools available in spreadsheets for quality assurance and data manipulation, and cover best practices for formatting and data entry. This workshop will not cover the use of spreadsheets for analysis. 

Dr. Kay Bjornen Last offered Spring 2021
See recording here

Maps and data are important tools for crafting effective messages. Kevin Dyke, Maps and Spatial Data Curator, will demonstrate tools and best practices for telling your story better no matter what your discipline or interest. Story mapping is a digital storytelling technique that combines multimedia, textual, and cartographic elements into a cohesive narrative. In this workshop we will explore the possibilities of such storytelling. Participants will get hands-on experience using ArcGIS StoryMaps. 

Kevin Dyke Last offered Spring 2021
See recording here

Jupyter Notebooks offer an appealing platform for developing course materials that can convey the traditional material, introduce students to the basics of coding in Python, and present novel interactive, code-based aspects of the material. In this session, I will describe how I have used Jupyter Notebooks, compiled into an online Jupyter Book, to develop course materials for undergraduate chemistry courses. I will discuss the basic structure of the book and each notebook as well as the key elements for creating easy-to-use coding material for the students. After this session, attendees will understand how to develop their own Jupyter Notebooks for undergraduate courses. 

Dr. Martin McCullagh Last offered Fall 2022

This workshop will provide an overview of the many citation and engagement metrics, such as Journal Impact Factor, CiteScore, h-index, and altmetrics. Focus will be on appropriate use and interpretation of these metrics, including criticism and alternatives. 

Clarke Iakovakis Last offered Fall 2020

An increasing number of authors, funders, publishers, and other members of the scholarly communications infrastructure have adopted ORCID identifiers. Indeed, many publishers and funding agencies require individual researchers to be registered with ORCID iDs, and are developing systems to push and pull data from your ORCID profile. This session will describe what ORCID is and its benefits to you, how to sign up for an ORCID, and how to use it with systems such as SciENcv, eRA Commons, PubPeer, and many more. Finally, OSU faculty can link their Experts Directory profile to their ORCID iD, enabling them to read and write data both ways. 

Clarke Iakovakis Last offered Spring 2023

This workshop will focus on creating static websites with Jekyll. Jekyll is software that creates a “static website” by combining templates with specific content to generate full HTML pages for site visitors. We’ll learn how to create websites and use GitHub to version control them. Jekyll is easy to customize and is ideal for writing content across multiple pages with similar templates. Content in Jekyll can be written either in HTML or in Markdown. In this workshop, we’ll set up a site using Jekyll and GitHub. A separate workshop on Markdown will follow later in the week for those who are interested in learning to write content for their sites. 

Dr. Brandon Katzir Last offered Fall 2022

Markdown is a way of formatting your writing for reading on the web: it’s a set of easy-to-remember symbols that show where text formatting should be added. For Jekyll in particular, Markdown means you can write webpages in a way that’s easier to learn than HTML. In this workshop, we’ll learn to create basic Markdown documents. If you attended the “Building Static Pages with Jekyll” workshop, this workshop will help you learn how to write pages in Markdown. But if you did not attend the Jekyll workshop, you can still benefit from learning this versatile markup language that is increasingly used on websites, documents, notes, and technical documentation. 

Dr. Brandon Katzir Last offered Fall 2022

ChatGPT and other generative artificial intelligence tools have made headlines in recent months. The user-friendly tool creates written content in response to any number of textual prompts, from generating poetry to writing research papers and even fabricating news articles. While we may be right to be wary about the implications of this technology on a liberal arts education centered on writing across the curriculum, generative AI is most likely here to stay. So, instead of viewing it primarily as a threat, how might educators and researchers embrace ChatGPT as a useful tool for instruction, capacity building, and equity? In this interactive session, you’ll learn the basic parameters and functions of ChatGPT as well as how to use generative AI in brainstorming and the creative process, how writing-heavy courses might integrate ChatGPT in ways that advance learning outcomes, how researchers might use generative text to build capacity, how ChatGPT might support equitable access for marginalized groups, and how an integrative approach to generative AI might forestall some of the threats posed by this new technology. 

Dr. Rosemary Avance, Dr. Heather Stewart,
& Richard Sylvestre
Last offered Spring 2023
See recording here

Data - Sources & Best Practices

Workshop Title (click for description) Instructor(s) Availability
Open Science & Reproducibility

Reproducibility for Everyone (R4E) is a community led education initiative to increase adoption of open research practices at scale. Research reproducibility is enhanced by good data management practices and transparent research methods. This workshop will include an introduction to reproducibility and how factors such as protocol, data and reagent sharing, data visualization, publishing choices and bioinformatics practices can all impact the reproducibility of your published research. 

Dr. Kay Bjornen Last offered Fall 2021

A research project is reproducible if the researcher provides a set of files and instructions to enable a second researcher (including you, the researcher) to recreate the final reported results. Over approximately the last decade, research funders and, increasingly, scholarly publishers have been implementing policies and supporting technologies to facilitate reproducibility. This workshop will define reproducibility, provide a history and overview of publishers' and funders' policies and practices aiming to encourage data and code sharing, discuss some obstacles and challenges, and describe innovations in workflows and collaborations between researchers, funders, and publishers driving reproducibility forward. 

Clarke Iakovakis Last offered Spring 2023

Reproducibility is a term that is thrown around a lot currently in research. Many publications have referred to a reproducibility "crisis." But what exactly is it and why should researchers care? Come hear a discussion about the different aspects of research data reproducibility and what tools researchers have at their fingertips to improve it and to improve the transparency of their work. 

Dr. Kay Bjornen & Clarke Iakovakis Last offered Fall 2019

Are you tired of trying to keep track of multiple versions of your data and research documents, searching through a messy email inbox for communications, and lacking a central location for all files related to a research project? Open Science Framework (OSF) is a free, open source web application developed to help researchers of all disciplines (not just science) manage their workflows. OSF provides a highly customizable user interface, allowing users to create modules for housing data, research materials, communication, analysis files, and anything else they need to keep. It also provides the ability to upload any file type under 5GB, with automatic, built-in version control for all files; integration with tools researchers already use, such as Dropbox, Google Drive, and GitHub; and the means to add contributors with various levels of read/write permissions. 

Clarke Iakovakis Last offered Spring 2023
See recording here

Research data is a valuable resource. Careful management practices reduce mistakes, improve research reproducibility and facilitate publishing and sharing data. Better practices for different phases of the research data cycle will be discussed including planning for data collection, file management, data security, options for data storage during the project and long term storage of data for sharing after the project is complete. Planning for data management is also a critical part of developing a written data management plan, or DMP, for funding proposals. Please join us and learn these important skills in order to get your research project off to a great start. 

Clarke Iakovakis & Dani Kirsch
(formerly taught by Dr. Kay Bjornen)
Last offered Spring 2023
See recording here

Data Management Plans (DMPs) are an often overlooked tool for effectively planning how and where to store, describe and archive your research data so that it is secure and discoverable. Preparing a DMP in advance will help ensure that your next grant submission is straightforward. In this workshop we will look at the components of a DMP required by federal funding agencies, the use of dmptool.org to find templates for specific agencies, and software and web-based tools that will make updating and modifying your DMP simple. Participants are encouraged to bring DMPs that they have used or are preparing for questions or group discussion. 

Dr. Kay Bjornen Last offered Spring 2022
See recording here

Inconsistent formatting, cryptic file naming, and poor folder organization can add unnecessary time and labor to a project, and it can be challenging (but not impossible!) to fix these issues once a project has already started. Using descriptive names and following a standardized organizational process will make your data and materials easier to understand, share, and archive, and will help to meet FAIR guidelines. These guidelines seek to improve the Findability, Accessibility, Interoperability, and Reuse of data and are being increasingly promoted and required by funding agencies, publishers, and government agencies. This workshop will provide practical suggestions for naming and organizing files and folders as well as recommend better practices to comply with FAIR data principles. 

Dani Kirsch Last offered Spring 2023
See recording here
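
To make the idea of descriptive, standardized file names concrete, here is a minimal sketch of a helper that builds names from a project, a short description, a date, and a version number. The `project_description_YYYYMMDD_vNN.ext` pattern and the function name `standard_filename` are assumptions chosen for illustration; they are one common convention, not a scheme prescribed by the workshop above or by the FAIR principles.

```python
import re
from datetime import date


def standard_filename(project, description, when, version, extension):
    """Build a name such as 'soilstudy_field-notes_20230315_v02.csv'."""

    def slug(text):
        # Lowercase the text and collapse anything that is not a letter or
        # digit into a single hyphen, so names contain no spaces or symbols.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".").lower()
    # An ISO-style date (YYYYMMDD) sorts chronologically in a file listing,
    # and a zero-padded version number sorts correctly past v09.
    return f"{slug(project)}_{slug(description)}_{when:%Y%m%d}_v{version:02d}.{ext}"


name = standard_filename("SoilStudy", "Field Notes", date(2023, 3, 15), 2, ".CSV")
print(name)
```

Generating names from a function rather than typing them by hand keeps the pattern consistent across collaborators and over the life of a project.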

Data sharing is becoming an expectation for researchers, but how can you do this safely and responsibly? Figshare is a multi-disciplinary repository platform that offers researchers a free place to make research outputs, from data to supplementary material, available in a citable, shareable, and discoverable manner. Over 500,000 researchers from around the world use the platform to publish research outputs and track reuse. This workshop will cover how research outputs can be most effectively shared on the Figshare platform. You will leave confident in your ability to start sharing your research on Figshare and there will be plenty of time for questions.  

Andrew McKenna-Foster Last offered Spring 2022
See recording here
Where to Find Data

One of the major functions of the patent system is the dissemination of technical information. “Patent information is a valuable and comprehensive source of technical, commercial and legal information that can be used directly for scientific and experimental purposes…” Patents are primary source documents and can supplement the traditional literature searches in your disciplines: they can be used to discover new areas of research, for ideas to improve existing research, or to see if a product has already been developed. Learn about patents as intellectual property, how to determine the classifications or technology areas that correspond to your research, how to search patents and patent applications, and using patents as information sources. This workshop will make a comparison of patents in subject databases: Scopus, SciFinder, PubChem, Google Scholar, JSTOR, and Lens.  

Suzanne Reinman & Clarke Iakovakis Last offered Spring 2023

Learn how to access and work with the HathiTrust Extracted Features datasets, including Word Frequencies in English-Language Literature, 1700-1922 and Geographic Locations in English-Language Literature, 1701-2011. These datasets, derived from the full-text of over 17 million volumes, allow researchers to analyze a large body of both copyrighted and public-domain text through its volume-level metadata, page-level metadata, part-of-speech-tagged tokens, and token counts. No prior knowledge or experience with HathiTrust Research Center or text analysis necessary. 

Megan Macken Last offered Fall 2020

HathiTrust Digital Library contains nearly 17.5 million published works, including books, serials, and government documents. Learn how to access the secure computing environment for text analysis of the HathiTrust corpus, including in-copyright publications. 

Megan Macken Last offered Spring 2021

DATA NERDS! You know what Wikipedia is, but what about Wikidata? Find out more about this collaborative data source, how to contribute data, and how to access existing data for visualization, reuse, or the simple pleasures of geekdom. 

Megan Macken & Dr. Brandon Katzir Last offered Fall 2022

JSTOR offers millions of documents for text analysis in its new platform, Constellate. This workshop will provide an introduction to Constellate’s tutorials, tools, and datasets. Participants will create their own datasets, visualizations, and notebooks during the workshop. We will also briefly cover other tools from JSTOR Labs, including Text Analyzer and Juncture. 

Megan Macken & Dr. Brandon Katzir Last offered Fall 2022

Open access data and data sets published by U.S. government agencies are available to researchers in Data.gov and Science.gov, as well as in the data catalogs that many agencies maintain. Data can be cleaned and standardized using OpenRefine and other software packages. Learn the key access points to federal data and how to improve data usability with OpenRefine. 

Suzanne Reinman & Kevin Dyke
(formerly taught by Tabitha Carr)
Last offered Fall 2022

In this workshop, learn how to build a data set from archival documents and historic photos. We'll look at example projects and quick methods for extracting data from archival collections. 

Megan Macken Last offered Spring 2021

Dr. Diego Mendez-Carbajo is a Senior Economic Education Specialist for the Research Division of the Federal Reserve Bank of St. Louis. He will provide a live demonstration of FRED, a free economic database that contains over 816,000 US and international time series, and will highlight topics related to diversity and inclusion. FRED is used by researchers, journalists, analysts, and data enthusiasts alike to visualize, save, and download data. This session will be a live demonstration of active learning with FRED data. We will show attendees how to use the flagship database from the Federal Reserve Bank of St. Louis to place abstract economic concepts in realistic and relevant contexts. Participants will learn how to use data to answer the following questions: What is the value of women's domestic labor that goes unpaid? How did the COVID-19-induced recession impact men's and women's employment? What fraction of the labor force do workers with a disability represent? During the presentation we will showcase how to leverage short reading assignments based on data featured in the FRED Blog. We will also showcase the broad range of instructional resources produced by the Economic Education team at the St. Louis Fed: lesson plans, interactive modules, and micro-credentials. All these resources are accessible to educators and the public at large at no cost. 

Dr. Diego Mendez-Carbajo Last offered Fall 2022

Remote sensing is the process of acquiring information about the physical characteristics of an object, area or phenomenon without physical contact. It is generally used to assess changes on Earth and other planets. The increase in anthropogenic interactions with Earth’s land and other natural resources makes remote sensing methods important in the fields of Earth, agricultural and environmental sciences. This workshop aims to provide a brief introduction to remote sensing using satellites, followed by a hands-on session that involves using satellite imagery to assess changes in agricultural land use.  

Dr. Abhiram Pamula Last offered Spring 2023
Handling Data    

The Expectation Maximization (EM) algorithm, as used in unsupervised learning to optimize clustering models for unlabeled data, will be explained in simple terms. An EM implementation running in a Jupyter Notebook will be demoed live, taking unlabeled data as input, performing the EM clustering logic, and displaying the resulting data clusters in color. This is a light-level statistics/probability session on how unsupervised (data with no labels) clustering algorithms may be implemented, in this case by applying statistical Expectation Maximization principles. It helps to have some familiarity with the Normal/Gaussian distribution and how multi-modal probabilities are calculated. A simple high-level description will be given during the presentation. 

Amir Bahmanyari Last offered Spring 2022
See recording here
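
For readers who want a feel for the EM clustering logic mentioned in the session above, here is a minimal sketch for a mixture of two one-dimensional Gaussians, using only the Python standard library. It is not the presenter's notebook: the synthetic data, the initial guesses, the fixed iteration count, and the function names are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a Normal(mean, var) distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture to unlabeled data with EM."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Crude starting point: means at the extremes, shared variance, equal weights.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing component
    return weights, means, variances


# Synthetic unlabeled data: two overlapping groups centered near 0 and 6.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(6.0, 1.0) for _ in range(200)]

weights, means, variances = em_two_gaussians(data)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

The E-step assigns each point a soft membership in each cluster, and the M-step updates the cluster parameters as if those memberships were known; repeating the two steps is what lets the model recover the groups without labels.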

Dr. Cory Giles is a Postdoctoral Fellow in the Jonathan Wren Group at Oklahoma Medical Research Foundation with a focus on aging & bioinformatics. The title of his talk is "Techniques for organizing and managing large experimental datasets and metadata". The presentation introduces FAIR Data Principles and describes how implementation of these principles supports the practice of reproducible research. Federal funding agencies are strongly encouraging grantees to implement FAIR principles to advance reproducible research. 

Dr. Cory Giles Last offered Spring 2022
See recording here

In recent years, the use of data science and machine learning methodology has increased dramatically in domains such as engineering, physical sciences and social sciences. It is imperative that the next-generation of undergraduate and graduate students from these disciplines are introduced to the various ways in which large amounts of data are handled, visualized, and analyzed to glean important insights into physical and social phenomena. In this workshop, a high-level introduction to machine learning will be provided. Participants in the workshop will be exposed to online tools to aid in teaching and learning machine learning. A short hands-on example will be covered to demonstrate a pipeline for developing machine learning models. 

Dr. Jindal Shah Last offered Fall 2022

Machine learning techniques such as deep learning are becoming increasingly popular tools for surrogate modeling of complex problems in computational science and engineering. Such methods are broadly encompassed in the field of Scientific Machine Learning and are in their infancy. In this workshop we aim to provide participants with a brief introduction to Scientific Machine Learning, followed by a hands-on session on implementing one of the techniques to generate a surrogate model for a dynamical system. In addition, Dr. Peetak Mitra, an award-winning technologist and researcher focused primarily on Scientific Machine Learning, will present a talk on challenges and opportunities in the field. 

Rohit Vuppala Last offered Spring 2023

Massive data sets are now widely available for research and analysis in our online era. Some of the innate characteristics of such big data can be summarized as variety, volume, and velocity. While the availability of large data sets can assist people in several ways, it is sometimes challenging to process such data and extract useful information with general-purpose computing resources. This workshop aims to give hands-on experience using the Python programming language to pre-process large documents, and to discuss some of the capabilities of distributed computing frameworks like Hadoop MapReduce for handling big data. 

Dr. Arunkumar Bagavathi Last offered Spring 2023
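
The MapReduce model mentioned in the workshop above can be imitated on a single machine in a few lines of Python. The sketch below is a hedged illustration of the map, shuffle, and reduce phases for a word count; it does not use Hadoop or any Hadoop API, and the function names and sample documents are invented for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values collected for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input: each string stands in for one large document.
documents = ["big data needs big tools", "data tools for big data"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts)
```

In a distributed framework the map calls run in parallel on different machines and the shuffle moves data across the network, but the logic of each phase is the same as in this single-process version.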

On-Campus Resources

Workshop Title (click for description) Instructor(s) Availability

Research is more than just developing a project and compiling data. Beginning with your literature review and ending with communicating your results, there are rules and guidelines to be sure that you are working ethically. Dawn Underwood is the Assistant Vice President for Research Compliance at Oklahoma State University. She will explain research misconduct and discuss topics such as authorship disputes, plagiarism (including self-plagiarism), image modification and the importance of careful record-keeping. Come hear about research integrity, how to work with the OSU Office of Research to avoid misconduct and how to anonymously report it if you have concerns. 

Dr. Dawn Underwood Last offered Spring 2020

Call them what you like - unmanned air vehicles, uncrewed aircraft systems, or drones - autonomous aircraft open up new possibilities for gathering data, both through remote sensing and in-situ observations. This workshop will outline rules of the road for using drones, including both campus policies and federal regulations, as well as providing details of campus resources for building, modifying, and flying drones for research or other uses. Beginning with various applications, we will provide basic guidelines for operating safely and legally, then end with resources for students and faculty. 

Dr. Jamey Jacob Last offered Fall 2022
See recording here

No description. 

Dr. Jamey Jacob Last offered Spring 2023

During this session, we will look at how to develop content for VR and AR as well as how we can use some of these tools for academic research focusing on projects that the Mixed Reality Lab at Oklahoma State University is working on. 

Zahra Hosseini
(formerly taught by Dr. Tilanka Chandrasekara)
Last offered Spring 2023

This workshop is intended for participants who are interested in using supercomputers, including OSU’s Pete. The process of obtaining an account on OSU’s Pete will be described as part of the workshop. Basic Linux concepts used for running programs and for writing scripts and programs will be introduced. Taught in a hands-on format, this workshop will introduce how to use the command line, either on a local Linux workstation or through remote access with SSH. Scripting and programming languages will be briefly introduced, and compiling and running programs will be discussed. How to download and compile software from scratch will also be covered briefly. For users of Pete, the available software and the use of a queuing system such as SLURM will also be discussed. 

Dr. Pratul Agarwal Last offered Spring 2023

Just in time for the spring conference season, learn how to create research posters that communicate your findings without a wall of text. Identify your audience, use visuals effectively and build your storytelling skills to make your poster stand out from the crowd, then find out how to use the library's new poster printing service. 

Kevin Dyke Last offered Spring 2020

This session will demonstrate how to easily create videos to showcase your research and communicate with specific audiences, using simple off-the-shelf software. We will also discuss tips such as writing a script, preparing your physical space, and editing for brevity and clarity. 

Simon Ringsmuth Last offered Spring 2023