
Finding and Evaluating Data: Data Basics

Data Basics

  1. Clearly define the question or issue you are investigating.
    1. What is your topic and why is it important?
    2. How can you think about your topic empirically? What do you want to try to show or learn?
    3. What kind of data do you want to find? Format? Content? Amount?
    4. Note: not all data exists or is available. If you can't find relevant data, talk to your subject librarian or advisor for assistance.
  2. Conduct a background information search.
    1. This will show you what research has already been done.
    2. You can see what sorts of terms are used by experts in the field, which will help with your search for data.
    3. You can look at methodologies used by others in their analysis.
    4. You may find a relevant data set in an article.
  3. Identify possible sources of data.
    1. Data must be created: recorded, organized, stored. This takes time, money and effort.
    2. Who has the ability, motivation and resources to create the kind of data you are looking for?
      1. A person? Individual researcher?
      2. A research organization? (university, national lab, institute?)
      3. Government bodies?
      4. International organizations?
      5. Companies?
  4. Search for your data set.
  5. Evaluate your data set.

---------------------------------------------------------------------------------------------

Slides are available from the presentation on finding and evaluating data sets given at the January 2023 Graduate Student Data Bootcamp.

What is data?

Data consists of discrete values or units of information that can take many forms, including numbers, words, characters, images, sound recordings, and videos. Data is anything that can be collected, stored, organized, and analyzed.

A data set is information that someone has collected, assembled, and organized for analysis of an issue, phenomenon, or subject. It may contain many kinds of information (text, numbers, images, sound, video, code, geospatial data) in a variety of formats (CSV, XML, TIFF, PDF, etc.).

Common file formats by type of data, from Axiom Data Science:

Containers: TAR, GZIP, ZIP

Databases: CSV, XML

Tabular data: CSV

Geospatial vector data: SHP, GeoJSON, KML, DBF, NetCDF

Geospatial raster data: GeoTIFF/TIFF, NetCDF, HDF-EOS

Moving images: MOV, MPEG, AVI, MXF

Sounds: WAVE, AIFF, MP3, MXF

Statistics: ASCII, DTA, POR, SAS, SAV

Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP

Text: XML, PDF/A, HTML, ASCII, UTF-8

Web archive: WARC

When you create a data set, make sure that it is "good" data: accurate, complete, organized, and, ultimately, reusable.
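As a rough illustration of checking a tabular data set for completeness and organization, here is a minimal Python sketch using only the standard library. The function name `summarize_quality`, the column names, and the sample rows are all hypothetical and exist only for illustration. A check like this can flag blank values and repeated rows, but it cannot tell you whether the values themselves are accurate.

```python
import csv
import io


def summarize_quality(csv_text, required_columns):
    """Return simple completeness and duplication counts for CSV text.

    Hypothetical helper for illustration; not part of any library.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    # Required columns that do not appear in the header row at all.
    missing_columns = [c for c in required_columns if c not in header]

    rows = list(reader)
    incomplete = 0
    duplicates = 0
    seen = set()
    for row in rows:
        # A row is incomplete if any required column is blank or absent.
        if any((row.get(c) or "").strip() == "" for c in required_columns):
            incomplete += 1
        # A row is a duplicate if an identical row appeared earlier.
        key = tuple((row.get(c) or "").strip() for c in header)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "rows": len(rows),
        "missing_columns": missing_columns,
        "incomplete_rows": incomplete,
        "duplicate_rows": duplicates,
    }


# Made-up sample data: one blank temperature and one repeated row.
SAMPLE = """site,date,temperature_c
A,2023-01-05,4.2
B,2023-01-05,
A,2023-01-05,4.2
"""

report = summarize_quality(SAMPLE, ["site", "date", "temperature_c"])
print(report)
```

For a real file, you would read the text from disk instead of using the inline sample, and choose the required columns to match your own data set.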
