Finding and Evaluating Data: Finding Datasets

Places to Find Data Sets

Steps:

  1. Define the kind of data you need
    1. Consider: format, content, dataset size, available software/tools
    2. Note: not all data exists or is available
  2. Do background research
    1. Look at what has already been done on the topic
    2. Look at datasets used previously
    3. Consider methodology used previously
  3. Identify possible sources
    1. Data must be created: recorded, organized, stored. This takes time, money and effort.
    2. Who has the ability, motivation and resources to create the kind of data you are looking for?
      1. A person? Individual researcher?
      2. A research organization? (university, national lab, institute?)
      3. Government bodies?
      4. International organizations?
      5. Companies?
  4. Search for your data set.
  5. Evaluate your data set.

What are APIs?

Application Programming Interfaces (APIs) allow two software components to communicate with each other and to transfer data between two systems (such as apps or computers).

Using APIs allows businesses to connect the applications they use, so that information can be shared quickly and easily. For example, APIs allow a company's website or app to use payment services (such as credit card processing).

API architecture can be understood in terms of "client" and "server." The application sending the request is called the client (in the payment example, the company's application asking for a payment to be processed), and the application sending the response is called the server (the payment service that handles the request and replies with the result).
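The client and server roles can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not a description of any real data service: the `RECORDS` data, the `DataHandler` class, and the `fetch_record` and `run_demo` functions are all hypothetical names made up for this example. The server runs locally on a temporary port, and the client sends it one request.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical data the server exposes; not taken from any real service.
RECORDS = {"population": {"year": 2020, "value": 1000}}


class DataHandler(BaseHTTPRequestHandler):
    """Server side: receives a request and sends back a JSON response."""

    def do_GET(self):
        name = self.path.strip("/")
        if name in RECORDS:
            status, payload = 200, RECORDS[name]
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet instead of logging each request


def fetch_record(base_url, name):
    """Client side: send a request and decode the JSON response."""
    # An empty ProxyHandler makes the request go straight to the local server.
    opener = build_opener(ProxyHandler({}))
    with opener.open(f"{base_url}/{name}", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def run_demo():
    """Start a local server, make one client request, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), DataHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return fetch_record(f"http://{host}:{port}", "population")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

When looking for data, you are usually only writing the client half: the data provider runs the server, and its documentation tells you which addresses to request and what the responses look like.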

"Metadata" is often described as "data about data." In the context of a data set, metadata provides information about the content of the data set, how it was created, and how it can be used. Metadata is essential for finding, understanding, and correctly reusing a data set.

Metadata standards are rules for how metadata is recorded and structured. They are developed by communities to guide how data sets are described, so that the needed information about any data set is easy to find and understand. This makes data easier to share and reuse.
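As a rough sketch of what checking a metadata record against a set of rules might look like, the example below tests whether a record fills in a list of required elements. The element names are borrowed from the Dublin Core element set purely for illustration; which elements count as required, the `missing_elements` helper, and the example record are all assumptions made for this sketch, not part of any particular standard or repository.

```python
# Element names borrowed from Dublin Core for illustration.
# Treating exactly these five as required is an assumption of this sketch.
REQUIRED_ELEMENTS = ("title", "creator", "date", "format", "rights")


def missing_elements(record, required=REQUIRED_ELEMENTS):
    """Return the required elements that are absent or blank in a record."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# A hypothetical metadata record for a made-up data set.
example_record = {
    "title": "Hypothetical River Temperature Readings",
    "creator": "Example Research Group",
    "date": "2021-06-30",
    "format": "text/csv",
    "description": "Daily water temperature at a single monitoring site.",
    # "rights" is left out on purpose to show what the check reports.
}

print(missing_elements(example_record))  # ['rights']
```

A record that fails a check like this is harder to evaluate: without a rights statement, for instance, a reader cannot tell whether the data set may be reused.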