Here is a link to the slides for the presentation about finding and evaluating data sets at the January 2023 Graduate Student Data Bootcamp
What is data?
Data consists of discrete values or units of information that can take many forms, including numbers, words, characters, images, sound recordings, and videos. Data is anything that can be collected, stored, organized, and analyzed.
A data set is information that someone has collected, assembled, and organized for the analysis of an issue, phenomenon, or subject. It may contain many kinds of information, such as text, numbers, images, sound, video, code, and geospatial data, in a variety of formats (CSV, XML, TIFF, PDF, etc.).
From Axiom Data Science:
Containers: TAR, GZIP, ZIP
Databases: CSV, XML
Tabular data: CSV
Geospatial vector data: SHP, GeoJSON, KML, DBF, NetCDF
Geospatial raster data: GeoTIFF/TIFF, NetCDF, HDF-EOS
Moving images: MOV, MPEG, AVI, MXF
Sounds: WAVE, AIFF, MP3, MXF
Statistics: ASCII, DTA, POR, SAS, SAV
Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
Text: XML, PDF/A, HTML, ASCII, UTF-8
Web archive: WARC
When you create a data set, make sure that it is "good" data: accurate, complete, organized, and, ultimately, reusable.
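Some of these qualities can be checked automatically for tabular data. The sketch below is a hypothetical helper, not a tool from this guide. It assumes a CSV file whose first row is a header and uses Python's standard `csv` module to flag two common problems: rows with the wrong number of fields (poor organization) and empty cells (incompleteness). It cannot judge accuracy, which requires comparing values against the original source.

```python
import csv
import io


def check_csv(text):
    """Return a list of basic completeness/organization problems in CSV text.

    Hypothetical helper for illustration: it only checks that each row has
    the same number of fields as the header and that no field is blank.
    Row numbers count the header as row 1.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]

    problems = []
    for row_no, row in enumerate(reader, start=2):
        # A row with too many or too few fields suggests a structural error.
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} fields, found {len(row)}"
            )
            continue
        # A blank field suggests a missing observation.
        for name, value in zip(header, row):
            if value.strip() == "":
                problems.append(f"row {row_no}: missing value in column '{name}'")
    return problems


# Made-up sample data for illustration only.
sample = "site,date,temp_c\nA,2023-01-05,4.2\nB,2023-01-05,\nC,2023-01-06\n"
for problem in check_csv(sample):
    print(problem)
# row 3: missing value in column 'temp_c'
# row 4: expected 3 fields, found 2
```

For a real file, read its contents with `open(path, newline="", encoding="utf-8").read()` and pass the text to `check_csv`. Checks like these complement, rather than replace, documenting how the data was collected.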