A Guide to the Web for Statisticians
Data Sets
Ready for Teaching
- Data and Story Library.
DASL (pronounced
"dazzle") is an online library of datafiles and stories that
illustrate the use
of basic statistical methods. Stories are classified according to
statistical methods and
major topics of interest. Well organized. Perhaps the best single
source of data sets for
teaching. DASL Project, Cornell University.
- Datasets for
Learning
Biostatistical Modeling Techniques. Headed by the survival on the
Titanic data set.
Frank Harrell, University of Virginia.
- JSE
Data Archive.
Journal of Statistics Education archive of data sets for teaching.
- Electronic
Dataset Service. Data
sets classified by statistical methodology. Trina Hosmer, University
of Massachusetts.
- Statistics UCLA. Jan de
Leeuw, University of
California, Los Angeles.
- Case Studies.
Several dozen case studies
with questions and hints.
- Data Sets. Select
data by subject area,
data from textbooks and consulting projects.
- Time Series
Data
Library. Over 500 data sets. Robert Hyndman, Monash University.
- Xtreme
Database.
Currently contains 18 datasets. Michael Thomas,
Universitaet-Gesamthochschule, Siegen.
- WWS509
Generalized
linear models. Data sets for a graduate course at Princeton
University. Germán
Rodríguez, Princeton University.
TextBooks
Larger or Less Processed
- Data: a
Collection of Problems from
many Fields for the Student and Research Worker by D.F. Andrews
and A.M. Herzberg.
Data from the book. Some data sets are classics. Many others do not
yield to standard
analyses. Statlib. Also available by ftp from the University of Toronto
or UCLA, including the whole collection as a compressed tar file.
- Dr B's
Wide World
of Web Data. Links to hundreds of data sets, organized by subject
matter. John
Behrens, Arizona State University.
- Graphics
Data Expositions. Data for the bi-annual data expositions of the
Statistical Graphics
Section of the ASA.
- Journal of Applied
Econometrics Data Archive.
Data from JAE articles accepted after January 1994. Queen's
University.
- Multilevel
Data Sets.
For teaching and training in multilevel model methods. University of
Montreal.
- Peter J Diggle Data
Sets.
Geostatistical and spatial point pattern data sets. Peter Diggle,
University of Lancaster.
- SPSS Data Sets.
Data sets for SPSS
and SYSTAT, and a selection of other public data sets. SPSS Inc.
- Statistical Reference
Datasets. The
purpose of this project is to improve the accuracy of statistical
software by providing
reference datasets with certified computational results that enable
the objective
evaluation of statistical software. NIST.
- Statlib
- Datasets. Main
StatLib data archive.
- Breakfast
Cereal Data. From
the 1993 Graphics Expo.
- Case Studies in
Biometry. Data
diskette for the book by Nicholas Lange, Louise Ryan, Lynne
Billard, David Brillinger,
Loveday Conquest, Joel Greenhouse. Wiley, 1994.
- Data
Expositions. Data sets used for
the annual ASA Statistical Graphics and Computing Data
Expositions.
- Disease Data.
From the 1991 Statistics in
Public Health Surveillance Exposition.
- JASA Data.
Contributed datasets from
articles published in the Journal of the American Statistical
Association.
- King Crab Data. A
large but patchy data set.
Although the topic is in principle an interesting one, my students
have had trouble
assembling any useful data set from the various files associated
with this project. 1990
Data Expo.
- Oscillator
Time Series. From
the 1993 Graphics Expo.
- UCI
Machine Learning
Repository. Over 100 datasets from large to small. Christopher
Merz, University of
California, Irvine.
- University of
Wisconsin Data Archive.
Data sets from master's exams and several books, including Box, Hunter
& Hunter;
Devore; Milliken & Johnson's Analysis of Messy Data;
Yandell's Practical
Data Analysis for Designed Experiments. Douglas Bates,
University of Wisconsin.
- Workshop on
Smoothing Applications,
UBC June 1999: Data Sets. A collection of substantial, interesting
and well documented
data sets to be used by speakers at the workshop in 1999. Nancy
Heckman, University of
British Columbia.
Sources of Raw Data
- Britannica Online. Search the
Encyclopedia Britannica.
Seven-day free trial, then US$12.50 per month. Includes, for example,
results for every event in
every modern Olympic Games.
- Council of European
Social Science Data
Archives. Provides a clickable map of social science data archives
all over the world,
and an integrated data catalogue for social science data archives.
- Documents
Center.
An excellent index to government statistical data on the Web, both
United States and
international, maintained by the Documents Center of the University of
Michigan.
- Data Zoo.
California coastal data
collection programs. Organized by experiment, instrument type and
geographical region.
Center for Coastal Studies, University of California, San Diego.
- Economic and Social Research
Council Data Archive.
Largest collection of accessible computer-readable data in the social
sciences and
humanities in the UK. University of Essex.
- LDEO Climate Data
Catalog. Earth science
data, primarily oceanographic and atmospheric datasets. Columbia
University.
- NZ Social Science
Research Data and
Information Services Centre. Contains 33 social science datasets.
Access to many of them
requires signed approval.
- Physical
Geography Resources. A
mixture of images, datasets, and data libraries of potential use to
physical geographers.
James E Burt, University of Wisconsin-Madison.
- Project Gutenberg. Full text
online for a huge number
of books, including such things as the World Factbook. Major public
domain books or
classics for which copyright has expired are likely to be here.
- Track and Field
Statistics. Current
records, full results of finals from the Olympic Games, and much more.
Mika Perkiömäki,
University of Tampere.
- US Census Reference. The
complete US Census on
one CD. GeoLytics Inc.
- VIMS Pier Ambient
Monitoring Data.
Local conditions on the York River at Gloucester Point, VA. You can
download water
parameters and meteorological variables measured at 6-minute
intervals for the past 10
days, or view graphs of the same variables for the current and past
years. Virginia
Institute of Marine Science, College of William & Mary, Gloucester
Point, VA.
Other Lists of Links
Author: Calvin L. Williams, Mathematical Sciences, Clemson University
Last updated: August 18, 1999
Send comments to: calvinw@math.clemson.edu