The Santa Fe Institute Time Series Prediction and Analysis Competition

=> Description of the Competition Data <=

This file contains descriptions of the data sets that were used in the competition. For each data set we give (1) the original description from the competition instruction file, (2) the full information about the data set, (3) an explanation of why it was chosen, and (4) a description of the format of the continuation file. Two of the data sets, B and E, were fully described in the instruction file and are not used for a prediction task and so items (2) and (4) will be skipped for them.

The continuation files for sets A, C, and D have been placed in the competition_data directory with a suffix of .cont appended (such as A.cont). If you do not have ftp access, send email to tserver@sfi.santafe.edu with a subject of "send time series continuations".

We have tried, with 5 data sets, to provide data that cover as wide a range of realistic time series problems as possible. In order to do this, each data set was selected to have a number of features of interest. Although this has necessarily entailed some compromises, we believe that these data represent the core of the time series analysis problems that arise in many disciplines.

Set A
--- -

(...)

Set B
--- -

=> Description from instruction file:

b) B1.dat and B2.dat (34,000 points)
-- -----------------

This is a multivariate data set recorded from a patient in the sleep laboratory of the Beth Israel Hospital in Boston, Massachusetts (data submitted by David Rigney and Ary Goldberger). The file has been split into two sequential parts, B1.dat and B2.dat; the lines in the files are spaced by 0.5 seconds. The first column is the heart rate, the second is the chest volume (respiration force), and the third is the blood oxygen concentration (measured by ear oximetry).
The heart rate was determined by measuring the time between the QRS complexes in the electrocardiogram, taking the inverse, and then converting this to an evenly sampled record by interpolation. There were no premature beats - sudden changes in the heart rate are not artifacts. The respiration and blood oxygen data are given in uncalibrated A/D bits; these two sensors slowly drift with time (and are therefore occasionally rescaled by a technician) and can be detached by the motion of the patient, hence their calibration is not constant over the data set. They were converted from 250 Hz to 2 Hz data by averaging over a 0.08 second window at the times of the heart rate samples. Between roughly 4 hours 30 minutes and 4 hours 34 minutes from the start of the file the sensors were disconnected.

The following table gives the times and stages of sleep, as determined by a neurologist looking at the EEG (W = awake, 1 and 2 = waking/sleep stages, R = REM sleep):

2:00: W, 2:30: 1, 3:30: W, 9:30: 1, 10:00: W, 11:00: 1, 12:00: W, 15:30: 1,
16:00: 2, 36:30: 1, 38:30: W, 39:30: 1, 42:30: 2, 44:00: 1, 44:30: 2, 45:00: W,
46:00: 1, 47:00: W, 47:30: 2, 48:30: 1, 50:00: 2, 50:30: 1, 51:00: 2, 51:30: 1,
52:00: 2, 52:30: W, 53:00: 1, 53:30: W, 55:00: 1, 56:00: 2,
1:21:30: W, 1:22:30: 1, 1:25:00: W, 1:30:00: 1, 1:30:30: W, 1:31:00: 1, 1:31:30: W,
1:34:00: 1, 1:35:00: W, 1:38:30: 1, 1:39:00: W, 1:40:00: 1, 1:40:30: W, 1:42:00: 1,
1:42:30: 2, 1:44:00: 1, 1:50:30: 2,
2:04:30: R, 2:21:00: W, 2:22:00: 1, 2:22:30: W, 2:25:00: 1, 2:43:30: W, 2:47:30: 1,
2:48:30: W, 2:50:00: 1, 2:57:30: W, 2:58:30: 1, 2:59:00: W,
3:00:00: 1, 3:00:30: W, 3:01:00: 1, 3:05:00: W, 3:17:30: 1, 3:18:00: 2, 3:21:00: W,
3:21:30: 1, 3:22:00: W, 3:43:00: 1,
4:11:00: W, 4:11:30: 1, 4:12:00: W, 4:25:00: 1, 4:27:00: W, 4:27:30: 1, 4:28:00: W,
4:43:30: 1, 4:44:00: 2, 4:44:30: 1, 4:45:00: 2, 4:47:00: 1, 4:47:30: 2, 4:48:30: 1,
4:49:00: 2, 4:49:30: 1, 4:50:00: 2, 4:52:00: 1, 4:52:30: 2, 4:54:00: 1, 4:54:30: 2,
4:57:30: 1,
4:58:00: 2

This patient shows sleep apnea (periods during which he takes a few quick breaths and then stops breathing for up to 45 seconds). Sleep apnea is medically important because it leads to sleep deprivation and occasionally death.

There are three primary research questions associated with this data set:

1) Can part of the temporal variation in the heart rate be explained by a low-dimensional mechanism, or is it due to noise or external inputs?

2) How do the evolution of the heart rate, the respiration rate, and the blood oxygen concentration affect each other? (A correlation between breathing and the heart rate, called sinus arrhythmia, is almost always observed.)

3) Can the episodes of sleep apnea (stoppage of breathing) be predicted from the preceding data?

=> Reasons for choice:

a) Heart rate variability

There is growing (but still controversial) evidence that the observed variations in the heart rate might be related to a low-dimensional governing mechanism; understanding this mechanism is obviously very important in order to understand its failures (i.e., heart attacks).

b) Multi-dimensional data sets

These data provide simultaneous measurements of a number of potentially interacting variables; it is an open question how best to use the extra information to learn about how the variables interact. Most importantly, there is interest in verifying and understanding the coupling between respiration and the heart rate.

c) Non-stationary data

These data were recorded with as much care as is possible, but the experimental system (the sleeping patient) is obviously non-stationary. A successful analysis of these data must attempt to distinguish the presumed internal dynamics from changes in the patient's state.

(...)
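
For Set B, the sleep-stage table and the 0.5 second line spacing described above can be combined to attach a stage label to each sample, which may help when separating internal dynamics from changes in the patient's state. The following is only a minimal sketch in Python, not part of the competition materials. It assumes that the first line of B1.dat corresponds to time zero, that table times read as MM:SS or H:MM:SS, and that each table entry marks the time at which the listed stage begins; none of these is stated explicitly in the description. All function and constant names are hypothetical.

```python
import re

# Spacing between lines in B1.dat/B2.dat, taken from the description.
SAMPLE_PERIOD_S = 0.5

# Approximate interval during which the sensors were disconnected
# ("roughly 4 hours 30 minutes" to "4 hours 34 minutes" in the description).
DISCONNECT_START_S = 4 * 3600 + 30 * 60
DISCONNECT_END_S = 4 * 3600 + 34 * 60

# Matches entries such as "2:00: W", "1:21:30: 1", or the unspaced "15:30:1".
_ENTRY = re.compile(r"((?:\d+:)?\d+:\d\d):\s*([W12R])")


def clock_to_seconds(stamp):
    """Convert 'MM:SS' or 'H:MM:SS' (assumed formats) to seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_stage_table(text):
    """Return a list of (seconds, stage) pairs in the order they appear."""
    return [(clock_to_seconds(t), s) for t, s in _ENTRY.findall(text)]


def sample_time(index):
    """Time in seconds of a 0-based line index, assuming line 0 is t = 0."""
    return index * SAMPLE_PERIOD_S


def stage_at(events, t_seconds):
    """Stage in effect at t_seconds, assuming entries mark stage onsets.

    Returns None for times before the first table entry.
    """
    current = None
    for start, stage in events:
        if start > t_seconds:
            break
        current = stage
    return current


def is_disconnected(t_seconds):
    """True inside the approximate sensor-disconnect interval."""
    return DISCONNECT_START_S <= t_seconds <= DISCONNECT_END_S


if __name__ == "__main__":
    # The first few entries of the table above, copied verbatim.
    excerpt = "2:00: W, 2:30: 1, 3:30: W, 9:30: 1, 10:00: W"
    events = parse_stage_table(excerpt)
    for index in (0, 250, 300, 1200):
        t = sample_time(index)
        print(index, t, stage_at(events, t), is_disconnected(t))
```

Samples for which is_disconnected returns True would presumably be excluded from any analysis of the respiration and blood oxygen columns. Because the interval is only approximate, a margin on either side may be prudent.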