Next, we'll see if we can use deep learning to tackle a problem with real-world implications. While I jest about the awesome power of neural networks for classifying flowers, there are some cool ways to show that the long arm of deep learning extends beyond the casual gardener's domain.
Specifically, I'd like to show that deep learning can be used in a healthcare setting. There are plenty of ways that I might be able to show this, but perhaps the message would be most effective if I could tackle something relevant. So, we'll be tackling a prediction task related to heart disease, which, according to this image, is rather important:
Heart disease affects most people whether it's directly or through the affliction of a loved one. It is the cause of death for 25% of Americans and, if your household is as pork-chop-laden as mine, probably has affected you as well.
So, what then can we do to show that deep learning is relevant to heart disease? Well, it turns out that there is a set of patient data publicly available that will allow us to build a classifier: (http://archive.ics.uci.edu/ml/datasets/heart+Disease).
According to this website, the dataset contains 75 attributes (though published experiments use only 14 of them) and 303 samples. This isn't an extremely large dataset, but we'll see if we can come up with a neural network to use this data. Note that extremely complex classification tasks, such as image recognition, require far more data than this dataset provides.
So let's go ahead and see what kind of attributes we're working with.
The website states that the following features are given for each case:
1. age: age of the patient in years
2. sex: 1 if male; 0 if female
3. cp: chest pain type (1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic)
4. trestbps: resting blood pressure in mmHg
5. chol: serum cholesterol in mg/dl
6. fbs: whether fasting blood sugar > 120 mg/dl (1: true, 0: false)
7. restecg: resting electrocardiographic results (0: normal, 1: ST-T wave abnormality, 2: left ventricular hypertrophy)
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1: yes, 0: no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment (1: upsloping, 2: flat, 3: downsloping)
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: (3: normal, 6: fixed defect, 7: reversible defect)
14. num (predicted attribute): diagnosis of heart disease (0: no, 1: yes)
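To make the list above concrete, here is a minimal Python sketch of how one record might be represented. The lowercased column names come from the list; the comma-separated layout, the `?` marker for missing values, the helper name `parse_rows`, and the two sample records are all assumptions for illustration, and the records are made up rather than taken from the dataset.

```python
import csv
import io

# Column names follow the 14 attributes listed above, lowercased.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def parse_rows(text):
    """Parse comma-separated records into dicts keyed by column name.

    Assumption: a missing value is written as '?' and is mapped to None.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        rows.append({
            name: (None if value.strip() == "?" else float(value))
            for name, value in zip(COLUMNS, fields)
        })
    return rows


# Made-up records for illustration only; not real patient data.
SAMPLE = """\
54,1,4,130,250,0,0,150,0,1.2,2,0,3,0
61,0,3,140,?,0,1,120,1,2.0,2,1,7,1
"""

rows = parse_rows(SAMPLE)
print(rows[0]["age"], rows[1]["chol"])
```

Keeping each record as a dict keyed by attribute name makes it easy to check a value against the list above before any real pre-processing happens.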
So it looks like we're given some demographic information for each patient (i.e. the age and sex of the patient). We're also given some clinical information about a patient, such as whether there's angina (chest pain), what the resting blood pressure is, cholesterol levels, etc. All these pieces of information are relevant to how likely a patient is to have heart disease. In addition to these measurements, we also have a boolean value that describes whether the patient was diagnosed with heart disease. That's good.
Given these measurements, we can conceive of a function that we might approximate with a neural network. Because each record pairs the measurements with a diagnosis, we can use the measurements to build a classifier that predicts the presence of heart disease. For this network, the 13 measurements and demographic values would be the input, and the presence of heart disease would be the output.
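As a rough illustration of that input/output framing, the following Python sketch splits one record into 13 inputs and one label, then pushes the inputs through a tiny untrained network. Everything here is an assumption for illustration: the record is made up, the hidden layer size, the weight range, and the names `split_record`, `init_network`, and `predict` are hypothetical, and no training or input scaling is done, so the output is not a meaningful prediction.

```python
import math
import random

# The 13 input attributes; "num" is the label to predict.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def split_record(record):
    """Return (inputs, label) for one patient record."""
    x = [float(record[name]) for name in FEATURES]
    y = int(record["num"])
    return x, y


def init_network(n_inputs, n_hidden, seed=0):
    """Create small random weights for one hidden layer and one output unit."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)],
        "b_out": 0.0,
    }


def predict(net, x):
    """Forward pass: tanh hidden layer, sigmoid output in (0, 1)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    z = sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]
    return 1.0 / (1.0 + math.exp(-z))


# Made-up record for illustration only; not real patient data.
record = {"age": 61, "sex": 0, "cp": 3, "trestbps": 140, "chol": 260,
          "fbs": 0, "restecg": 1, "thalach": 120, "exang": 1,
          "oldpeak": 2.0, "slope": 2, "ca": 1, "thal": 7, "num": 1}

x, y = split_record(record)
net = init_network(n_inputs=len(FEATURES), n_hidden=4)
p = predict(net, x)
print(f"label = {y}, untrained output = {p:.3f}")
```

The single sigmoid output matches the 0/1 diagnosis: once trained, a value near 1 would be read as "heart disease present" and a value near 0 as "absent".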
That's a good problem definition. Let's continue this problem by pre-processing the data in R in the next post.