Titanic Example
Using data science to prevent Titanic disasters
Want to know how we crunch the data and which questions we help you answer using the power of data science?
Cast your mind back to the Titanic. Imagine you have been tasked with investigating the disaster that befell the world’s most famous ship when it hit an iceberg just before midnight on 14 April 1912. Your main focus is investigating survival probabilities and learning from this disaster so that future fatalities can be prevented. (Think of it as the historical equivalent of a modern air crash investigation.)
As a machine learning expert, you have requested two files from your manager. The first is the training file, which you will use to train your models. Ideally, the training file should contain between 66% and 80% of the available data; in general, the more training data you have, the more accurate your model is likely to be. Below is your training file, based on Wikipedia data and the passenger manifest.
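The split between training and test data can be sketched in a few lines of plain Python. This is a minimal sketch, assuming the records are held in a list; the function name `split_records` and the fixed seed are hypothetical choices for illustration, not part of any 48hours.ai tooling. With 891 training records out of 891 + 418 = 1309 passengers, the training share works out at about 68%, inside the 66% to 80% range.

```python
import random

def split_records(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 891 training + 418 test passengers = 1309 passenger IDs in total.
train, test = split_records(range(1, 1310), train_fraction=891 / 1309)
print(len(train), len(test))  # 891 418
print(f"{len(train) / 1309:.0%} of the data is used for training")  # 68% of the data is used for training
```

Shuffling before cutting matters: if the records were sorted (for example by ticket class), taking the first 68% without shuffling would give a training set that does not represent the whole ship.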
Training File
Id | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 3 | Braund... | male | 22 | 1 | 0 | A/5 21171 | 7.25 | | S |
2 | 1 | 1 | Cumings... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
3 | 1 | 3 | Heikkinen... | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
4 | 1 | 1 | Futrelle... | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
... up to 891 records.
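Investigating survival probabilities starts with simple group rates: what fraction of each group survived? The sketch below computes that for the four sample rows shown in the training table. It is an illustration only, assuming the data arrives as CSV with the column names above; the helper `survival_rate_by` is hypothetical, and four rows are far too few to draw real conclusions from.

```python
import csv
import io
from collections import defaultdict

# The four sample rows from the training table (names truncated as in the source).
SAMPLE = """Id,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,Braund...,male,22,1,0,A/5 21171,7.25,,S
2,1,1,Cumings...,female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,Heikkinen...,female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,Futrelle...,female,35,1,0,113803,53.1,C123,S
"""

def survival_rate_by(rows, column):
    """Return {group value: fraction of passengers in that group who survived}."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        survivors[key] += int(row["Survived"])
    return {key: survivors[key] / totals[key] for key in totals}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(survival_rate_by(rows, "Sex"))     # {'male': 0.0, 'female': 1.0}
print(survival_rate_by(rows, "Pclass"))  # {'3': 0.5, '1': 1.0}
```

On the full 891-record file, the same function gives the per-group survival rates that any model has to beat.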
Test File
You also receive a test file, which is used to measure the accuracy of your model. Think of it as a year-end exam: your model's score on the test file is like the final mark you achieve in a university module you have studied for a whole academic year.
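Like an exam mark, the accuracy score is simply the fraction of passengers whose outcome the model predicts correctly. Here is a minimal sketch of that arithmetic; the function name `accuracy` is our own, and the labels in the example call are made up purely to show the calculation.

```python
def accuracy(predicted, actual):
    """Fraction of predictions (0 = died, 1 = survived) that match the true outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty set of predictions")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels: three of the four predictions are correct.
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```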
Id | Survived (to be predicted) | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|
892 | 0 | 3 | Kelly... | male | 34.5 | 0 | 0 | 330911 | 7.8292 | | Q |
893 | 1 | 3 | Wilkes... | female | 47 | 1 | 0 | 363272 | 7 | | S |
894 | 0 | 2 | Myles... | male | 62 | 0 | 0 | 240276 | 9.6875 | | Q |
895 | 0 | 3 | Wirz... | male | 27 | 0 | 0 | 315154 | 8.6625 | | S |
... up to 418 records.
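The Survived values in the test table (0, 1, 0, 0) are exactly what a very simple baseline would output: predict survival for female passengers and death for male passengers. The sketch below implements that baseline for the four sample rows. It is an illustrative assumption, not necessarily how the values in the table were produced; the helper `predict_by_sex` is hypothetical.

```python
# The four sample test rows, reduced to the columns this rule needs.
test_rows = [
    {"Id": 892, "Sex": "male"},
    {"Id": 893, "Sex": "female"},
    {"Id": 894, "Sex": "male"},
    {"Id": 895, "Sex": "male"},
]

def predict_by_sex(row):
    """Baseline rule: female passengers survived (1), male passengers did not (0)."""
    return 1 if row["Sex"] == "female" else 0

predictions = {row["Id"]: predict_by_sex(row) for row in test_rows}
print(predictions)  # {892: 0, 893: 1, 894: 0, 895: 0}
```

A baseline like this is a useful yardstick: a more elaborate model is only worth its complexity if it scores better than this one-line rule.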
You register and log in on the 48hours.ai website. On the Dashboard, you use the Dashboard->Add Job link to upload the Training File and the Test File to 48hours.ai.
The 48hours.ai engineers run several different models on your data to find the one that fits it best.
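Finding the best fit means scoring several candidate models on labelled data and keeping the one with the highest accuracy. The sketch below shows the idea with three hypothetical rule-based "models" scored on the four sample training rows. It is not the 48hours.ai process; a real comparison would use far more data and score each candidate on held-out records it was not built from, and the rows are reused here only to keep the example short.

```python
# Sample training rows as (Sex, Pclass, Survived), taken from the training table.
TRAIN = [("male", 3, 0), ("female", 1, 1), ("female", 3, 1), ("female", 1, 1)]

# Hypothetical candidate models, each a simple rule mapping (sex, pclass) to 0 or 1.
CANDIDATES = {
    "nobody survives": lambda sex, pclass: 0,
    "women survive": lambda sex, pclass: 1 if sex == "female" else 0,
    "first class survives": lambda sex, pclass: 1 if pclass == 1 else 0,
}

def score(model, rows):
    """Accuracy of a rule on the given (sex, pclass, survived) rows."""
    correct = sum(model(sex, pclass) == survived for sex, pclass, survived in rows)
    return correct / len(rows)

scores = {name: score(model, TRAIN) for name, model in CANDIDATES.items()}
best = max(scores, key=scores.get)
print(scores)  # {'nobody survives': 0.25, 'women survive': 1.0, 'first class survives': 0.75}
print(best)    # women survive
```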
Within 48 hours, our data engineers create the following report:
[DOWNLOAD REPORT PDF]
Your manager is very happy with the report and its insights. Based on the report, the following changes are proposed for future voyages:
- Passengers will not be able to travel without their families on intercontinental routes.
- In the future, ships will carry enough lifeboats to accommodate all passengers.