Titanic Example

Using data science to prevent Titanic disasters

Want to know how we crunch the data and which questions we help you answer using the power of data science?

Cast your mind back to the Titanic. Imagine you have been tasked with investigating the disaster that befell the world’s most famous ship when it hit an iceberg just before midnight on 14 April 1912. Your main focus is to estimate survival probabilities and to learn from the disaster so that future fatalities can be prevented. (Think of it as an early counterpart to a modern air crash investigation.)

Titanic (source: Wikipedia)

As an expert in machine learning, you have requested two files from your manager. The first is the training file, which you will use to train your models. Ideally, the training file should contain between 66% and 80% of the available data. The more training data you have, the more accurate your model will generally be. Below is your training file, based on Wikipedia data and the passenger manifest.
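The 66% to 80% guideline can be sketched in a few lines of Python. This is a minimal illustration, not the 48hours.ai procedure: the function name `split_records`, the fixed seed, and the assumption that passenger ids run contiguously from 1 to 1309 (891 training plus 418 test records) are all introduced here for the example.

```python
import random


def split_records(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Assumption: ids 1..1309 stand in for the full set of passenger records.
passenger_ids = list(range(1, 1310))
train, test = split_records(passenger_ids, train_fraction=0.8)
print(len(train), len(test))
```

Shuffling before cutting keeps the two files comparable; cutting an ordered file in two could put, say, all first-class passengers in one half.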

Training File

Id  Survived  Pclass  Name          Sex     Age  SibSp  Parch  Ticket            Fare     Cabin  Embarked
1   0         3       Braund...     male    22   1      0      A/5 21171         7.25
2   1         1       Cumings...    female  38   1      0      PC 17599          71.2833  C85    C
3   1         3       Heikkinen...  female  26   0      0      STON/O2. 3101282  7.925
4   1         1       Futrelle...   female  35   1      0      113803            53.1     C123   S
... up to 891 records.
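A first step in investigating survival probabilities is to compute the survival rate per group. The sketch below does this for the four sample rows shown above, keeping only the Survived, Pclass and Sex columns. The helper name `survival_rate_by` is an assumption made for this example, and four rows are far too few to draw conclusions from; the same function would be applied to all 891 records.

```python
from collections import defaultdict

# The four sample rows from the training file, reduced to three columns.
sample_rows = [
    {"Id": 1, "Survived": 0, "Pclass": 3, "Sex": "male"},
    {"Id": 2, "Survived": 1, "Pclass": 1, "Sex": "female"},
    {"Id": 3, "Survived": 1, "Pclass": 3, "Sex": "female"},
    {"Id": 4, "Survived": 1, "Pclass": 1, "Sex": "female"},
]


def survival_rate_by(rows, key):
    """Return {group value: share of passengers in that group who survived}."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        survivors[row[key]] += row["Survived"]
    return {group: survivors[group] / totals[group] for group in totals}


print(survival_rate_by(sample_rows, "Sex"))     # {'male': 0.0, 'female': 1.0}
print(survival_rate_by(sample_rows, "Pclass"))  # {3: 0.5, 1: 1.0}
```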

Test File

You also receive a test file, which is used to measure the accuracy of your model. Think of it as a year-end exam: the model's score on the test file is like the final mark you earn in a university module after a year of study.

(Survived is the column to be predicted.)

Id   Survived  Pclass  Name       Sex     Age   SibSp  Parch  Ticket  Fare    Cabin  Embarked
892  0         3       Kelly...   male    34.5  0      0      330911  7.8292
893  1         3       Wilkes...  female  47    1      0      363272  7              S
894  0         2       Myles...   male    62    0      0      240276  9.6875         Q
895  0         3       Wirz...    male    27    0      0      315154  8.6625         S
... up to 418 records.
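Accuracy on the test file is simply the share of passengers whose outcome the model predicts correctly. The sketch below scores a deliberately simple rule ("women survive, men do not") against the four sample rows. Two things here are assumptions made for illustration: the rule itself, which is not the model 48hours.ai would deliver, and the use of the Survived values listed in the sample rows as reference labels.

```python
def accuracy(predicted, actual):
    """Share of positions where the prediction matches the reference label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# The four sample rows from the test file, reduced to the columns used here.
test_rows = [
    {"Id": 892, "Survived": 0, "Sex": "male"},
    {"Id": 893, "Survived": 1, "Sex": "female"},
    {"Id": 894, "Survived": 0, "Sex": "male"},
    {"Id": 895, "Survived": 0, "Sex": "male"},
]


def predict_by_sex(row):
    """Hypothetical baseline rule: predict survival for women only."""
    return 1 if row["Sex"] == "female" else 0


predictions = [predict_by_sex(row) for row in test_rows]
reference = [row["Survived"] for row in test_rows]
print(accuracy(predictions, reference))  # 1.0 on these four rows
```

A perfect score on four rows says very little; the figure only becomes meaningful over the full test file.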

You register and log in on the 48hours.ai website. From the Dashboard, you use the Dashboard->Add Job link to upload the Training File and the Test File.

The 48hours.ai engineers run different models on your data to determine which one fits it best.
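Comparing models boils down to scoring each candidate on labelled rows and keeping the one with the best score. The sketch below shows that loop with three toy rules applied to the four sample training rows. The rules and the names `candidate_models`, `score` and `best_model` are assumptions made for this example, not the models the engineers actually run. In practice, candidates would be scored on held-out rows rather than on the rows used to build them, since scoring on the same rows gives an optimistic picture.

```python
# The four sample rows from the training file, reduced to three columns.
training_rows = [
    {"Survived": 0, "Pclass": 3, "Sex": "male"},
    {"Survived": 1, "Pclass": 1, "Sex": "female"},
    {"Survived": 1, "Pclass": 3, "Sex": "female"},
    {"Survived": 1, "Pclass": 1, "Sex": "female"},
]

# Hypothetical candidates: each maps a row to a predicted Survived value.
candidate_models = {
    "nobody survives": lambda row: 0,
    "women survive": lambda row: 1 if row["Sex"] == "female" else 0,
    "first class survives": lambda row: 1 if row["Pclass"] == 1 else 0,
}


def score(model, rows):
    """Accuracy of one model: share of rows it predicts correctly."""
    hits = sum(model(row) == row["Survived"] for row in rows)
    return hits / len(rows)


def best_model(models, rows):
    """Score every candidate and return (name of the best one, all scores)."""
    scores = {name: score(model, rows) for name, model in models.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


name, scores = best_model(candidate_models, training_rows)
print(name)    # women survive
print(scores)  # nobody: 0.25, women: 1.0, first class: 0.75
```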

Within 48 hours our data engineers create the following report:
Titanic Report

Your manager is very happy with the report and its insights. Based on the report, the following changes are proposed for future voyages:

  • Passengers will not be able to travel without their families on intercontinental routes.
  • Ships will carry enough lifeboats to accommodate every passenger on board.