Overview

Dataset Statistics

Number of Variables 12
Number of Rows 891
Missing Cells 866
Missing Cells (%) 8.1%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 315.7 KB
Average Row Size in Memory 362.8 B
Variable Types
  • Numerical: 3
  • Categorical: 9
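
The missing-cell percentage can be reproduced from the other figures in this block. A minimal sketch, assuming the percentage is missing cells divided by rows times variables (the report does not state its formula):

```python
# Figures copied from the Dataset Statistics block.
n_rows = 891
n_variables = 12
missing_cells = 866

# Assumed formula: missing cells over total cells (rows x variables).
total_cells = n_rows * n_variables
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

The 866 missing cells are also the sum of the per-variable missing counts reported further down (177 for Age, 687 for Cabin, 2 for Embarked).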

Dataset Insights

PassengerId is uniformly distributed [Uniform]
Age has 177 (19.87%) missing values [Missing]
Cabin has 687 (77.1%) missing values [Missing]
Fare is skewed [Skewed]
Name has a high cardinality: 891 distinct values [High Cardinality]
Ticket has a high cardinality: 681 distinct values [High Cardinality]
Cabin has a high cardinality: 147 distinct values [High Cardinality]
Survived has constant length 1 [Constant Length]
Pclass has constant length 1 [Constant Length]
SibSp has constant length 1 [Constant Length]
Parch has constant length 1 [Constant Length]
Embarked has constant length 1 [Constant Length]
Name has all distinct values [Unique]

Variables


PassengerId

numerical

Approximate Distinct Count 891
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 14256 B
Mean 446
Minimum 1
Maximum 891
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • PassengerId is uniformly distributed

Quantile Statistics

Minimum 1
5-th Percentile 45.5
Q1 223.5
Median 446
Q3 668.5
95-th Percentile 846.5
Maximum 891
Range 890
IQR 445
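
The quantile figures are what linear interpolation gives for a column of consecutive integers. A minimal sketch, assuming PassengerId holds exactly the integers 1 to 891 (consistent with the minimum, maximum, and distinct count reported here, but not stated outright) and using the standard library instead of whatever the report generator uses internally:

```python
import statistics

# Assumption: PassengerId is the integers 1..891 with no gaps.
passenger_ids = list(range(1, 892))

# method="inclusive" treats the data as spanning min..max and
# interpolates linearly between neighbouring values.
q1, median, q3 = statistics.quantiles(passenger_ids, n=4, method="inclusive")
twentieths = statistics.quantiles(passenger_ids, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

iqr = q3 - q1
value_range = max(passenger_ids) - min(passenger_ids)
print(p5, q1, median, q3, p95, value_range, iqr)
```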

Descriptive Statistics

Mean 446
Standard Deviation 257.3538
Variance 66231
Sum 397386
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.577
  • PassengerId is not normally distributed (p-value 7.259388077973426e-05)
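
The descriptive statistics can be checked the same way. A minimal sketch, again assuming PassengerId is the integers 1 to 891; the variable names are illustrative:

```python
import math
import statistics

# Assumption: PassengerId is the integers 1..891 with no gaps.
passenger_ids = list(range(1, 892))

mean = statistics.mean(passenger_ids)
variance = statistics.variance(passenger_ids)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
total = sum(passenger_ids)
cv = std / mean  # coefficient of variation

print(mean, variance, round(std, 4), total, round(cv, 3))
```

Under this assumption the population variance (n denominator) would be about 66156.67, so the reported 66231 is consistent with the sample (n - 1) formula.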

Survived

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory Size 58806 B
  • The most frequent value (0) occurs about 1.61 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 1
3rd row 1
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 891
  • The top 2 categories (0, 1) account for over 50.0% of the values
  • Survived has words of constant length

Pclass

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory Size 58806 B
  • The most frequent value (3) occurs about 2.27 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 3
2nd row 1
3rd row 3
4th row 1
5th row 3

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 891
  • The top 2 categories (3, 1) account for over 50.0% of the values
  • Pclass has words of constant length

Name

categorical

Approximate Distinct Count 891
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 81941 B

Length

Mean 26.9652
Standard Deviation 9.2816
Median 25
Minimum 12
Maximum 82

Sample

1st row Braund, Mr. Owen H...
2nd row Cumings, Mrs. John...
3rd row Heikkinen, Miss. L...
4th row Futrelle, Mrs. Jac...
5th row Allen, Mr. William...

Letter

Count 19091
Lowercase Letter 15446
Space Separator 2735
Uppercase Letter 3645
Dash Punctuation 13
Decimal Number 0
  • The most frequent word (mr) occurs about 2.86 times as often as the second most frequent word (miss)

Sex

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory Size 62107 B
  • The most frequent value (male) occurs about 1.84 times as often as the second most frequent value (female)

Length

Mean 4.7048
Standard Deviation 0.956
Median 4
Minimum 4
Maximum 6

Sample

1st row male
2nd row female
3rd row female
4th row female
5th row male

Letter

Count 4192
Lowercase Letter 4192
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (male, female) account for over 50.0% of the values

Age

numerical

Approximate Distinct Count 88
Approximate Unique (%) 12.3%
Missing 177
Missing (%) 19.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 11424 B
Mean 29.6991
Minimum 0.42
Maximum 80
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Age is skewed right (γ1 = 0.3883)

Quantile Statistics

Minimum 0.42
5-th Percentile 4
Q1 20.125
Median 28
Q3 38
95-th Percentile 56
Maximum 80
Range 79.58
IQR 17.875

Descriptive Statistics

Mean 29.6991
Standard Deviation 14.5265
Variance 211.0191
Sum 21205.17
Skewness 0.3883
Kurtosis 0.1686
Coefficient of Variation 0.4891
  • Age has 11 outliers
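
The report does not say how it flags outliers. A minimal sketch, assuming the common box-plot rule in which a value is an outlier when it lies more than 1.5 times the IQR beyond a quartile; the function name `iqr_fences` and the factor `k` are illustrative, not taken from the report:

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences for quartiles q1 and q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the Age quantile statistics.
lower, upper = iqr_fences(20.125, 38)
print(lower, upper)
```

With these quartiles the lower fence is negative, so under this rule only ages above the upper fence could be flagged.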

SibSp

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.8%
Missing 0
Missing (%) 0.0%
Memory Size 58806 B
  • The most frequent value (0) occurs about 2.91 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 0
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 891
  • The top 2 categories (0, 1) account for over 50.0% of the values
  • SibSp has words of constant length

Parch

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.8%
Missing 0
Missing (%) 0.0%
Memory Size 58806 B
  • The most frequent value (0) occurs about 5.75 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 891
  • The top 2 categories (0, 1) account for over 50.0% of the values
  • Parch has words of constant length

Ticket

categorical

Approximate Distinct Count 681
Approximate Unique (%) 76.4%
Missing 0
Missing (%) 0.0%
Memory Size 63930 B

Length

Mean 6.7508
Standard Deviation 2.7455
Median 6
Minimum 3
Maximum 18

Sample

1st row A/5 21171
2nd row PC 17599
3rd row STON/O2. 3101282
4th row 113803
5th row 373450

Letter

Count 673
Lowercase Letter 21
Space Separator 239
Uppercase Letter 652
Dash Punctuation 0
Decimal Number 4808
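
The labels in the Letter blocks match the names of Unicode general categories, and in every block Count equals Lowercase Letter plus Uppercase Letter. A minimal sketch of such a tally, assuming the report classifies characters by Unicode category (the mapping `LABELS` and the helper `char_classes` are illustrative), applied only to the five Ticket sample rows shown above:

```python
import unicodedata
from collections import Counter

# Assumed mapping from Unicode general categories to the report's labels.
LABELS = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def char_classes(values):
    """Count characters per labelled category across all strings."""
    counts = Counter({label: 0 for label in LABELS.values()})
    for value in values:
        for ch in value:
            label = LABELS.get(unicodedata.category(ch))
            if label is not None:  # other categories, such as "/" and ".", are skipped
                counts[label] += 1
    return counts


sample = ["A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "373450"]
counts = char_classes(sample)
letter_count = counts["Uppercase Letter"] + counts["Lowercase Letter"]
print(letter_count, dict(counts))
```

Characters such as "/" and "." fall outside the five listed categories, which would explain why the category totals need not add up to the total number of characters in a column.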

Fare

numerical

Approximate Distinct Count 248
Approximate Unique (%) 27.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 14256 B
Mean 32.2042
Minimum 0
Maximum 512.3292
Zeros 15
Zeros (%) 1.7%
Negatives 0
Negatives (%) 0.0%
  • Fare is skewed right (γ1 = 4.7793)

Quantile Statistics

Minimum 0
5-th Percentile 7.225
Q1 7.9104
Median 14.4542
Q3 31
95-th Percentile 112.0791
Maximum 512.3292
Range 512.3292
IQR 23.0896

Descriptive Statistics

Mean 32.2042
Standard Deviation 49.6934
Variance 2469.4368
Sum 28693.9493
Skewness 4.7793
Kurtosis 33.2043
Coefficient of Variation 1.5431
  • Fare is not normally distributed (p-value 5.925743764895219e-18)
  • Fare has 116 outliers
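
The Fare figures can be cross-checked against each other without the raw data. A minimal sketch using only the numbers printed above; each identity holds only up to the rounding of the reported values, so the tolerance is an assumption:

```python
import math

# Figures copied from the Fare blocks.
n_rows = 891
mean, std = 32.2042, 49.6934
variance, total, cv = 2469.4368, 28693.9493, 1.5431

# Standard identities linking the reported statistics.
checks = {
    "variance = std^2": math.isclose(std ** 2, variance, rel_tol=1e-4),
    "sum = mean * n": math.isclose(mean * n_rows, total, rel_tol=1e-4),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=1e-4),
}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

A coefficient of variation above 1 means the standard deviation exceeds the mean, which fits the strong right skew reported for Fare.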

Cabin

categorical

Approximate Distinct Count 147
Approximate Unique (%) 72.1%
Missing 687
Missing (%) 77.1%
Memory Size 13992 B
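
For columns with missing values, the Approximate Unique (%) figures match the distinct count divided by the number of non-missing rows, not by all 891 rows. A minimal sketch of that reading, which is inferred from the numbers and not stated by the report; `unique_pct` is an illustrative helper:

```python
def unique_pct(distinct, n_rows, n_missing):
    """Distinct values as a percentage of non-missing rows."""
    present = n_rows - n_missing
    return 100 * distinct / present


# (distinct count, total rows, missing rows) copied from the report.
cabin = unique_pct(147, 891, 687)
age = unique_pct(88, 891, 177)
ticket = unique_pct(681, 891, 0)
print(f"Cabin {cabin:.1f}%, Age {age:.1f}%, Ticket {ticket:.1f}%")
```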

Length

Mean 3.5882
Standard Deviation 2.0743
Median 3
Minimum 1
Maximum 15

Sample

1st row C85
2nd row C123
3rd row E46
4th row G6
5th row C103

Letter

Count 238
Lowercase Letter 0
Space Separator 34
Uppercase Letter 238
Dash Punctuation 0
Decimal Number 460

Embarked

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.3%
Missing 2
Missing (%) 0.2%
Memory Size 58674 B
  • The most frequent value (S) occurs about 3.83 times as often as the second most frequent value (C)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row S
2nd row C
3rd row S
4th row S
5th row S

Letter

Count 889
Lowercase Letter 0
Space Separator 0
Uppercase Letter 889
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (S, C) account for over 50.0% of the values
  • Embarked has words of constant length

Interactions

Correlations

Missing Values