**Narration**
1
00:00:01 --> 00:00:06
Hello friends. Welcome to the tutorial on "**Statistics**" using **Python**.
2
00:00:07 --> 00:00:13
At the end of this tutorial, you will be able to do **statistical** operations in **Python**.
3
00:00:14 --> 00:00:21
You will also be able to **sum** a set of numbers and find their **mean**, **median** and **standard deviation**.
4
00:00:22 --> 00:00:28
To record this tutorial, I am using the **Ubuntu Linux 16.04** operating system,
5
00:00:29 --> 00:00:35
**Python 3.4.3** and **IPython 5.1.0**.
6
00:00:36 --> 00:00:41
To practise this tutorial, you should know how to load data from files
7
00:00:42 --> 00:00:46
use **Lists**, and access parts of **Arrays**.
8
00:00:47 --> 00:00:52
If not, see the pre-requisite **Python** tutorials on this website.
9
00:00:53 --> 00:01:02
For this tutorial, we will use the data file **student_record.txt**, which we used in the earlier tutorial.
10
00:01:03 --> 00:01:07
You can also find this file in the **Code Files** link of this tutorial.
11
00:01:08 --> 00:01:11
Please download it to the **Home directory** and use it.
12
00:01:12 --> 00:01:21
We will perform mathematical and logical operations on the **array structured** data in this file.
For this, we need to install **NumPy**.

13
00:01:22 --> 00:01:25
**NumPy** stands for **Numerical Python**.
14
00:01:26 --> 00:01:32
It is a library consisting of **pre-compiled functions** for mathematical and numerical routines.
15
00:01:33 --> 00:01:36
**NumPy** has to be installed separately.
16
00:01:37 --> 00:01:44
Let us first open the **Terminal** by pressing the **Ctrl+Alt+T** keys simultaneously.
17
00:01:45 --> 00:01:52
Let us install the latest **pip**.
The **pip** command is used to install **Python libraries**.

18
00:01:53 --> 00:02:02
Type, **sudo apt-get install python3 hyphen pip** and press **Enter**.
19
00:02:03 --> 00:02:14
You need **root** access for the installation, so it asks for the **admin password**.
20
00:02:15 --> 00:02:23
Next, we need to install the **numpy library**, as we will be using it throughout the tutorial.
21
00:02:24 --> 00:02:37
Type, **sudo pip3 install numpy** is equal to is equal to **1.13.3** and press **Enter**.
22
00:02:38 --> 00:02:46
The installation has completed successfully. We can see the **terminal prompt** without any error.
23
00:02:47 --> 00:02:51
Next, we will learn about the **loadtxt() function**.
24
00:02:52 --> 00:02:57
To get the data as an **array**, we use the **loadtxt() function**.
25
00:02:58 --> 00:03:03
For the **loadtxt() function**, we need to **import** the **numpy library** first.
26
00:03:04 --> 00:03:11
Switch back to the **terminal**. Now, type **ipython3** and press **Enter**.
27
00:03:12 --> 00:03:23
Type **import numpy as np** and press **Enter**.
Here, **np** is an alias for **numpy**; it can be any name.

28
00:03:24 --> 00:03:31
Let us load the data from the file **student_record.txt** as an **array**.
29
00:03:32 --> 00:04:03
Type, **L** is equal to **np dot loadtxt** inside **parentheses** inside quotes **student_record.txt** comma **usecols** is equal to inside **parentheses** 3 comma 4 comma 5 comma 6 comma 7 comma **delimiter** is equal to inside quotes **semicolon** and press **Enter**.
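
Written out as code, the command above can be sketched as follows. This is only an illustration: the three rows are made-up sample data standing in for the real **student_record.txt**, whose exact layout may differ, and they are read from an in-memory buffer instead of a file in the Home directory.

```python
import io
import numpy as np

# Hypothetical sample rows (NOT the real student_record.txt).
# Fields are separated by semicolons; columns 3 to 7 hold the marks.
sample = io.StringIO(
    "A;015162;STUDENT ONE;81;76;49;33;50;289\n"
    "A;015163;STUDENT TWO;72;95;52;40;61;320\n"
    "B;015164;STUDENT THREE;58;64;70;35;42;269\n"
)

# usecols is a sequence of zero-based column indices to keep;
# delimiter is the character that separates the fields.
L = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=";")
print(L)
```

With the real file, the first argument would be the file name in quotes, exactly as typed in the terminal.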
30
00:04:04 --> 00:04:06
Type **L** and press **Enter**.
31
00:04:07 --> 00:04:10
We get the **output** in the form of an **array**.
32
00:04:11 --> 00:04:15
**loadtxt** loads data from an external file.
33
00:04:16 --> 00:04:26
**Delimiter** specifies the character by which the **fields** of data are separated.

**usecols** specifies the **columns** to be used.

34
00:04:27 --> 00:04:32
**loadtxt** is a function; **delimiter** and **usecols** are its **keyword** arguments.
35
00:04:33 --> 00:04:41
So **columns** 3, 4, 5, 6 and 7 (counting from 0) of **student_record.txt** are loaded here.
36
00:04:42 --> 00:04:48
The 'comma' between **column numbers** is added because **usecols** is a **sequence**.
37
00:04:49 --> 00:04:57
As we can see, **L** is an **array**. We can get the shape of this **array** using **shape**.
38
00:04:58 --> 00:05:03
Type, **L dot shape** and press **Enter**.
39
00:05:04 --> 00:05:10
We get a **tuple** giving the number of **rows** and the number of **columns** respectively.
40
00:05:11 --> 00:05:21
In this example, the array **L** has one lakh eighty five thousand six hundred and sixty seven (185667) rows and 5 columns.
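
As a small sketch of how **shape** behaves, here is a made-up array of zeros with 4 rows and 5 columns; the name `marks` and the sizes are assumptions for illustration only, not the tutorial's data.

```python
import numpy as np

# Hypothetical array: 4 students (rows) and 5 subjects (columns).
marks = np.zeros((4, 5))

# shape is a tuple: (number of rows, number of columns).
rows, cols = marks.shape
print(rows, cols)
```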
41
00:05:22 --> 00:05:27
Let us switch back to the **student_record.txt** file.
42
00:05:28 --> 00:05:38
Let us start applying statistical operations on this data.
How do you find the sum of marks of all subjects for the first student?

43
00:05:39 --> 00:05:53
Switch back to the **terminal**.
To access the first row in an **array**, we will type **L** inside square brackets **0** and press **Enter**.

44
00:05:54 --> 00:06:08
Now to sum this, type, **totalmarks** is equal to **sum** inside parentheses **L** inside square brackets **0** and press **Enter**.
45
00:06:09 --> 00:06:18
Type **totalmarks** and press **Enter**.
We get the sum of the marks of all subjects for the first student.

46
00:06:19 --> 00:06:25
Now, to get the **mean**, we can divide **totalmarks** by the length of the **array**.
47
00:06:26 --> 00:06:39
Type, **totalmarks** divided by **len** inside parentheses **L** inside square brackets **0** and press **Enter**.
48
00:06:40 --> 00:06:54
Or simply use the **function mean**. Type **np dot mean** inside parentheses **L** inside square brackets **0** and press **Enter**.
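
The three steps for the first student can be sketched together like this. The two rows of marks are made-up values, assumed only so that the example runs without the data file.

```python
import numpy as np

# Hypothetical marks: 2 students (rows), 5 subjects (columns).
L = np.array([[81.0, 76.0, 49.0, 33.0, 50.0],
              [72.0, 95.0, 52.0, 40.0, 61.0]])

# Sum of the marks in the first row.
totalmarks = sum(L[0])

# Mean by hand: the total divided by the number of subjects.
manual_mean = totalmarks / len(L[0])

# The same mean using the NumPy function.
library_mean = np.mean(L[0])

print(totalmarks, manual_mean, library_mean)
```

Both ways of computing the mean should agree.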
49
00:06:55 --> 00:07:03
But we have such a large **data set**.
Calculating the **mean** for each student one by one is time-consuming.

50
00:07:04 --> 00:07:11
Is there a way to reduce the work?
For this, we will look into the **documentation** of **mean**.

51
00:07:12 --> 00:07:22
Type, **np dot mean questionmark** and press **Enter**. Read the text for more information.
52
00:07:23 --> 00:07:27
Type **q **to exit the documentation.
53
00:07:28 --> 00:07:34
In the above example, **L** is a **two dimensional array**, like a **matrix**.
54
00:07:35 --> 00:07:40
We can calculate the **mean** along each **axis** of the **array**.
55
00:07:41 --> 00:07:47
The **axis** of **rows** is referred to by 0 and that of **columns** by 1.
56
00:07:48 --> 00:07:56
To calculate the **mean** across all **columns**, that is, one mean per row, we have to pass an extra parameter 1 for the **axis**.
57
00:07:57 --> 00:07:59
Switch back to the **terminal**.
58
00:08:00 --> 00:08:06
Let us calculate the **mean** of the marks scored by all the students for each subject.
59
00:08:07 --> 00:08:17
Type **np dot mean** inside parentheses **L comma 0** and press **Enter**.
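
To see the difference between the two axes, here is a sketch on a small made-up array; the marks and the names `subject_means` and `student_means` are assumptions for illustration.

```python
import numpy as np

# Hypothetical marks: 2 students (rows), 5 subjects (columns).
L = np.array([[81.0, 76.0, 49.0, 33.0, 50.0],
              [72.0, 95.0, 52.0, 40.0, 61.0]])

# Axis 0 runs down the rows: one mean per column (per subject).
subject_means = np.mean(L, 0)

# Axis 1 runs across the columns: one mean per row (per student).
student_means = np.mean(L, 1)

print(subject_means)
print(student_means)
```

Passing 0 gives as many values as there are subjects, while passing 1 gives as many values as there are students.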
60
00:08:18 --> 00:08:24
Next, we will calculate the **median** of English marks for all the students.
61
00:08:25 --> 00:08:34
Type **L** inside square brackets **colon comma 0** and press **Enter**.
62
00:08:35 --> 00:08:44
Note that **colon comma zero** selects the first **column** of the **array**, that is, the English marks.
63
00:08:45 --> 00:08:50
To get the **median** we will simply use the **function median**.
64
00:08:51 --> 00:09:03
Type **np dot median** inside parentheses **L** inside square brackets **colon comma 0**.
Press **Enter**.

65
00:09:04 --> 00:09:12
For all the subjects, we can calculate the **median** across all **rows** using the **median function**, as shown here.
66
00:09:13 --> 00:09:23
Type **np dot median** inside parentheses **L comma 0**.
Press **Enter**.
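
Both median commands can be sketched on a small made-up array. The three rows below are assumed values, with the first column standing in for the English marks.

```python
import numpy as np

# Hypothetical marks: 3 students (rows), 5 subjects (columns).
L = np.array([[81.0, 76.0, 49.0, 33.0, 50.0],
              [72.0, 95.0, 52.0, 40.0, 61.0],
              [58.0, 64.0, 70.0, 35.0, 42.0]])

# L[:, 0] selects every row of the first column.
english = L[:, 0]

# Median of the first column only.
english_median = np.median(english)

# Median of every column at once, taken down the rows (axis 0).
all_medians = np.median(L, 0)

print(english_median)
print(all_medians)
```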

67
00:09:24 --> 00:09:30
Similarly, to calculate the **standard deviation**, we will use the **function std**.
68
00:09:31 --> 00:09:49
The standard deviation for the English subject can be found by typing **np dot std** inside parentheses **L** inside square brackets **colon comma 0**. Press **Enter**.
69
00:09:50 --> 00:10:02
And for all **rows**, we type **np dot std** inside parentheses **L comma 0** and press **Enter**.
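
Here is a sketch of both **std** commands on made-up numbers chosen so that the result is easy to check by hand. Note that **np.std** by default divides by the number of values (the population standard deviation); the last line, using **ddof=1**, is an extra shown only for comparison and is not part of the narration.

```python
import numpy as np

# Hypothetical data: 8 rows, 2 columns.
# First column has mean 5; second column is constant.
L = np.array([[2.0, 10.0], [4.0, 10.0], [4.0, 10.0], [4.0, 10.0],
              [5.0, 10.0], [5.0, 10.0], [7.0, 10.0], [9.0, 10.0]])

# Standard deviation of the first column only.
first_std = np.std(L[:, 0])

# Standard deviation of every column, taken down the rows (axis 0).
all_std = np.std(L, 0)

# Sample standard deviation (divides by n - 1) for comparison.
first_sample_std = np.std(L[:, 0], ddof=1)

print(first_std, all_std, first_sample_std)
```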
70
00:10:03 --> 00:10:08
Pause the video here, try out the following exercise and resume the video.
71
00:10:09 --> 00:10:17
Refer to the file **football.txt**, which is available in the **Code Files** link of this tutorial.
72
00:10:18 --> 00:10:22
Download and save the file in the **present working directory**.
73
00:10:23 --> 00:10:27
Currently the **present working directory** is the **Home directory.**
74
00:10:28 --> 00:10:33
In **football.txt**, the first column is **player name**,
75
00:10:34 --> 00:10:41
the second is **goals at home** and the third is **goals away**.
76
00:10:42 --> 00:10:49
Find the total goals for each player.
Find the **mean** of the home goals and of the away goals.

77
00:10:50 --> 00:10:54
Find the **standard deviation** of the home goals and of the away goals.
78
00:10:55 --> 00:10:57
Switch to the terminal.
79
00:10:58 --> 00:11:30
The solution is, first, type, **L** is equal to **np dot loadtxt** inside parentheses inside quotes **football.txt** comma **usecols** is equal to inside parentheses **1 comma 2** comma **delimiter** is equal to inside quotes **comma**. Press **Enter**.
80
00:11:31 --> 00:11:38
For the total goals, type **np dot sum** inside parentheses **L comma 1** and press **Enter**.
81
00:11:39 --> 00:11:49
The answer for the second, **np dot mean** inside parentheses **L comma 0** and press **Enter**.
82
00:11:50 --> 00:11:58
Third, **np dot std** inside parentheses **L comma 0** and press **Enter**.
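
The whole solution can be sketched in one place as follows. The three players and their goals are made-up sample rows, assumed in place of the real **football.txt**, and they are read from an in-memory buffer instead of the file.

```python
import io
import numpy as np

# Hypothetical rows (NOT the real football.txt):
# player name, goals at home, goals away.
sample = io.StringIO(
    "Player A,3,2\n"
    "Player B,1,4\n"
    "Player C,2,0\n"
)

# Load only the two numeric columns; the names in column 0 are skipped.
L = np.loadtxt(sample, usecols=(1, 2), delimiter=",")

# Total goals for each player: sum across the columns (axis 1).
totals = np.sum(L, 1)

# Mean and standard deviation of home and away goals: down the rows (axis 0).
means = np.mean(L, 0)
stds = np.std(L, 0)

print(totals)
print(means)
print(stds)
```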
83
00:11:59 --> 00:12:17
This brings us to the end of the tutorial.
In this tutorial, we have learnt to do the standard **statistical operations** like: **sum**, **mean**, **median** and **standard deviation** in **Python**.

84
00:12:18 --> 00:12:22
Here are some self assessment questions for you to solve.
85
00:12:23 --> 00:12:31
Given a **two dimensional list** as shown, how do you calculate the **mean** of each row?
86
00:12:32 --> 00:12:36
Second. Calculate the **median** of the given **list**.
87
00:12:37 --> 00:12:50
Third. There is a **file** with 6 **columns**. But we want to load text only from **columns** 2,3,4,5.
How do we specify that?

88
00:12:51 --> 00:13:01
And the answers.
To get the **mean** of each **row**, we just pass 1 as the second **parameter** to the **function mean**.

89
00:13:02 --> 00:13:10
**np.mean** inside parentheses **two_dimensional_list comma 1**
90
00:13:11 --> 00:13:23
We use the **function median** to calculate the **median** of the **list**.
**np.median** inside parentheses **student_marks**

91
00:13:24 --> 00:13:38
Third, to specify particular **columns** of a file, we use the parameter **usecols** is equal to inside parentheses **2, 3, 4, 5**.
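
The three answers can be sketched together as below. The values in `two_dimensional_list` and `student_marks`, and the six-column text, are made up for illustration, since the lists shown on the slide are not part of this narration.

```python
import io
import numpy as np

# Hypothetical values for the lists named in the answers.
two_dimensional_list = [[1, 2, 3], [4, 5, 6]]
student_marks = [70, 85, 60, 90]

# Answer 1: mean of each row (axis 1).
row_means = np.mean(two_dimensional_list, 1)

# Answer 2: median of the list.
marks_median = np.median(student_marks)

# Answer 3: hypothetical 6-column data separated by spaces.
# usecols counts from 0, so (2, 3, 4, 5) keeps the last four columns.
six_columns = io.StringIO("10 11 12 13 14 15\n20 21 22 23 24 25\n")
selected = np.loadtxt(six_columns, usecols=(2, 3, 4, 5))

print(row_means)
print(marks_median)
print(selected)
```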
92
00:13:39 --> 00:13:42
Please post your timed queries in this forum.
93
00:13:43 --> 00:13:47
Please post your general queries on **Python** in this forum.
94
00:13:48 --> 00:13:52
FOSSEE team coordinates the TBC project.
95
00:13:53 --> 00:14:04
Spoken Tutorial Project is funded by NMEICT, MHRD, Govt. of India.
For more details, visit this website.

96
00:14:05 --> 00:14:10
That's it for the tutorial.
This is Trupti Kini from IIT Bombay signing off. Thank you.