Hello Friends. Welcome to the tutorial on "**Statistics**" using **Python**.
At the end of this tutorial, you will be able to do **statistical** operations in **Python**:
**sum** a set of numbers and find their **mean, median** and **standard deviation**.
To record this tutorial, I am using the **Ubuntu Linux 16.04** operating system,
**Python 3.4.3** and **IPython 5.1.0**.
To practise this tutorial, you should know how to load data from files,
use **Lists** and access parts of **Arrays**.
If not, see the pre-requisite **Python** tutorials on this website.
For this tutorial, we will use the data file **student_record.txt**, which we used in the earlier tutorial.
You can also find this file in the **Code Files** link of this tutorial.
Please download it to the **Home directory** and use it.
We will use mathematical and logical operations on this **array structured file**.
For this, we need to install **NumPy**.

**NumPy** stands for **Numerical Python**.
It is a library consisting of **pre-compiled functions** for mathematical and numerical routines.
**NumPy** has to be installed separately.
Let us first open the **Terminal** by pressing the **Ctrl+Alt+T** keys simultaneously.
Let us install the latest **pip**.
The **pip** command is used to install **Python libraries**.

Type, **sudo apt-get install python3 hyphen pip** and press **Enter**.
You need to have **root** access for installation as it asks for **admin password**.
Next, we need to install the **numpy library**, as we will be using it throughout the tutorial.
Type, **sudo pip3 install numpy** is equal to is equal to **1.13.3** and press **Enter**.
The installation is completed successfully. We can see the **terminal prompt** without any error.
Next, we will learn about the **loadtxt() function**.
To get the data as an **array**, we use the **loadtxt() function**.
For the **loadtxt() function**, we need to **import** the **numpy library** first.
Switch back to the **terminal**. Now, type **ipython3** and press **Enter**.
Type **import numpy as np** and press **Enter**.
Here **np** is an alias for **numpy**, and it can be any name.

Let us load the data from the file **student_record.txt** as an **array**.
Type, **L** is equal to **np dot loadtxt** inside **parentheses** inside quotes **student_record.txt** comma **usecols** is equal to inside **parentheses** 3 comma 4 comma 5 comma 6 comma 7 comma **delimiter** is equal to inside quotes **semicolon** and press **Enter**.
Type **L** and press **Enter**.
We get the **output** in the form of an **array**.
**loadtxt** loads data from an external file.
**delimiter** specifies the character that separates the **fields** of data.

**usecols** specifies the **columns** to be used.

**delimiter** and **usecols** are **keyword arguments** of the **loadtxt function**.
So **columns** 3, 4, 5, 6, 7 from **student_record.txt** are loaded here.
The commas between the **column numbers** are needed because **usecols** is a **sequence**.
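Written out as code, the command above might look like the sketch below. The rows are made up for illustration and are not taken from **student_record.txt**; an in-memory text buffer stands in for the file so that the snippet runs on its own.

```python
import io
import numpy as np

# Made-up rows standing in for student_record.txt (NOT the real data):
# semicolon-separated, text in columns 0-2, marks in columns 3-7.
sample = io.StringIO(
    "A;001;STUDENT ONE;81;49;64;36;44\n"
    "A;002;STUDENT TWO;55;70;62;48;90\n"
    "B;003;STUDENT THREE;77;68;59;80;71\n"
)

# With the real file, the first argument would be "student_record.txt".
L = np.loadtxt(sample, usecols=(3, 4, 5, 6, 7), delimiter=";")
print(L)
```

Only the five selected columns are converted to numbers, so the text columns do not cause an error.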
As we can see, **L** is an **array**. We can get the shape of this **array** using **shape**.
Type, **L dot shape** and press **Enter**.
We get a **tuple** giving the number of **rows** and **columns** respectively.
In this example, the array **L** has one lakh eighty five thousand six hundred and sixty seven (185,667) rows and 5 columns.
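As a small sketch of reading the shape, assuming a hypothetical 3 by 5 array of marks rather than the full data set:

```python
import numpy as np

# Hypothetical marks for 3 students in 5 subjects; the real array is much larger.
L = np.array([[81, 49, 64, 36, 44],
              [55, 70, 62, 48, 90],
              [77, 68, 59, 80, 71]], dtype=float)

# shape is an attribute, so it is written without parentheses.
rows, columns = L.shape
print(rows, columns)
```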
Let us switch back to the **student_record.txt** file.
Let us start applying statistical operations on these.
How do you find the sum of marks of all subjects for the first student?

Switch back to the **terminal**.
To access the first row in an **array**, we will type **L** inside square brackets **0** and press **Enter**.

Now to sum this, type, **totalmarks** is equal to **sum** inside parentheses **L** inside square brackets **0** and press **Enter**.
Type **totalmarks** and press **Enter**.
We get the sum of marks of all subjects for the first student.
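The same step can be sketched on a hypothetical array of marks. The numbers below are made up and are not the first student's real marks.

```python
import numpy as np

# Hypothetical marks for 3 students in 5 subjects (made-up values).
L = np.array([[81, 49, 64, 36, 44],
              [55, 70, 62, 48, 90],
              [77, 68, 59, 80, 71]], dtype=float)

first_student = L[0]              # first row of the array
totalmarks = sum(first_student)   # Python's built-in sum works on a 1-D array
print(totalmarks)
```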

Now to get the **mean**, we can divide the **totalmarks** by the length of the **array**.
Type, **totalmarks** divided by **len** inside parentheses **L** inside square brackets **0** and press **Enter**.
Or simply use the **function mean**. Type **np dot mean** inside parentheses **L** inside square brackets **0** and press **Enter**.
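Both ways of getting the mean should agree. A minimal sketch, again on made-up marks:

```python
import numpy as np

# Hypothetical marks for 3 students in 5 subjects (made-up values).
L = np.array([[81, 49, 64, 36, 44],
              [55, 70, 62, 48, 90],
              [77, 68, 59, 80, 71]], dtype=float)

totalmarks = sum(L[0])
mean_by_hand = totalmarks / len(L[0])   # total divided by the number of subjects
mean_numpy = np.mean(L[0])              # same value using the mean function
print(mean_by_hand, mean_numpy)
```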
But we have a very large **data set**.
And calculating the **mean** for each student one by one is time-consuming.

Is there a way to reduce the work?
For this, we will look into the **documentation** of **mean.**

Type, **np dot mean questionmark** and press **Enter**. Read the text for more information.
Type **q **to exit the documentation.
In the above example, **L** is a **two dimensional array**, like a **matrix**.
We can calculate the **mean** along each **axis** of the **array**.
The **axis** of **rows** is referred to by 0 and that of **columns** by 1.
To calculate the **mean** across all **columns**, that is, one mean per row, we have to pass the extra parameter 1 for the **axis**.
Switch back to the **terminal**.
Let us calculate the **mean** of the marks scored by all the students for each subject.
Type **np dot mean** inside parentheses **L comma 0** and press **Enter**.
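To see what the axis parameter does, here is a sketch on a hypothetical array of marks. Passing 0 takes the mean down the rows and gives one value per subject. Passing 1 takes the mean across the columns and gives one value per student.

```python
import numpy as np

# Hypothetical marks for 3 students in 5 subjects (made-up values).
L = np.array([[81, 49, 64, 36, 44],
              [55, 70, 62, 48, 90],
              [77, 68, 59, 80, 71]], dtype=float)

per_subject = np.mean(L, 0)   # one mean per column (subject)
per_student = np.mean(L, 1)   # one mean per row (student)
print(per_subject)
print(per_student)
```

With the second form, the mean for every student is obtained in a single call instead of one row at a time.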
Next, we will calculate the **median** of English marks for all the students.
Type **L** inside square brackets **colon comma 0** and press **Enter**.
Note that **colon comma zero** selects the first **column** in the **array**, that is, the English marks.
To get the **median** we will simply use the **function median**.
Type **np dot median** inside parentheses **L** inside square brackets **colon comma 0**
and press **Enter**.

For all the subjects, we can calculate the **median** across all **rows** using the **median function** as shown here.
Type **np dot median** inside parentheses **L comma 0**
and press **Enter**.
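The two median calls can be sketched as follows, assuming made-up marks in which the first column plays the role of the English marks:

```python
import numpy as np

# Hypothetical marks for 3 students in 5 subjects (made-up values).
L = np.array([[81, 49, 64, 36, 44],
              [55, 70, 62, 48, 90],
              [77, 68, 59, 80, 71]], dtype=float)

english = L[:, 0]                      # every row, first column
english_median = np.median(english)    # median of one subject
all_medians = np.median(L, 0)          # median of each subject
print(english_median)
print(all_medians)
```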

Similarly, to calculate the **standard deviation** we will use the **function std**.
The standard deviation for the English subject can be found by typing **np dot std** inside parentheses **L** inside square brackets **colon comma 0**. Press **Enter**.
And across all **rows**, for every subject, we type **np dot std** inside parentheses **L comma 0** and press **Enter**.
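A short sketch of both calls on made-up marks. Note that **np.std** divides by the number of values by default (the population standard deviation); this is checked by hand in the last lines.

```python
import numpy as np

# Hypothetical marks for 3 students in 5 subjects (made-up values).
L = np.array([[81, 49, 64, 36, 44],
              [55, 70, 62, 48, 90],
              [77, 68, 59, 80, 71]], dtype=float)

english_std = np.std(L[:, 0])   # standard deviation of the first column
all_std = np.std(L, 0)          # standard deviation of each subject

# Hand calculation for the first column: root of the mean squared deviation.
deviations = L[:, 0] - np.mean(L[:, 0])
by_hand = np.sqrt(np.sum(deviations ** 2) / len(deviations))
print(english_std, by_hand)
```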
Pause the video here, try out the following exercise and resume the video.
Refer to the file **football.txt**, which is available in the **Code Files** link of this tutorial.
Download and save the file in the **present working directory**.
Currently the **present working directory** is the **Home directory.**
In **football.txt**, the first column is **player name**,
the second is **goals at home** and the third is **goals away**.
Find the total goals for each player,
the **mean** of goals at home and goals away,

and the **standard deviation** of goals at home and goals away.
Switch to the terminal.
The solution is, first, type, **L** is equal to **np dot loadtxt** inside parentheses inside quotes **football.txt** comma **usecols** is equal to inside parentheses **1 comma 2** comma **delimiter** is equal to inside quotes **comma**. Press **Enter**.
Then, for the total goals of each player, type **np dot sum** inside parentheses **L comma 1** and press **Enter**.
For the second, type **np dot mean** inside parentheses **L comma 0** and press **Enter**.
For the third, type **np dot std** inside parentheses **L comma 0** and press **Enter**.
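The whole solution can be sketched as below. The player names and goal counts are made up and are not the contents of **football.txt**; an in-memory buffer stands in for the file.

```python
import io
import numpy as np

# Made-up rows standing in for football.txt (NOT the real data):
# player name, goals at home, goals away, separated by commas.
sample = io.StringIO(
    "Player A,10,6\n"
    "Player B,4,8\n"
    "Player C,7,7\n"
)

# With the real file, the first argument would be "football.txt".
L = np.loadtxt(sample, usecols=(1, 2), delimiter=",")

total_goals = np.sum(L, 1)   # home + away for each player
mean_goals = np.mean(L, 0)   # mean of home goals and of away goals
std_goals = np.std(L, 0)     # standard deviation of home and of away goals
print(total_goals)
print(mean_goals)
print(std_goals)
```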
This brings us to the end of the tutorial.
In this tutorial, we have learnt to do the standard **statistical operations** like: **sum**, **mean**, **median** and **standard deviation** in **Python**.

Here are some self assessment questions for you to solve.
First. Given a **two dimensional list** as shown, how do you calculate the **mean** of each row?
Second. Calculate the **median** of the given **list**.
Third. There is a **file** with 6 **columns**. But we want to load text only from **columns** 2,3,4,5.
How do we specify that?

And the answers:
First, to get the **mean** of each **row**, we just pass 1 as the second **parameter** to the **function mean**.

**np.mean** inside parentheses **two_dimensional_list comma 1**
Second, we use the **function median** to calculate the **median** of the **list**.
**np.median** inside parentheses **student_marks**

Third, to specify particular **columns** of a file, we use the parameter **usecols** is equal to inside parentheses **2, 3, 4, 5**.
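The three answers can be tried out with the sketch below. The lists shown on the slide are not reproduced in this script, so the values of **two_dimensional_list** and **student_marks**, and the six-column text, are assumptions made up for illustration.

```python
import io
import numpy as np

# Hypothetical inputs; the lists from the slide are not available here.
two_dimensional_list = [[1, 2, 3], [4, 5, 6]]
student_marks = [74, 48, 59, 88, 66]

row_means = np.mean(two_dimensional_list, 1)   # answer 1: mean of each row
marks_median = np.median(student_marks)        # answer 2: median of the list

# Answer 3: a made-up file with 6 whitespace-separated columns (0 to 5).
six_columns = io.StringIO("10 11 12 13 14 15\n20 21 22 23 24 25\n")
selected = np.loadtxt(six_columns, usecols=(2, 3, 4, 5))

print(row_means)
print(marks_median)
print(selected)
```

Column numbers start at 0, so **usecols** with 2, 3, 4, 5 skips the first two columns of the file.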
