CSCI 333.01W Assignment 10

Intro to Data Science, Data Visualization

100 points

Deadline: 4/06/2020 Tuesday by 11:59pm

 

1. (20 points) True/False, multiple-choice, and fill-in-the-blank questions:

 

1) (8 points, 1 point per sub-question) (Select all that apply) Which of the following characteristics describe a NumPy array, and which describe pandas?

a) fixed size multidimensional object at creation

b) can only be 1 or 2 dimensions

c) support missing data

d) does not support missing data

e) support custom indexing, like strings

f) uses only zero-based indices

g) elements can be of heterogeneous data types

h) elements must be of a homogeneous data type

 

Answer: NumPy array:

pandas:

 

2) (2 points) A pandas Series is for ____ collections; a pandas DataFrame is for ____ collections.

a) one dimensional

b) two dimensional

c) multi-dimensional

Answer:

 

3) (2 points) The Matplotlib function ______ displays a figure on the screen. The Matplotlib function ______ draws an image on a figure from array data but does not display the figure on the screen.

a) pyplot.imshow()

b) pyplot.show()

c) both a) and b)

Answer:

 

4) (2 points) (True/False) A NumPy array has a fixed size at creation. The length of a Series also cannot be modified after it is defined. In a DataFrame, however, columns can be added or removed.

Answer:

 

 

5) (2 points) (True/False) Matplotlib and Seaborn are two open-source visualization libraries. Their modules must be imported with an import statement before they can be used in code.

Answer:

 

 

 

6) (4 points total) Consider the following program:

import matplotlib.pyplot as plt

import matplotlib.image as mpimg

img = mpimg.imread(r'c:\bird.jpg')

imgplot=plt.imshow(img)

plt.show()

print(img)

print(img.shape)

 

 

· (2 points) What is each element (e.g. 148) in the output of img?

a) A color

b) A pixel

c) Location of a pixel

Answer:

 

· (2 points) (True or False) The output of img.shape is (1280, 1920, 3), a tuple giving the number of rows, columns, and channels of the image.

Answer:

 

 

2. (25 points) Hand-trace the following code. What is the output, or what error/problem do you observe and why?

1) (5 points)

import pandas as pd
grades = pd.Series({'Tom': 80, 'John': 100, 'Kelly': 90})
print(grades)
print(grades[0])
print(grades['Kelly'])
print(grades.Kelly)
print(grades.values)

 

Output:

 

 

 

2) (5 points)

import pandas as pd
grades_dict = {'Tom': [87, 96, 70], 'John': [100, 87, 90],
               'Kelly': [94, 77, 90], 'Betty': [100, 81, 82]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])
pd.set_option('precision',2)
print(grades)
print(grades.iloc[1])
print(grades.iloc[[0, 2]])
print(grades.iloc[0:2])
print(grades.T)

 

Output:

 

 

 

 

 

 

 

 

 

 

 

 

3) (5 points)

import pandas as pd
profession = pd.Series(['student', 'teacher', 'worker'])
print(profession)
print(profession.str.contains('t'))
print(profession.str.upper())

 

Output:

 

 

 

 

 

 

 

 

 

4) (5 points)

The upper-left corner of the bird image (the top half of the rows and the left half of the columns) is our region of interest (ROI). Modify the following code to show the original bird image in color and the ROI image in grayscale using only the R channel. (Hint: slice half of the rows and half of the columns to get the upper-left ROI.)

 

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
img = mpimg.imread(r'c:\bird.jpg')
print(img.shape) #(1280, 1920, 3)
imgplot = plt.imshow(img)
plt.show()
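The slicing idea in the hint can be sketched on a small synthetic array. This is a minimal illustration under stated assumptions: the array `demo` and the names `corner` and `blue` are hypothetical, and the sketch deliberately uses a different region and channel than the task asks for.

```python
import numpy as np

# Synthetic 4x6 RGB "image" (rows, columns, channels); a stand-in for a real photo.
demo = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
rows, cols, channels = demo.shape

# Lower-right quarter: second half of the rows and of the columns, all channels.
corner = demo[rows // 2:, cols // 2:, :]

# Keep only one channel (index 2 is blue in RGB order); the result is 2-D.
blue = corner[:, :, 2]

print(corner.shape)  # (2, 3, 3)
print(blue.shape)    # (2, 3)
```

A 2-D array like `blue` can be passed to plt.imshow() with cmap='gray' to render it in grayscale.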

 

 

Output image from your code:

Answer:

 

 

 

 

 

 

 

5) (5 points)

Some code is missing from the following program. Fill in the missing code according to these tasks so that the program works:

 

a) Convert a dictionary into the DataFrame named temperatures with ‘Low’ and ‘High’ as the indices, then display the DataFrame.

b) Use the column names to select only the columns for ‘Mon’ through ‘Wed’.

c) Use the row index ‘Low’ to select only the low temperatures for each day.

d) Set the floating-point precision to 2, then calculate the average temperature for each day.

e) Calculate the average low and high temperatures.

 

import pandas as pd
temps = {'Mon': [70, 80], 'Tue': [75, 85], 'Wed': [65, 80], 'Thu': [62, 86], 'Fri': [67, 83]}
temperatures = pd.DataFrame(temps, index=[])
print(temperatures)
print(temperatures.loc[])
print(temperatures.loc[])
pd.set_option()
print(temperatures.mean())

 

Answer:

 

 

 

 

 

 

 

 

3. (13 points) Perform the following tasks with pandas Series, and output the results:

a) (2 points) Create a Series from the list [7, 11, 13, 17].

b) (2 points) Create a Series with five elements that are all 100.0.

c) (3 points) Create a Series with 10 elements that are all random numbers in the range 0 to 100. Use method describe() to produce the Series’ basic descriptive statistics.

(hint: you may use the NumPy’s random-number generation to create an array of random integers, then create a Series from the array. https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.randint.html )

d) (2 points) Create a Series called temperatures of the floating-point values 98.6, 98.9, 100.2 and 97.9. Using the index keyword argument, specify the custom indices ‘Julie’, ‘Charlie’, ‘Sam’ and ‘Andrea’.

e) (2 points) Form a dictionary from the names and values in (d), then use it to initialize a Series.
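The hint in (c) can be illustrated with a minimal sketch. It uses a different size and range than the task asks for, and the names `rng_values` and `sample` are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative only: 6 random integers in [0, 10), not the values the task requires.
np.random.seed(0)  # fixed seed so repeated runs give the same numbers
rng_values = np.random.randint(0, 10, size=6)  # the upper bound is exclusive

# Wrap the NumPy array in a Series; the default index is 0..5.
sample = pd.Series(rng_values)
print(sample)
print(sample.describe())  # count, mean, std, min, quartiles, max
```

Because the upper bound of randint is exclusive, a range that should include its top value needs an upper bound one larger than that value.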

Answers:

(11 points) Write your program here, or copy/paste a screenshot of your program:

 

(2 points) Save the program as “program1.py”. Upload the .py file as part of your submission.

 

4. (22 points, 2 points each sub-question) Perform the following tasks with pandas DataFrames, and output the results:

a) Create a DataFrame named temperatures from a dictionary of three temperature readings each for ‘Maxine’, ‘James’ and ‘Amanda’ (use any temperatures you like).

b) Recreate the DataFrame temperatures in Part (a) with custom indices using the index keyword argument and a list containing ‘Morning’, ‘Afternoon’ and ‘Evening’.

c) Select from temperatures the column of temperature readings for ‘Maxine’.

d) Select from temperatures the row of ‘Morning’ temperature readings.

e) Select from temperatures the rows for ‘Morning’ and ‘Evening’ temperature readings.

f) Select from temperatures the columns of temperature readings for ‘Amanda’ and ‘Maxine’.

g) Select from temperatures the elements for ‘Amanda’ and ‘Maxine’ in the ‘Morning’ and ‘Afternoon’.

h) Use the describe method to produce temperatures’ descriptive statistics.

i) Transpose temperatures.

j) Sort temperatures so that its column names are in alphabetical order.
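The kinds of selection used in this question can be sketched on unrelated data. This is a minimal illustration, not a solution: the DataFrame `scores` and all of its labels and values are hypothetical.

```python
import pandas as pd

# Hypothetical data, unrelated to the readings the task asks for.
scores = pd.DataFrame(
    {'Zed': [1, 2, 3], 'Amy': [4, 5, 6], 'Bob': [7, 8, 9]},
    index=['Q1', 'Q2', 'Q3'],
)

one_column = scores['Amy']                      # a single column, returned as a Series
one_row = scores.loc['Q1']                      # a single row, selected by label
some_rows = scores.loc[['Q1', 'Q3']]            # several rows, selected by a list of labels
block = scores.loc['Q1':'Q2', ['Bob', 'Zed']]   # label slices include both endpoints
by_name = scores.sort_index(axis=1)             # columns in alphabetical order

print(block)
print(by_name.columns.tolist())
```

Unlike positional slicing with iloc, a label slice with loc includes its end label, which is why `block` has two rows.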

Answers:

(20 points) Write your program here, or copy/paste a screenshot of your program:

 

 

(2 points) Save the program as “program2.py”. Upload the .py file as part of your submission.

 

5. (20 points) Twenty students were asked to rate on a scale of 1 to 5 the quality of the food in the student cafeteria, with 1 being “awful” and 5 being “excellent”.

a) (2 points) Place the 20 responses in a list: 1, 2, 5, 4, 3, 5, 2, 1, 3, 3, 1, 4, 3, 3, 3, 2, 3, 3, 2, 5.

b) (2 points) Use this list to create a Series

c) (4 points) Use a Series method to determine the frequency of each rating (see the Series.value_counts() method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html).

d) (8 points) Display a bar chart showing the response frequencies, like the following figure, with labels, colors, titles, and bar_width = 0.5.

(5 points) Display each bar’s percentage of the total responses.

(Hint: refer to the following links about Matplotlib annotation, similar to plt.annotate(annoStr, xyAnnoLoc(x,y), ha='center').)

· https://matplotlib.org/stable/gallery/lines_bars_and_markers/barchart.html

· https://matplotlib.org/stable/tutorials/text/annotations.html
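The annotation approach from the links above can be sketched on a small, unrelated data set. This is a minimal illustration under stated assumptions: the `answers` Series, the labels, and the color are hypothetical, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical responses, not the survey data from this question.
answers = pd.Series(['yes', 'no', 'yes', 'maybe', 'yes', 'no'])
counts = answers.value_counts().sort_index()  # frequency of each distinct value
total = counts.sum()

fig, ax = plt.subplots()
positions = list(range(len(counts)))
ax.bar(positions, counts.values, width=0.5, color='steelblue')
ax.set_xticks(positions)
ax.set_xticklabels(counts.index)
ax.set_xlabel('Answer')
ax.set_ylabel('Frequency')
ax.set_title('Hypothetical poll')

# Place each percentage just above its bar, centered horizontally.
for x, count in zip(positions, counts.values):
    ax.annotate(f'{count / total:.1%}', (x, count), ha='center', va='bottom')

fig.canvas.draw()  # render the figure; with an interactive backend, plt.show() would display it
```

Each annotation is anchored at the bar's x position and height, so ha='center' centers the text over the bar and va='bottom' keeps it above the top edge.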

 

Answers:

(16 points) Write your program here, or copy/paste a screenshot of your program:

 

 

(2 points) Save the program as “program3.py”. Upload the .py file as part of your submission.

 

(2 points) Output your bar chart:

 

 

(5 points will be given if the percentage display is implemented in the above program and shown in the output)

 
