Discuss the meaning of the results based on the data set.

College of Computing and Informatics



Project Datasets:


1- https://www.kaggle.com/austinreese/craigslist-carstrucks-data

2- https://www.kaggle.com/currie32/crimes-in-chicago

3- https://www.kaggle.com/hm-land-registry/uk-housing-prices-paid


Choose any one of the datasets above and apply all of the following tasks to it.


Project Required Steps:

Task 1: (2 Marks)

Topic 1: Sentiment analysis is used to identify public opinion through text analytics. Big data tools can aid in storing and processing the data for sentiment analysis. Through such analysis, companies can better plan their processes and sales.


Topic 2: Machine learning algorithms are very important in the field of data science. With the increasing volume of data, it is important and advantageous to apply those algorithms to Big Data.


Write a short literature review and discussion of Topic 1 or Topic 2, covering how the topic can be implemented and used in Big Data applications, in no more than one page. You must use at least six references and cite them in the literature review. The references must be added to the template (try using any reference-management software).


Task 2: (1 Mark)

Load the dataset into the Hadoop Distributed File System (HDFS). Discuss and explain the type and structure of the data. Show the steps that you followed during the import process.
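To support the discussion of the data's type and structure, it can help to inspect the CSV file locally before loading it into HDFS. The following is a minimal sketch, not a required method: it reads the header and infers a rough type (int, float, or string) for each column from a sample of rows. The function names and the sample columns (`id`, `price`, `year`, `manufacturer`, `odometer`) are hypothetical and chosen only for illustration; replace them with the actual file from the dataset you choose.

```python
import csv
import io

# Widening order: a column that mixes ints and floats is reported as float,
# and any non-numeric value makes the whole column a string.
_ORDER = {"int": 0, "float": 1, "string": 2}


def classify(value):
    """Return the narrowest type label that can represent one text value."""
    for caster, label in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return label
        except ValueError:
            pass
    return "string"


def widen(current, new):
    """Combine the type seen so far with a newly observed type."""
    if current is None:
        return new
    return current if _ORDER[current] >= _ORDER[new] else new


def describe_csv(handle, sample_rows=1000):
    """Map each column name to an inferred type, using the first rows only."""
    reader = csv.DictReader(handle)
    types = {name: None for name in reader.fieldnames}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for name, value in row.items():
            if name not in types or value is None or value == "":
                continue  # skip overflow fields and missing values
            types[name] = widen(types[name], classify(value))
    return {name: t or "empty" for name, t in types.items()}


# Hypothetical sample data (assumption), standing in for the real CSV file.
SAMPLE = (
    "id,price,year,manufacturer,odometer\n"
    "1,4500,2012,ford,98000.5\n"
    "2,12000,2017,toyota,\n"
    "3,800,1999,,210000\n"
)

if __name__ == "__main__":
    for column, inferred in describe_csv(io.StringIO(SAMPLE)).items():
        print(f"{column}: {inferred}")
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample. The inferred types are only a starting point for the written discussion, since sampling can miss rare values.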




Task 3: (2 Marks)

Apply a MapReduce algorithm to produce useful statistical results. Discuss the statistical results in detail, and explain their meaning based on the dataset you have chosen.
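As an illustration of the map, shuffle, and reduce phases, the sketch below computes the listing count and mean price per manufacturer in plain Python. It only simulates the MapReduce pattern in a single process and is not Hadoop code; on a cluster the same mapper and reducer logic would have to be adapted to whichever framework you use. The field names (`manufacturer`, `price`) and the sample rows are assumptions made for this example.

```python
from collections import defaultdict


def mapper(row):
    """Emit (manufacturer, (price, 1)) for every row with a usable price."""
    try:
        price = float(row["price"])
    except (KeyError, TypeError, ValueError):
        return  # drop rows whose price is missing or not numeric
    key = (row.get("manufacturer") or "unknown").strip().lower()
    yield key, (price, 1)


def shuffle(pairs):
    """Group the mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Sum the partial (price, count) pairs and derive the mean price."""
    total = sum(price for price, _ in values)
    count = sum(n for _, n in values)
    return key, {"count": count, "mean_price": total / count}


def run_job(rows):
    """Run map, shuffle, and reduce over an iterable of dict-like rows."""
    pairs = (pair for row in rows for pair in mapper(row))
    return dict(reducer(k, v) for k, v in sorted(shuffle(pairs).items()))


# Hypothetical rows (assumption), standing in for records read from HDFS.
ROWS = [
    {"manufacturer": "ford", "price": "4500"},
    {"manufacturer": "Ford", "price": "5500"},
    {"manufacturer": "toyota", "price": "12000"},
    {"manufacturer": "", "price": "800"},
    {"manufacturer": "toyota", "price": "n/a"},
]

if __name__ == "__main__":
    for maker, stats in run_job(ROWS).items():
        print(maker, stats)
```

Emitting `(price, 1)` pairs rather than bare prices keeps the reducer's work associative, so the same sums could also be applied as a combiner before the shuffle.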


Task 4: (1 Mark)

Import the data into MongoDB. Show the steps you followed to import the dataset into this NoSQL system.


Task 5: (2 Marks)

Execute at least three queries on the data in MongoDB. Describe your queries and the results. Discuss the meaning of the results based on the dataset.


Task 6: (1 Mark)

Using Hive or Pig, execute at least three queries on the dataset. Describe your queries and the results. Discuss the meaning of the results based on the data.


Task 7: (1 Mark)

Using Spark, run two SparkSQL statements on the dataset, and visualize the results in a chart of your choice (Hint: you can use Zeppelin directly).
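Before running statements on the cluster, it can be useful to check the SQL text against a few rows locally. The sketch below does this with Python's built-in `sqlite3` module, which is a stand-in and not Spark: it only checks that two simple aggregate statements behave as expected on a tiny sample. The table name `cars`, its columns, and the sample rows are assumptions for illustration. In PySpark, the same statement text could then be passed to `spark.sql(...)` after registering the DataFrame with `createOrReplaceTempView("cars")`, provided the statements stay within the SQL features both engines share.

```python
import sqlite3

# Statement 1: number of listings per model year.
LISTINGS_PER_YEAR = """
    SELECT year, COUNT(*) AS listings
    FROM cars
    GROUP BY year
    ORDER BY year
"""

# Statement 2: average price per manufacturer, ignoring zero-priced listings.
AVG_PRICE_BY_MAKER = """
    SELECT manufacturer, AVG(price) AS avg_price
    FROM cars
    WHERE price > 0
    GROUP BY manufacturer
    ORDER BY avg_price DESC
"""


def run_queries(rows):
    """Load rows into an in-memory SQLite table and run both statements."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cars (year INTEGER, manufacturer TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?)", rows)
    per_year = conn.execute(LISTINGS_PER_YEAR).fetchall()
    by_maker = conn.execute(AVG_PRICE_BY_MAKER).fetchall()
    conn.close()
    return per_year, by_maker


# Hypothetical rows (assumption): (year, manufacturer, price).
ROWS = [
    (2012, "ford", 4500),
    (2012, "toyota", 9000),
    (2017, "toyota", 12000),
    (2017, "ford", 0),
]

if __name__ == "__main__":
    per_year, by_maker = run_queries(ROWS)
    print(per_year)
    print(by_maker)
```

Each statement returns a small two-column result (a category and a number), which is the shape that bar or line charts in a notebook tool expect.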


Task 8 (Optional): (1 Mark as Bonus)

Using MLlib in Spark, build a suitable machine learning model and execute it on the data. Discuss your results.


· You can use the Hortonworks HDP sandbox with only one node. For the Spark part, you can use the same sandbox or a Databricks cluster.

· All the tasks must be described in detail with the code written for each part.

· You can add screenshots of your steps to the project template.
