You were hired as a Big Data Analyst for a large, 50-year-old UMGC academic records system. Most of the academic records are stored in ASCII-based text files. The files are stored on high-volume hard disks and optical discs (CDs and DVDs). Three sample records from a file are shown below:
Record 1
Program: Information Technology, Specialization: Database Systems, Course: DBST651 Grade: A, Course: ITEC630 Grade: B, Course: DBST667 Grade: A, instructor 1: James Smith (DBST651), instructor 2: Jennifer Lopez (DBST651), instructor 3: Jennifer Lopez (ITEC630), instructor 4: Catharine Murphy (DBST667), student name: Yelena Bytenskaya, EmplD: 123456 , User Name : ybytensk, instructor id: 234567, graduated: Yes
Record 2
Program: Data Analytics, Course: DATA610 Grade: B, Course: DATA620 Grade: A, Course: DATA630 Grade: C, Course: DATA630 Grade: A, Course: DATA640 Grade: B, Course: DATA650 Grade: A, Course: DATA670 Grade: B, instructor 1: Steve Knode (DATA610), instructor2: Caroline Beam (DATA620), instructor 3: Bati Firdu (DATA630), instructor 4: Elena Gortcheva (DATA650) , instructor 5: Ozan Ozcan (DATA650) instructor 7: Jon McKeeby (DATA670), instructor 8: Steve Knode (DATA670), instructor 9: Steve Knode (DATA640), instructor 10: TA Yelena Bytenskaya (DATA650), student name: Linesh Dave, EmlID:567890, user name: ldave, instructor id: 567907, graduated: Yes
Record 3
Program: Information Technology, Specialization: Database Systems, Specialization: Project Management, Specialization: Software Engineering, course: DBST651 grade: F, course: DBST651 grade: B, course: ITEC610 grade: B, course: ITEC620 grade: A , course PMAN634 grade: C, instructor 1: Brandon Morris (ITEC 610), instructor 2: Elena Gortcheva (ITEC620), Instructor 3: James Green (DBST651), Instructor4: TA Yelena Bytenskaya (DBST651), student name: Jeff Martin, emplID: 987654, user name: jmartin, graduated: No
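The sample records above spell their field labels inconsistently (for example `EmplD`, `EmlID`, and `emplID`, or `course PMAN634` without a colon), so any parser has to be tolerant. The following is a minimal Python sketch of that idea, not a complete solution: the function names `parse_attempts` and `parse_emplid` are hypothetical, and the regular expressions assume that course codes are four letters followed by three digits and that the student ID is six digits, as in the three samples.

```python
import re

# Assumption: course codes look like "DBST651" (4 letters + 3 digits) and the
# label is "course" with an optional colon, in any letter case.
COURSE_RE = re.compile(
    r"course:?\s*([A-Z]{4})\s?(\d{3})\s+grade:\s*([A-F])", re.IGNORECASE
)

# Assumption: the student ID label starts with "em" and ends with "d"
# (covers "EmplD", "EmlID", "emplID") and the ID itself is 6 digits.
EMPLID_RE = re.compile(r"\bem\w*d\s*:\s*(\d{6})", re.IGNORECASE)


def parse_attempts(record):
    """Return (course, grade) pairs in the order they appear in the record."""
    return [
        (m.group(1).upper() + m.group(2), m.group(3).upper())
        for m in COURSE_RE.finditer(record)
    ]


def parse_emplid(record):
    """Return the 6-digit student ID as a string, or None if not found."""
    m = EMPLID_RE.search(record)
    return m.group(1) if m else None


# Excerpt of Record 3, copied from the sample data.
record3 = (
    "course: DBST651 grade: F, course: DBST651 grade: B, "
    "course: ITEC610 grade: B, course: ITEC620 grade: A , "
    "course PMAN634 grade: C, student name: Jeff Martin, emplID: 987654"
)
attempts = parse_attempts(record3)
emplid = parse_emplid(record3)
```

Keeping the attempts as an ordered list, rather than a dictionary keyed by course, preserves repeated courses such as the two DBST651 entries in Record 3.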
You are given the following information about the data (metadata):
· A student is enrolled in a program.
· Some programs may offer specializations. A student enrolled in a program that offers specializations may choose one or more specializations.
· New specializations could be added to a program. If a new specialization is added to a program that the student is enrolled in, the student may choose that specialization.
· A student takes multiple classes and receives a final letter grade in each class.
· A class session may have multiple instructors. A student may take multiple classes with the same instructor.
· A student who graduated could be hired as an instructor.
· A student could have multiple IDs: a 6-digit emplid, a user name for accessing the online classroom and academic records, and a faculty ID if the student is hired as an instructor.
· If a student repeats a class, the grade received on the last attempt overwrites the grades received on prior attempts for GPA calculation and on the transcript. However, the system should track all attempts for academic advising purposes.
· The courses can be taken in any order.
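The repeat rule in the list above implies two views of the same data: a transcript view in which only the last attempt counts, and an advising view that keeps every attempt. A minimal Python sketch of that split is shown here. It assumes that the order of course entries within a record reflects the order of attempts, which the metadata does not state, and the function name `split_attempts` is hypothetical.

```python
from collections import defaultdict


def split_attempts(attempts):
    """Split ordered (course, grade) pairs into transcript and history views.

    Assumption: `attempts` is ordered from earliest to latest attempt.
    """
    history = defaultdict(list)
    for course, grade in attempts:
        history[course].append(grade)
    # The last grade recorded for each course is the one that counts.
    transcript = {course: grades[-1] for course, grades in history.items()}
    return transcript, dict(history)


# Course attempts taken from Record 3 of the sample data.
record3_attempts = [
    ("DBST651", "F"),
    ("DBST651", "B"),
    ("ITEC610", "B"),
    ("ITEC620", "A"),
    ("PMAN634", "C"),
]
transcript, history = split_attempts(record3_attempts)
```

One possible way to carry this into HBase is to store the grade in a single cell per course and rely on timestamped cell versions, since a column family can be configured to retain more than one version of a cell. Whether that fits better than one column per attempt is a design decision for the paper.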
Your task: Theoretically set up a searchable database that can flexibly accommodate all of the above requirements and can hold the records of several hundred million students.
Your paper must have an Introduction, Problem Statement, Design, Implementation Methods, and Conclusion, with a discussion of the pros and cons of your design. The following are required:
1. Design showing the different Big Data systems that you will use to solve this problem.
2. Pseudocode of a function that will read in each record, parse it, and transform it into an HBase table.
3. HBase data model showing column families and columns.
4. The student in Record 3 above is enrolled in the Information Technology program. How would you handle adding a new specialization to the Information Technology program and letting the student choose it as an additional specialization?
5. Discussion of the pros and cons of choosing ACID vs. CAP systems for this problem.
6. Discussion of queries that database users would run.
7. Ideas for improving the speed of the query tool.