Description: This course is an introduction to modern data science. Data science is the study of how to extract actionable, non-trivial knowledge from data. The course will focus on the software tools used by practitioners of modern data science, the mathematical and statistical models employed in conjunction with such tools, and the application of these tools and systems to different problems and domains. On the tools side, we will cover the basics of relational database systems, as well as modern systems for manipulating large data sets such as Hadoop MapReduce, Apache Spark, and Google’s TensorFlow. On the model side, the course will cover standard supervised and unsupervised models for data analysis and pattern discovery. Students are expected to have the mathematical sophistication (calculus, statistics) and programming skills typically acquired in an undergraduate computer science program. Most programming will be in Python and SQL (SQL is covered in the course), with some Java. Cross-list: COMP 330. Mutually Exclusive: Students cannot register for COMP 543 if they have credit for COMP 330.