Applied Data Science

Sharif University of Technology, Spring 2025

About This Course

In this course, our aim is to get STARTED with using data science in practice, whether for academic projects or for industry problems we might face later in our careers.

The course is more about breadth than depth: we try to cover a lot of topics, but we won’t go too deep into any of them, as that would take “forever”!

We go over all steps of a data problem workflow, namely:

Gathering Data → Cleaning → Preprocessing → Analysis → Visualization

Throughout this process, we work on several different sample problems in the lab sessions, as well as through homeworks and final projects, in order to become familiar with the various types of problems usually faced in data science.
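The five workflow steps listed above can be sketched end to end on a toy problem. The example below is only an illustration under stated assumptions: the dataset (`RAW_CSV`), the column names and all function names are hypothetical, and it uses only the Python standard library so that it runs anywhere, whereas the course itself relies on NumPy and Pandas for this kind of work.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would come from a file, an API or a scraper.
RAW_CSV = """city,temperature_c
Tehran,31.5
Tehran,
Shiraz,29.0
Shiraz,30.0
Tabriz,not available
Tabriz,22.0
"""


def gather(raw_text):
    """Gathering: read the raw rows into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Cleaning: drop rows whose temperature is missing or not a number."""
    cleaned = []
    for row in rows:
        try:
            value = float(row["temperature_c"])
        except (TypeError, ValueError):
            continue  # skip empty or non-numeric entries
        cleaned.append({"city": row["city"], "temperature_c": value})
    return cleaned


def preprocess(rows):
    """Preprocessing: group the temperature readings by city."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["city"], []).append(row["temperature_c"])
    return grouped


def analyze(grouped):
    """Analysis: compute the mean temperature per city."""
    return {city: statistics.mean(values) for city, values in grouped.items()}


def visualize(means):
    """Visualization: a text bar chart with roughly one '#' per degree."""
    lines = []
    for city, mean in sorted(means.items()):
        lines.append(f"{city:<8}{'#' * round(mean)} {mean:.1f}")
    return "\n".join(lines)


means = analyze(preprocess(clean(gather(RAW_CSV))))
print(visualize(means))
```

Keeping each step in its own function makes it easy to rerun or replace a single stage, for example swapping the text chart for a proper plot, without touching the rest.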

Learning Objectives

By the end of the semester, the students will (hopefully!):

  • Learn how to prepare the dataset they are going to use (cleaning, preprocessing and exploratory data analysis),
  • Become familiar with different types of problems (e.g. regression/classification), as well as main learning types (e.g. supervised/unsupervised),
  • Understand different ways to measure the performance of the algorithms (e.g. MSE, MAE, precision, recall, etc.) and where to use each,
  • Practice using some of the most well-known and widely used algorithms for classification/regression problems (including Decision Trees, Neural Networks, etc.),
  • Strengthen their reporting/visualizing skills to communicate the results in the most effective way,
  • Practice the learned skills on different datasets to get more experience on different data problems.
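
As a small illustration of the evaluation metrics mentioned in the objectives above, here is a minimal sketch of MSE and MAE (for regression) and precision and recall (for binary classification). The function names and the sample values are made up for this example; in practice one would typically use a library implementation rather than hand-written versions.

```python
def mean_squared_error(y_true, y_pred):
    """MSE: average of squared differences; penalizes large errors heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """MAE: average of absolute differences; less sensitive to outliers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Return 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Regression example (made-up values).
targets = [3.0, 5.0, 2.0, 7.0]
predictions = [2.0, 5.0, 4.0, 8.0]
mse = mean_squared_error(targets, predictions)
mae = mean_absolute_error(targets, predictions)

# Classification example (made-up labels).
labels_true = [1, 0, 1, 1, 0, 0]
labels_pred = [1, 1, 1, 0, 0, 0]
precision, recall = precision_recall(labels_true, labels_pred)

print(f"MSE={mse:.2f} MAE={mae:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Roughly speaking, MSE and MAE apply when the target is a number, while precision and recall apply when the target is a class; precision matters more when false alarms are costly, and recall when missed positives are costly.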

Overall, our main goal is to carry out the whole workflow (from gathering/finding the dataset, to preparing it, to analyzing it and reporting the results) several times, so that we become more confident and more skilled.

Prerequisites

  • Familiarity with statistics and basics of data science
  • Knowledge of programming in Python, NumPy and Pandas (we will cover the basics of Pandas in the first session but it’d be better to be already familiar with it).
  • Having worked with Google Colab before would be a plus!

Reading Material

There is no required textbook for this course, but students are expected to go over the material introduced along the way.

Grading (TBD)

  • Homeworks: 40%
    • Homeworks usually come with bonuses. These bonuses only apply to homeworks!
  • Final Project: 40%
    • First progress report during the semester: 25 Points
    • Final progress and notebook report: 45 Points
    • Final presentation on the last day of class: 30 Points
      • Of the presentation's points, 15 come from other groups!
  • Final Exam: 20%

  • Bonus of up to 6% for participation in contests! More on that later.
    • Up to 3% for participating in a Kaggle contest.
    • Up to 3% for a classwide challenge on a common dataset.
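
To make the arithmetic of the weights above concrete, here is a rough sketch of how a final score could be combined. This is an assumption-laden illustration, not the official formula: the grading scheme is still TBD, the function name is hypothetical, the project is assumed to be scored out of 100 points (25 + 45 + 30), the contest bonus is assumed to be added on top of the weighted total and capped at 6, and homework bonuses are not modelled.

```python
def final_grade(homework_pct, project_points, exam_pct, contest_bonus=0.0):
    """Weighted total out of 100, plus a contest bonus (assumed additive).

    homework_pct   -- homework score as a percentage (0-100)
    project_points -- project score out of 100 points (25 + 45 + 30)
    exam_pct       -- final exam score as a percentage (0-100)
    contest_bonus  -- contest bonus in percentage points, capped at 6
    """
    bonus = min(max(contest_bonus, 0.0), 6.0)  # assumption: bonus is clamped to [0, 6]
    weighted = 0.40 * homework_pct + 0.40 * project_points + 0.20 * exam_pct
    return weighted + bonus


# Made-up example: 20 + 40 + 25 project points, 4 bonus points from contests.
project = 20 + 40 + 25
grade = final_grade(homework_pct=90, project_points=project, exam_pct=70, contest_bonus=4)
print(round(grade, 2))
```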

Logistics

  • The course is divided into a lecture and a lab session each week.
  • We are going to use Python for the hands-on lab sessions. Homeworks/projects can be handed in as pre-computed Jupyter notebooks.
  • Communications are done via MS Teams chat rooms.

Teaching Staff

Lecturers

Hamed Shah Mansouri

Amir Hesam Salavati

Teaching Assistants

TBD

Contact

Amir Hesam Salavati