Parallel and Distributed Computing

This is a collection of parallel and distributed computing projects I have completed. The levels of parallelism range from data-level SIMD to thread-level OpenMP to Spark-based MapReduce.

  • Project 1: Homemade Numpy (spec)
    • Design and implement a slower version of NumPy that supports cache-optimized parallel matrix computations.
    • Highlights: C, SIMD, OpenMP
  • Project 2: Yelp Ratings Prediction (spec)
    • Use the MapReduce programming paradigm to parallelize a Naive Bayes classifier with a Bag of Words model in Spark to predict Yelp review ratings.
    • Highlights: Python, Spark, MapReduce
  • Project 3: Parallel Huffman Coding (report)
    • Implement a parallel algorithm to generate Huffman codes.
    • Highlights: Java multithreading
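
To give a feel for the MapReduce formulation mentioned in Project 2, here is a minimal sketch in plain Python. It is not the project's code and does not use Spark: the toy reviews and the names map_review, reduce_counts, train, and predict are all hypothetical, and functools.reduce stands in for a distributed reduce. The map step emits ((rating, word), 1) pairs, the reduce step sums them per key, and the resulting counts feed a Naive Bayes classifier with Laplace smoothing.

```python
from collections import Counter
from functools import reduce
import math

# Hypothetical toy data: (star rating, review text) pairs.
REVIEWS = [
    (5, "great food great service"),
    (5, "great place"),
    (1, "terrible food"),
    (1, "terrible slow service"),
]


def map_review(record):
    """Map step: emit ((rating, word), 1) per word, plus ((rating, None), 1) per review."""
    rating, text = record
    pairs = [((rating, word), 1) for word in text.split()]
    pairs.append(((rating, None), 1))  # the None key counts documents per rating
    return pairs


def reduce_counts(acc, pair):
    """Reduce step: sum the values that share a key."""
    key, value = pair
    acc[key] += value
    return acc


def train(reviews):
    """Run map then reduce over all reviews and return the aggregated counts."""
    mapped = (pair for record in reviews for pair in map_review(record))
    return reduce(reduce_counts, mapped, Counter())


def predict(counts, text):
    """Return the rating with the highest Naive Bayes log-probability for the text."""
    ratings = sorted({r for (r, w) in counts if w is None})
    vocab = {w for (_, w) in counts if w is not None}
    total_docs = sum(counts[(r, None)] for r in ratings)
    best_rating, best_score = None, -math.inf
    for r in ratings:
        words_in_class = sum(
            c for (rr, w), c in counts.items() if rr == r and w is not None
        )
        # Log prior for this rating.
        score = math.log(counts[(r, None)] / total_docs)
        for word in text.split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            score += math.log(
                (counts[(r, word)] + 1) / (words_in_class + len(vocab))
            )
        if score > best_score:
            best_rating, best_score = r, score
    return best_rating


if __name__ == "__main__":
    model = train(REVIEWS)
    print(predict(model, "great service"))
    print(predict(model, "terrible food"))
```

In a Spark version, the map step would roughly correspond to an RDD flatMap and the summation to reduceByKey, so the counting can be spread across partitions. That mapping is an assumption about how such a project could be structured, not a description of the actual implementation.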