Introduction to Deep Learning

This is a collection of homework assignments and course projects for CMU 11785: Introduction to Deep Learning (Fall 2018).

  • Homework 1: Frame Level Classification of Speech (spec)
    • Build an MLP class from scratch, supporting backprop, batchnorm, softmax, and momentum, using only NumPy.
    • Identify the phoneme state label for WSJ utterance frames using the MLP.
  • Homework 2: Speaker Verification via Convolutional Neural Networks (spec)
    • Determine whether two speech segments were uttered by the same speaker.
    • Extract speaker embeddings from utterances using a CNN, followed by dense layers trained with N-way classification.
  • Homework 3: Seq2Seq Phonemes Prediction (spec)
    • Build a Seq2Seq model for phoneme prediction on unaligned utterance data.
    • Incorporate CTC loss and a beam search decoder into the model.
  • Homework 4: Attention-based End-to-End Speech-to-Text Deep Neural Network (spec)
    • Implement the character-based Listen, Attend and Spell (LAS) model to translate utterances to their corresponding text transcripts.
    • The listener consists of a pyramidal bi-LSTM network that produces attention keys and values. The decoder is an LSTM that yields sequential outputs.
  • Course Project: Dynamic fleet management with deep reinforcement learning (report)
    • Apply online deep reinforcement learning models for dynamic taxi dispatch in NYC.
    • Implement a CNN-based deep Q network and a refined diffusion-CNN model.
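
As a rough illustration of two of the Homework 1 building blocks (softmax and momentum, using only NumPy), here is a minimal sketch. It is not the code from the homework: the function names `softmax` and `momentum_step`, the default `lr` and `beta` values, and the particular momentum formulation are assumptions made for this example.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def momentum_step(param, grad, velocity, lr=0.1, beta=0.9):
    """One SGD-with-momentum update (hypothetical helper).

    The velocity accumulates a decaying sum of past gradients, and the
    parameter moves along the velocity. Returns (new_param, new_velocity).
    """
    velocity = beta * velocity - lr * grad
    return param + velocity, velocity


# Usage: class probabilities for a batch of two frames, then one update step.
probs = softmax(np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
weights, vel = momentum_step(np.ones(3), np.full(3, 2.0), np.zeros(3))
```

Each row of `probs` sums to 1, and the second row is uniform because its logits are equal. Other momentum variants (for example, scaling the gradient by `1 - beta`) are equally common; this sketch picks one for brevity.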

Technical tools: Python, PyTorch, TensorFlow, AWS