CS 6890: Deep Learning

Spring 2017

This course will introduce a number of approaches to deep learning, covering the following topics:

Logistic and Softmax Regression, Feed-Forward Neural Networks, Backpropagation, Sparse Autoencoders, Denoising Autoencoders, Linear Decoders, Vectorization, PCA and Whitening, Self-Taught Learning, Deep Networks, Convolution and Pooling, Recurrent Neural Networks, Long Short-Term Memory, Gated Recurrent Units, Neural Attention Models, Sequence-to-Sequence Models, Memory Networks.

Prerequisites: previous exposure to basic concepts and models in machine learning, such as supervised vs. unsupervised learning, classification vs. regression, linear regression, logistic and softmax regression, cost functions, overfitting and regularization, and gradient-based optimization. Experience with programming and familiarity with basic concepts in linear algebra and statistics are also expected.

- Syllabus & Introduction
- Andrew Ng's introductory presentation at UCLA Graduate Summer School: Deep Learning, Feature Learning.

- Linear Regression, Perceptrons, Logistic and Softmax Regression
- Linear algebra and optimization in Python
- Machine Learning with Computation Graphs in TensorFlow
- Feed-Forward Neural Networks and Backpropagation
- Unsupervised Feature Learning
- Autoencoders
- Sparse Feature Learning for Deep Belief Networks, Ranzato, Boureau and LeCun. NIPS 2007.
- Extracting and Composing Robust Features with Denoising Autoencoders, Vincent, Larochelle, Bengio and Manzagol. ICML, 2008.
- Contractive Auto-Encoders: Explicit Invariance During Feature Extraction, Rifai, Vincent, Muller, Glorot, and Bengio. ICML, 2011.
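To make the denoising-autoencoder reading concrete, here is a minimal NumPy sketch of the idea from Vincent et al.: corrupt the input, then train the network to reconstruct the *clean* version. The architecture (one sigmoid hidden layer, linear decoder, tied weights), the toy data, and all hyperparameters are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of 8-dim inputs lying near a 3-dim subspace.
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 8))
n, n_in, n_hid = X.shape[0], 8, 3

W = rng.normal(scale=0.1, size=(n_in, n_hid))  # tied weights: decoder uses W.T
b_h, b_o = np.zeros(n_hid), np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses, lr = [], 0.05
for epoch in range(300):
    X_noisy = X + rng.normal(scale=0.1, size=X.shape)  # corrupt the input
    H = sigmoid(X_noisy @ W + b_h)                     # encoder
    X_hat = H @ W.T + b_o                              # linear decoder
    err = X_hat - X                                    # reconstruct the *clean* input
    losses.append(np.mean(err ** 2))
    dZ = (err @ W) * H * (1 - H)                       # backprop into encoder pre-activation
    gW = (err.T @ H + X_noisy.T @ dZ) / n              # tied weights: sum decoder + encoder grads
    W -= lr * gW
    b_o -= lr * err.mean(axis=0)
    b_h -= lr * dZ.mean(axis=0)

print(losses[0], losses[-1])  # reconstruction error should drop over training
```

Note the contrast with a plain autoencoder: the loss is measured against the uncorrupted input, so the network cannot simply copy its (noisy) input through.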

- PCA, PCA whitening, and ZCA whitening
- Auto-Association by Multilayer Perceptrons and Singular Value Decomposition, H. Bourlard and Y. Kamp, Biological Cybernetics, 1988
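The distinction between PCA whitening and ZCA whitening fits in a few lines of NumPy: both rescale the data to unit variance in the eigenbasis of the covariance; ZCA additionally rotates back to the original axes. This is a toy sketch with made-up 2-D data; the small `eps` regularizer follows the usual practice of avoiding division by near-zero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])  # correlated 2-D data

Xc = X - X.mean(axis=0)                   # center
C = Xc.T @ Xc / Xc.shape[0]               # sample covariance
eigval, U = np.linalg.eigh(C)             # eigenvectors of the covariance
eps = 1e-5                                # regularizer to avoid dividing by ~0
X_pca = (Xc @ U) / np.sqrt(eigval + eps)  # PCA whitening: rotate, then rescale
X_zca = X_pca @ U.T                       # ZCA whitening: rotate back to input axes

cov_white = X_pca.T @ X_pca / X.shape[0]
print(np.round(cov_white, 3))             # approximately the identity matrix
```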

- Sparse Coding
- Independent Component Analysis
- Unsupervised Learning of Word Representations
- Slides on Language Representation and Modeling, Kapil Thadani, 2017.
- A Neural Probabilistic Language Model, Bengio, Ducharme, Vincent, and Jauvin, JMLR 2003.
- Natural Language Processing (Almost) from Scratch, Collobert, Weston, Bottou, Karlen, Kavukcuoglu, and Kuksa, JMLR 2011.
- Distributed Representations of Words and Phrases and their Compositionality, Mikolov, Sutskever, Chen, Corrado, and Dean, NIPS 2013.
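The readings above are prediction-based models (neural language models, word2vec), but the underlying idea that words in similar contexts get similar vectors also shows up in the simpler count-based approach sketched below: build a co-occurrence matrix and take a low-rank SVD. The corpus, window size, and embedding dimension here are all toy choices for illustration.

```python
import numpy as np

# A tiny toy corpus; words used in similar contexts should get similar vectors.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat chased the dog .").split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 word window.
win = 2
C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - win), min(len(corpus), i + win + 1)):
        if j != i:
            C[idx[w], idx[corpus[j]]] += 1.0

# A low-rank factorization of the damped count matrix yields dense word vectors.
U, S, _ = np.linalg.svd(np.log1p(C))
vectors = U[:, :3] * S[:3]   # 3-dimensional embeddings, one row per word

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vectors[idx["cat"]], vectors[idx["dog"]]))
```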

- Canonical Correlation Analysis
- CCA hand notes
- Deep CCA slides
- Deep Canonical Correlation Analysis, Andrew, Arora, Bilmes, and Livescu, ICML 2013.
- Canonical Correlation Analysis: An Overview with Application to Learning Methods, Hardoon, Szedmak, and Shawe-Taylor, Neural Computation, 2004.
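Linear CCA, the starting point for the Deep CCA paper above, can be computed directly: whiten each view, then take the SVD of the cross-covariance; the singular values are the canonical correlations. Below is a NumPy sketch on synthetic two-view data sharing one latent signal (the data and dimensions are invented for the example).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=(n, 1))                        # latent signal shared by both views
X = np.hstack([z + 0.1 * rng.normal(size=(n, 1)),  # view 1: a noisy copy of z
               rng.normal(size=(n, 1))])           #   plus an unrelated noise dimension
Y = np.hstack([z + 0.1 * rng.normal(size=(n, 1)),  # view 2: another noisy copy of z
               rng.normal(size=(n, 1))])

def inv_sqrt(S, eps=1e-8):
    """Symmetric inverse square root via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w + eps)) @ V.T

Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
Sxx, Syy = Xc.T @ Xc / n, Yc.T @ Yc / n
Sxy = Xc.T @ Yc / n
T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
corrs = np.linalg.svd(T, compute_uv=False)  # singular values = canonical correlations
print(np.round(corrs, 3))                   # first near 1 (shared z), second near 0
```

Deep CCA replaces the identity features of each view with learned nonlinear transformations, but optimizes this same correlation objective.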

- Autoencoders
- Self-Taught Learning and Deep Learning
- On the Number of Linear Regions of Deep Neural Networks, Montufar, Pascanu, Cho, and Bengio. NIPS 2014.
- Why does deep and cheap learning work so well?, Lin and Tegmark. CoRR 2016.
- Sum-Product Networks
- Sum-Product Networks: A New Deep Architecture, Poon and Domingos, UAI 2011.
- Shallow vs. Deep Sum-Product Networks, Delalleau and Bengio, NIPS 2011.

- Gradient-based learning
- An overview of gradient descent optimization algorithms, Sebastian Ruder, CoRR 2016
- Animations of Gradient Descent Algorithms, Alec Radford, 2014
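A quick way to see why the momentum variants surveyed by Ruder matter: on an ill-conditioned quadratic, plain gradient descent crawls along the shallow direction, while classical (heavy-ball) momentum accumulates velocity and converges much faster. The problem and hyperparameters below are arbitrary illustrative choices.

```python
import numpy as np

# Ill-conditioned quadratic f(w) = 0.5 * w^T A w, with gradient A @ w.
A = np.diag([1.0, 25.0])

def run(lr, momentum, steps=200):
    w = np.array([1.0, 1.0])
    v = np.zeros(2)
    for _ in range(steps):
        g = A @ w                    # gradient at the current point
        v = momentum * v - lr * g    # classical (heavy-ball) momentum update
        w = w + v
    return np.linalg.norm(w)         # distance to the minimum at (0, 0)

plain = run(lr=0.03, momentum=0.0)
mom = run(lr=0.03, momentum=0.9)
print(plain, mom)  # momentum ends up much closer to the minimum
```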

- Convolutional Neural Networks
- Andrej Karpathy's notes on CS231n: Convolutional Neural Networks for Visual Recognition.
- Christopher Olah's blog on Conv Nets: A Modular Perspective.
- UFLDL Tutorial at Stanford.
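The two core operations covered in the notes above (convolution and pooling) are short enough to write out by hand. This toy NumPy sketch implements a "valid" cross-correlation (convolution without the kernel flip, as deep-learning libraries define it) and non-overlapping max pooling; the image and filter are made up for the example.

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation (convolution without kernel flip, as in conv nets)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling; trailing rows/cols that don't fit are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)   # toy 4x4 "image"
edge = np.array([[1.0, -1.0]])        # horizontal difference filter
fmap = conv2d(img, edge)              # shape (4, 3); every entry is -1.0 here
pooled = max_pool(fmap)               # shape (2, 1)
print(fmap.shape, pooled.shape)
```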

- Recurrent Neural Networks, LSTMs, GRUs
- Slides on RNNs and LSTMs, Arun Mallya, 2017.
- Slides on Sequence-to-Sequence Architectures, Kapil Thadani, 2017.
- Understanding LSTM Networks, Christopher Olah's Blog, 2015.
- Recurrent neural network based language model, Mikolov et al., 2010.
- Recurrent Neural Networks for large scale Language Modeling, Jozefowicz et al., Google Brain 2016.
- Character-Aware Neural Language Models, Kim et al., AAAI 2016.
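Before the gated variants (LSTMs, GRUs) discussed in the slides above, it helps to see the vanilla recurrence itself. This is a hypothetical NumPy sketch of the forward pass of an Elman RNN; the dimensions and random inputs are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
Wxh = rng.normal(scale=0.1, size=(n_in, n_hid))   # input-to-hidden weights
Whh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # hidden-to-hidden (recurrent) weights
b = np.zeros(n_hid)

def rnn_forward(xs):
    """Run a vanilla (Elman) RNN over a sequence and return all hidden states."""
    h = np.zeros(n_hid)
    hs = []
    for x in xs:
        # The new state mixes the current input with the previous state --
        # this recurrence is what lets the network carry context forward.
        h = np.tanh(x @ Wxh + h @ Whh + b)
        hs.append(h)
    return np.array(hs)

seq = rng.normal(size=(5, n_in))   # a sequence of 5 input vectors
hs = rnn_forward(seq)
print(hs.shape)  # (5, 8): one hidden state per time step
```

LSTMs and GRUs keep this overall shape but replace the single tanh update with gated updates that control how much of the previous state is kept, which mitigates vanishing gradients over long sequences.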

- Assignment, code and data.
- Assignment, code and data.
- Assignment and code.
- Assignment, code and data.
- Assignment, code and data.
- Assignment, code, word2vec Google News embeddings, and the Stanford Natural Language Inference (SNLI) dataset.
- Reasoning about entailment with neural attention, Rocktaschel et al., ICLR 2016.

- Deep Dreams:
- presented by Aaron Robeson, Mar 30.
- A Neural Algorithm of Artistic Style, Gatys et al., CVPR 2016.
- Inceptionism: Going Deeper into Neural Networks, Mordvintsev et al., Google Research Blog, 2015.

- DeepTox:
- presented by Yunyi Feng, Mar 30.
- DeepTox: Toxicity Prediction using Deep Learning, Mayr et al., Frontiers in Environmental Science, 2016.
- DeepTox website

- Restricted Boltzmann Machines (RBMs):
- presented by Xianlong Zeng and Mehdi Rezaie, Apr 4.
- Energy-Based Models, RBMs, and Contrastive Divergence [Chapter 5], Yoshua Bengio, Foundations and Trends in Machine Learning, 2009.
- Greedy Layer-Wise Training of Deep Architectures [Chapter 6], Yoshua Bengio, Foundations and Trends in Machine Learning, 2009.
- Learning Thermodynamics with Boltzmann Machines, Torlai and Melko, CoRR 2016.
- Horses, Adversarial Examples and Adversarial Training:
- presented by Alex Bagnall, Apr 6.
- Attacking machine learning with adversarial examples, OpenAI blog, 2017.
- On "Horses" and "Potemkin Villages" in Applied Machine Learning, Research workshop, QMUL, London, 2016.
- Intriguing properties of neural networks, Szegedy et al., ICLR 2014.
- Explaining and Harnessing Adversarial Examples, Goodfellow et al., ICLR 2015.
- Adversarial Examples in the Physical World, Kurakin et al., ICLR 2017.
- Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, Nguyen et al., CVPR 2015.

- Neural Machine Translation:
- presented by Yi Yu, Apr 6.
- Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, Wu et al., CoRR 2016.
- A Neural Network for Machine Translation, at Production Scale, Google Research Blog, 2016.
- Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., ICLR 2015, [slides]
- Sequence to Sequence Learning with Neural Networks, Sutskever, Vinyals, and Le, NIPS 2014.

- Memory Augmented Networks:
- presented by Alex Mayle and Yuanhang Zhang, Apr 11.
- Neural Turing Machines, Graves et al., CoRR 2014.
- End-To-End Memory Networks, Sukhbaatar et al., NIPS 2015.
- Memory Networks for Language Understanding, ICML Tutorial 2016.

- Meta-learning and One-shot learning:
- presented by Sam Merten, Apr 13.
- Matching Networks for One Shot Learning, Vinyals et al., NIPS 2016.
- Meta-Learning with Memory-Augmented Neural Networks, Santoro et al., ICML 2016.

- Dense Associative Memory:
- presented by Kiran Prasai, Apr 13.
- Dense Associative Memory for Pattern Recognition, Krotov and Hopfield, NIPS 2016.
- Dense Associative Memory is Robust to Adversarial Inputs, Krotov and Hopfield, CoRR 2017.
- Hopfield Networks

- Deep Reinforcement Learning:
- presented by Yang Liu and Zhengchao Tian, Apr 18.
- Mastering the game of Go with deep neural networks and tree search, Silver et al., Nature 2016.
- Human-level control through deep reinforcement learning, Mnih et al., Nature 2015.
- Playing Atari with Deep Reinforcement Learning, Mnih et al., CoRR 2013.
- Silver's Tutorial on Deep Reinforcement Learning

- Batch Normalization:
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Ioffe and Szegedy, ICML 2015.
- Recurrent Batch Normalization, Cooijmans et al., ICLR 2017.

- Extremely Deep Learning:
- Deep Residual Learning for Image Recognition, He et al., CVPR 2016.
- Going Deeper with Convolutions, Szegedy et al., CVPR 2015.
- FractalNet: Ultra-Deep Neural Networks without Residuals, Larsson et al., ICLR 2017.
- Densely Connected Convolutional Networks, Huang et al., CoRR 2016.

- Visualization of ConvNets:
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, Simonyan et al., ICLR 2014.
- Visualizing and Understanding Convolutional Networks, Zeiler and Fergus, ECCV 2014.

- Deep learning, LeCun, Bengio, and Hinton, Nature 2015
- Show and Tell: A Neural Image Caption Generator, Vinyals et al., CVPR 2015
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, Xu et al., ICML 2015
- Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei, CVPR 2015
- Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, Kiros et al., TACL 2015
- Skip-Thought Vectors, Kiros et al., NIPS 2015
- Character-level Convolutional Networks for Text Classification, Zhang et al., NIPS 2015
- Translating Videos to Natural Language Using Deep Recurrent Neural Networks, Venugopalan et al., HLT-NAACL 2015
- End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks, Zhou and Xu, ACL 2015

- James H. Martin's Introduction to probabilities
- Jason Eisner's equestrian Introduction to probabilities
- Inderjit Dhillon's Linear Algebra Background
- Strang's Video Lectures on Linear Algebra
- Convex Optimization, Stephen Boyd and Lieven Vandenberghe, Cambridge University Press 2004
- Mike Brookes' Matrix Reference Manual
- Petersen et al.'s The Matrix Cookbook