Introduction
There are awesome books, lectures, and surveys on deep learning.
You can read these books or take these courses while reading the papers below.
Books
Deep Learning
[1] Bengio, Yoshua, Ian J. Goodfellow, and Aaron Courville. "Deep Learning." An MIT Press book. (2015) (http://www.deeplearningbook.org/) (the "Deep Learning Bible"; you can read it alongside the papers below.) It covers the concepts and the math behind DL algorithms thoroughly. I don't recommend starting with this book since it's really hard, but if you want to strengthen your knowledge after completing the courses below, this book is perfect.
[2] Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron, published by O'Reilly
This book is an awesome resource for learning ML and DL, and also for learning to code and implement the algorithms. I'd recommend starting with this if you're more comfortable with books than courses.
[3] Dive into Deep Learning
This is an awesome reference for both the math and the code behind Deep Learning. It contains code examples and implementations in all popular DL frameworks (PyTorch, TensorFlow, and MXNet).
It's available online for free, constantly updated, and covers the newest material on Deep Learning.
If you've got the time, I definitely suggest reading this. I'm actually reading it myself to upgrade my coding knowledge.
[4] Neural Networks and Deep Learning (Michael Nielsen)
- Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data
- Deep learning, a powerful set of techniques for learning in neural networks
Mathematics
[1] [Mathematics for Machine Learning](https://mml-book.github.io/): a fast and efficient way to cover the required math.
[2] MIT OCW Linear Algebra 18.06 (YouTube Playlist)
There's also this legendary course on Linear Algebra, taught by Prof. Gilbert Strang at MIT, and it's publicly accessible. I'd really recommend watching it if you're really into math, want to learn a whole lot more about linear algebra, and have the time. The book above is definitely more than enough for starting ML, but if you feel like learning more, go for it.
[3] Probability and Statistics for Engineers and Scientists by Walpole, Myers, and Ye
A deeper, more academic way: if you would like to dive deeper into the world of probability and statistics, I'd suggest this book.
Surveys
[1] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444. [pdf] (the Three Giants' Survey)
Lectures
[1] Machine Learning Specialization by Andrew Ng
This is probably the most popular ML course on the internet, and a lot of people have started their path into ML with it. It's also, I'd guess, the highest-rated course on Coursera (4.9/5).
This Specialization consists of 3 courses covering the main parts of Machine Learning, and by the end of it you'll have a good understanding of ML algorithms and how to implement and use them in Python.
[2] A deeper, more academic course:
Stanford CS229 Machine Learning
This is the Machine Learning course taught at Stanford University, recorded in class and uploaded to YouTube.
Well, as I said before, I'm a fan of deeper academic courses, and this is THE COURSE to go with if you're like me. It involves a lot more math and detail on ML concepts and algorithms and is of course more difficult to follow, but if you think you'll be OK with the heavy math and won't run away halfway through, don't hesitate to start with this one.
The videos are uploaded to YouTube and the course material is accessible from the course website. Two versions are available online, one from the Autumn 2018 semester (Andrew Ng) and one from Summer 2019 (Anand Avati).
The first one is taught by Andrew Ng, the same instructor as the Coursera ML course introduced above, and the latter is taught by Anand Avati, Andrew's Ph.D. student.
Choosing between the two is mostly personal preference; I myself love Andrew's way of teaching and am more comfortable with it.
That said, Anand Avati's course is newer and covers more subjects. It even covers the math required for the course in the first three lectures.
[3] Easier to follow (probably more popular):
Deep Learning Specialization, offered by DeepLearning.AI and taught by Andrew Ng
This is a 5-course specialization covering almost everything you need to understand Deep Learning and how it works.
[4] A deeper, more academic course:
Stanford CS231n: Deep Learning for Computer Vision
I actually started Deep Learning with this course, and I've got to say, it's THE BEST course to start with if you're OK with going a little deeper into the field, like me.
It is focused on Deep Learning applications in Computer Vision, but it also covers all the basic and necessary aspects of Deep Learning, so you shouldn't worry about it being a Computer Vision course.
As a matter of fact, I watched the whole DL Specialization mentioned above after finishing this course, and I already knew all the material taught in the Specialization (and more) from this course. It even covers some neural network architectures mostly used in NLP.
The only part of the Coursera Specialization that goes beyond this course is the 5th course (Sequence Models), which is more focused on NLP.
Its only drawback is that the available lecture videos are from the 2017 class, so it doesn't cover some newer topics like Transformers. But if you're interested enough, you'll learn that new material on your own. (Course notes from CS231n's newer semesters are also available online, and you can read those to learn the new methods.)
After CS231n, I'd recommend CS224n if you're interested in Natural Language Processing and want to go deep in that field.

Papers
Now, on to the papers.
This roadmap is constructed in accordance with the following four guidelines:
- From outline to detail
- From old to state-of-the-art
- From generic to specific areas
- Focus on state-of-the-art
Milestones for 100 days
Basic DL Architectures
- Convolutional Neural Networks (CNNs)
- LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. "Gradient-Based Learning Applied to Document Recognition" (1998)
- Key Concepts: Convolutional layers, pooling layers, fully connected layers, and early applications to digit recognition (MNIST dataset). (A minimal CNN sketch in code appears after this list.)
- Recurrent Neural Networks (RNNs)
- Hochreiter, S., & Schmidhuber, J. "Long Short-Term Memory" (1997)
- AlexNet
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. "ImageNet Classification with Deep Convolutional Neural Networks" (2012)
- Key Concepts: Deeper networks, ReLU activation, dropout regularization, and large-scale image classification (ImageNet dataset).
- GoogLeNet (Inception)
- Szegedy, C., et al. "Going Deeper with Convolutions" (2014)
- Key Concepts: Inception modules, dimensionality reduction, and efficiency in computation. (See the Inception-module sketch after this list.)
- VGGNet
- Simonyan, K., & Zisserman, A. "Very Deep Convolutional Networks for Large-Scale Image Recognition" (2014)
- Key Concepts: Simplicity in architecture with deeper layers, use of smaller 3x3 convolution filters.
- BN-Inception
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015)
- Inception-v2 / v3
- Rethinking the Inception Architecture for Computer Vision (2016)
- ResNet
- He, K., Zhang, X., Ren, S., & Sun, J. "Deep Residual Learning for Image Recognition" (2015)
- Key Concepts: Residual blocks, solving the vanishing gradient problem, and very deep networks (e.g., ResNet-50, ResNet-101). (See the residual-block sketch after this list.)
- DenseNet
- Densely Connected Convolutional Networks (2017)
- Key Concepts: Dense connections, feature reuse, and efficient gradient flow.
- Inception-v4
- Inception-ResNet and the Impact of Residual Connections on Learning (2016)
- GANs
- Goodfellow, I., et al. "Generative Adversarial Nets" (2014)
- Word2Vec
- Mikolov, T., et al. "Efficient Estimation of Word Representations in Vector Space" (2013)
- Seq2Seq
- Sutskever, I., Vinyals, O., & Le, Q. V. "Sequence to Sequence Learning with Neural Networks" (2014)
- Attention Mechanism
- Bahdanau, D., Cho, K., & Bengio, Y. "Neural Machine Translation by Jointly Learning to Align and Translate" (2014)
- Transformers
- Vaswani, A., et al. "Attention Is All You Need" (2017)
- BERT
- Devlin, J., et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018)
- GPT
- Radford, A., et al. "Improving Language Understanding by Generative Pre-Training" (2018)
- EfficientNet
- Tan, M., & Le, Q. V. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" (2019)
- MobileNet
- Howard, A. G., et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" (2017)
- Key Concepts: Depthwise separable convolutions, lightweight models for mobile and embedded vision applications. (See the depthwise-separable-convolution sketch after this list.)
- DALL-E
- Ramesh, A., et al. "Zero-Shot Text-to-Image Generation" (2021)
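To make a few of the key concepts above concrete, here are some minimal sketches. First, a toy CNN in the spirit of LeNet/AlexNet/VGG, showing convolutional layers, ReLU activations, pooling, dropout, and a fully connected classifier. The framework (PyTorch) and all layer sizes are my own choices for illustration, not taken from the papers.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Toy CNN for 28x28 grayscale digits (MNIST-sized input)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # Small 3x3 convolutions, as popularized by VGG.
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),  # ReLU, as in AlexNet
            nn.MaxPool2d(2),                                        # pooling: 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                        # 14 -> 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                     # dropout regularization (AlexNet)
            nn.Linear(32 * 7 * 7, num_classes),  # fully connected layer
        )

    def forward(self, x):
        return self.classifier(self.features(x))

if __name__ == "__main__":
    logits = SimpleCNN()(torch.randn(4, 1, 28, 28))  # batch of 4 fake images
    print(logits.shape)                              # torch.Size([4, 10])
```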
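Next, a rough sketch of an Inception-style module: parallel 1x1 / 3x3 / 5x5 / pooling branches, with 1x1 convolutions reducing the channel count before the expensive filters, and the branch outputs concatenated along the channel axis. The channel counts here are arbitrary placeholders, not the ones used in GoogLeNet.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Toy Inception-style module with four parallel branches."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),          # 1x1 dimensionality reduction
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),
            nn.Conv2d(8, 16, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1),
        )

    def forward(self, x):
        # Concatenate branch outputs channel-wise: 4 * 16 = 64 output channels.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

if __name__ == "__main__":
    out = InceptionModule(32)(torch.randn(2, 32, 28, 28))
    print(out.shape)  # torch.Size([2, 64, 28, 28])
```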
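A minimal residual block, illustrating the ResNet idea: the skip connection adds the input back to the block's output, so the block only has to learn a residual and gradients can flow directly through very deep stacks. Again, PyTorch and the channel count are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic ResNet-style block: two 3x3 convolutions plus an identity skip."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection: output is x + F(x)

if __name__ == "__main__":
    y = ResidualBlock(64)(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```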
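Finally, a sketch of MobileNet's depthwise separable convolution: a depthwise 3x3 convolution (one filter per input channel, via groups=in_ch) followed by a pointwise 1x1 convolution that mixes channels. The parameter comparison at the bottom uses made-up sizes purely as a sanity check, but it shows why these blocks are so much lighter than standard convolutions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """MobileNet-style block: depthwise 3x3 conv followed by pointwise 1x1 conv."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

if __name__ == "__main__":
    standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)
    separable = DepthwiseSeparableConv(64, 128)
    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(standard), count(separable))             # roughly 74k vs. 9k parameters
    print(separable(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 128, 32, 32])
```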
Advanced DL Architectures
Foundation for Generative Models
Transformers
Language Models
Segmentation
Image Generative Models
Diffusion Models
- DDPM [NeurIPS 2020]
- LatentDiffusion [CVPR 2022]
- Distillation [ICLR 2022]
- SDE Explanation [ICLR 2021]
- Classifier-Free Guidance [NeurIPS 2021 Workshop]
Diffusion Models Manipulation
- DreamBooth [CVPR 2023]
- NullText-Inversion [CVPR 2023]
- Instructpix2pix [CVPR 2023]
- TextDeformer [SIGGRAPH 2023]
Neural Radiance Fields
- NeRF [ECCV 2020]
- TensoRF [ECCV 2022]
- Instant-NGP [SIGGRAPH 2022]
- 3D Gaussian [ACM Transactions On Graphics 2023]
3D Reconstruction
- ORB-SLAM2 [2017]
- Colmap [2016]
- Phototourism [ACM Transactions On Graphics 2006]
- Shape And Spatially-Varying BRDFs From Photometric Stereo
Implicit Representations
3D Generative Models
3D Scene Generation
SLAM
Dynamic Reconstruction
- HyperNeRF [SIGGRAPH Asia 2021]
- Dynamic 3D Gaussians
- Monocular Dynamic View Synthesis: A Reality Check [NeurIPS 2022]
- Neural Jacobian Fields [ACM Transactions On Graphics 2022]
Motion Generation
- Motion Diffusion Models [ICLR 2023]
- Synthesizing Physical Character-Scene Interactions
- Video Diffusion Models [2022]
Correspondences
- RAFT [ECCV 2020]
- PIPs [ECCV 2022]
- [SIFT [IJCV 2004]](https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf)
- LoFTR [CVPR 2021]
- DKM [CVPR 2023]
Learning from Videos
- SlowFast Networks [CVPR 2019]
- Object Landmarks [NeurIPS 2018]
- Watching Frozen People [CVPR 2019]
- AMD [NeurIPS 2021]
Internal Learning
- SinGAN [ICCV 2019]
- Drop The GAN [CVPR 2022]
- Zero Shot Super-Resolution [CVPR 2018]
- Learn From A Single Image [ICLR 2020]
- SinGRAF [CVPR 2023]
- SinFusion [ICML 2023]
Category and Pose
- Congealing
- Category-Viewpoint Combinations [ICLR 2021]
- Neural Congealing
- Correspondence From Image Diffusion [2023]
Parts and Wholes
- Detect What You Can [CVPR 2014]
- Attentional Constellation Nets [ICLR 2021]
- Semantic Understanding Of Scenes [IJCV 2018]
- Hedging Your Bets [CVPR 2012]
Fine-grained Recognition
- Between-Class Attribute Transfer [CVPR 2009]
- Attributes As Operators [ECCV 2018]
- Semantic Output Codes [NeurIPS 2009]
- iNaturalist [CVPR 2018]
Few-shot Learning
- Prototypical Networks [NeurIPS 2017]
- Flamingo [NeurIPS 2022]
- Matching Net [NeurIPS 2016]
- Conditional Prompt Learning For Vision-Language Models [CVPR 2022]
Continual Learning
- Overcoming Catastrophic Forgetting [PNAS 2017]
- Robust Fine-Tuning Of Zero-Shot Models [CVPR 2022]
- Variational Continual Learning [ICLR 2018]
- Gradient Projection Memory [ICLR 2021]