Introduction

There are excellent books, lectures, and surveys on deep learning. You can read these books or take these courses while reading the papers that follow.

Books

Deep Learning

[1] Bengio, Yoshua, Ian J. Goodfellow, and Aaron Courville. "Deep Learning." An MIT Press book (2015). (http://www.deeplearningbook.org/) This is the deep learning bible; you can read it alongside the papers below. It covers the concepts and the math behind DL algorithms thoroughly. I don't recommend starting with it, since it's really hard, but if you want to strengthen your knowledge after completing the courses listed below, this book is perfect.

[2] Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron, published by O'Reilly
This book is an awesome resource for learning ML and DL, and also for learning to code and implement the algorithms. I'd recommend starting with it if you're more comfortable with books than courses.

[3] Dive into Deep Learning
This is an awesome reference for both the math and the code behind deep learning. It contains code examples and implementations in all the popular DL frameworks (PyTorch, TensorFlow, and MXNet).
It's available online for free, is constantly updated, and covers the newest material on deep learning.
If you've got the time, I definitely suggest reading it; I'm actually reading it myself to upgrade my coding knowledge.

[4] Neural Networks and Deep Learning by Michael Nielsen
Neural networks are a beautiful, biologically inspired programming paradigm that enables a computer to learn from observational data, and deep learning is a powerful set of techniques for learning in neural networks.

Mathematics

[1] [Mathematics for Machine Learning](https://mml-book.github.io/): a fast and efficient way to cover the essential math.

[2] MIT OCW Linear Algebra 18.06 (YouTube playlist). This is the legendary linear algebra course taught by Prof. Gilbert Strang at MIT, and it's publicly accessible. The book above is more than enough for starting ML, but if you're really into math, want to learn a whole lot more about linear algebra, and have the time, I'd definitely recommend watching this course.

[3] Probability and Statistics for Engineers and Scientists by Walpole, Myers, and Ye. A deeper, more academic path: if you'd like to dive further into the world of probability and statistics, I'd suggest this book.

Surveys

[1] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444. [pdf] (Three Giants' Survey)

Lectures

[1] Machine Learning Specialization by Andrew Ng
This is probably the most popular ML course on the internet, and a lot of people have started their path into ML with it. It's also among the highest-rated courses on Coursera (4.9/5). The Specialization consists of three courses covering the main parts of machine learning, and by the end of it you'll have a good understanding of ML algorithms and how to implement and use them in Python.

[2] A deeper, more academic course: Stanford CS229 Machine Learning
This is the machine learning course taught at Stanford University, recorded in class and uploaded to YouTube. As I said before, I'm a fan of deeper academic courses, and this is THE COURSE to go with if you're like me. It involves a lot more math and detail on ML concepts and algorithms, and it is of course harder to follow, but if you think you'll be fine with the heavy math and won't run away halfway through, don't hesitate to start with this one. The videos are on YouTube and the course material is accessible from the course website. Two versions are available online: one from Autumn 2018 (Andrew Ng) and one from Summer 2019 (Anand Avati). The first is taught by Andrew Ng, the same instructor as the Coursera course introduced above, and the latter by Anand Avati, Andrew's Ph.D. student. Choosing between the two is mostly personal preference; I love Andrew's way of teaching and I'm more comfortable with it. However, Anand Avati's course is newer and covers more subjects; it even includes the math required for the course in the first three lectures.

[3] Easier to follow (probably more popular): Deep Learning Specialization, offered by DeepLearning.AI and taught by Andrew Ng
This is a five-course specialization covering almost everything you need to understand deep learning and how it works.

[4] A deeper, more academic course: Stanford CS231n: Deep Learning for Computer Vision
I actually started deep learning with this course, and I've got to say it's THE BEST course to start with if you're OK with going a little deeper into the field, like me. It is focused on deep learning applications in computer vision, but it also covers all the basic and necessary aspects of deep learning, so don't worry about it being "for computer vision." In fact, I watched the whole DL Specialization mentioned above after finishing this course, and I already knew everything taught in the Specialization (and more) from it. It even covers some neural network architectures mostly used in NLP; the only part of the Coursera Specialization that goes beyond it is the fifth course (Sequence Models), which is more focused on NLP. Its only drawback is that the available lecture videos are from the 2017 class, so they don't cover some newer topics like transformers, but if you're interested enough, you'll learn that material on your own. (Course notes from CS231n's newer semesters are also available, and you can read those to pick up the new methods.) After CS231n, I'd recommend CS224n if you're interested in natural language processing and want to go deep in that field.

Papers

Now for the papers.

This roadmap is constructed in accordance with the following four guidelines:

  • From outline to detail
  • From old to state-of-the-art
  • From generic to specific areas
  • Focus on the state of the art

Milestone for 100 days

Basic DL Architecture

  1. Convolutional Neural Networks (CNNs)
    1. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. "Gradient-Based Learning Applied to Document Recognition" (1998)
    2. Key Concepts: Convolutional layers, pooling layers, fully connected layers, and early applications to digit recognition (MNIST dataset).
  2. Recurrent Neural Networks (RNNs)
    1. Hochreiter, S., & Schmidhuber, J. "Long Short-Term Memory" (1997)
  3. AlexNet
    1. Krizhevsky, A., Sutskever, I., & Hinton, G. E. "ImageNet Classification with Deep Convolutional Neural Networks" (2012)
    2. Key Concepts: Deeper networks, ReLU activation, dropout regularization, and large-scale image classification (ImageNet dataset).
  4. GoogLeNet (Inception)
    1. Szegedy, C., et al. "Going Deeper with Convolutions" (2014)
    2. Key Concepts: Inception modules, dimensionality reduction, and efficiency in computation.
  5. VGGNet
    1. Simonyan, K., & Zisserman, A. "Very Deep Convolutional Networks for Large-Scale Image Recognition" (2014)
    2. Key Concepts: Simplicity in architecture with deeper layers, use of smaller 3x3 convolution filters.
  6. BN-Inception
    1. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015)
  7. Inception-v2 / v3
    1. Rethinking the Inception Architecture for Computer Vision (2016)
  8. ResNet
    1. He, K., Zhang, X., Ren, S., & Sun, J. "Deep Residual Learning for Image Recognition" (2015)
    2. Key Concepts: Residual blocks, solving the vanishing gradient problem, and very deep networks (e.g., ResNet-50, ResNet-101). (A minimal PyTorch sketch follows this list.)
  9. DenseNet
    1. Densely Connected Convolutional Networks (2017)
    2. Key Concepts: Dense connections, feature reuse, and efficient gradient flow.
  10. Inception-v4
    1. Inception-ResNet and the Impact of Residual Connections on Learning (2016)
  11. GANs
    1. Goodfellow, I., et al. "Generative Adversarial Nets" (2014)
  12. Word2Vec
    1. Mikolov, T., et al. "Efficient Estimation of Word Representations in Vector Space" (2013)
  13. Seq2Seq
    1. Sutskever, I., Vinyals, O., & Le, Q. V. "Sequence to Sequence Learning with Neural Networks" (2014)
  14. Attention Mechanism
    1. Bahdanau, D., Cho, K., & Bengio, Y. "Neural Machine Translation by Jointly Learning to Align and Translate" (2014)
  15. Transformers
    1. Vaswani, A., et al. "Attention is All You Need" (2017) (An attention sketch follows this list.)
  16. BERT
    1. Devlin, J., et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018)
  17. GPT
    1. Radford, A., et al. "Improving Language Understanding by Generative Pre-Training" (2018)
  18. EfficientNet
    1. Tan, M., & Le, Q. V. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" (2019)
  19. MobileNet
    1. Howard, A. G., et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" (2017)
    2. Key Concepts: Depthwise separable convolutions, lightweight models for mobile and embedded vision applications.
  20. DALL-E
    1. Ramesh, A., et al. "Zero-Shot Text-to-Image Generation" (2021)
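
The key concepts called out above (convolutional, pooling, and fully connected layers; residual shortcuts; depthwise separable convolutions) are easiest to internalize in code. Below is a minimal PyTorch sketch of a ResNet-style residual block and a MobileNet-style depthwise separable convolution. It is not taken from any of the referenced papers' official implementations, and the class names are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Two 3x3 convolutions plus an identity shortcut (the core ResNet idea)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The skip connection lets gradients flow directly, easing training of very deep nets.
        return F.relu(out + x)

class DepthwiseSeparableConv(nn.Module):
    """MobileNet-style block: per-channel 3x3 depthwise conv, then a 1x1 pointwise conv."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return F.relu(self.pointwise(self.depthwise(x)))

if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)                     # (batch, channels, height, width)
    print(BasicResidualBlock(64)(x).shape)             # torch.Size([1, 64, 32, 32])
    print(DepthwiseSeparableConv(64, 128)(x).shape)    # torch.Size([1, 128, 32, 32])
```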
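
For the Attention Mechanism and Transformers entries, the core operation is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined in "Attention Is All You Need." Here is a small sketch; the function name and tensor shapes are illustrative, not from the paper's code.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(QK^T / sqrt(d_k)) V for batched query/key/value tensors."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (..., len_q, len_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)              # attention distribution over keys
    return weights @ v, weights

if __name__ == "__main__":
    q = torch.randn(2, 5, 64)   # (batch, query length, d_k)
    k = torch.randn(2, 7, 64)   # (batch, key length, d_k)
    v = torch.randn(2, 7, 64)
    out, attn = scaled_dot_product_attention(q, k, v)
    print(out.shape, attn.shape)  # torch.Size([2, 5, 64]) torch.Size([2, 5, 7])
```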

Advanced DL Architecture

Foundation for Generative Models

  1. VAE [ICLR 2014]
  2. GAN [2014]
  3. Normalizing Flows [ICML 2015]
  4. Diffusion Models [NeurIPS 2020]

Transformers

  1. Attention Is All You Need [NeurIPS 2017]
  2. ViT [ICLR 2021]
  3. MLP-Mixer [2021]

Language Models

  1. BERT [NAACL 2019]
  2. GPT3 [2020]
  3. T5 [2021]
  4. GPT3.5 [2022]

Segmentation

  1. SAM [2023]
  2. DETR [ECCV 2020]
  3. OVSeg [2023]
  4. TokenCut [CVPR 2022]

Image Generative Models

  1. StyleGAN2 [2020]
  2. StarGAN [CVPR 2018]
  3. Normalizing Flow [2019]
  4. PixelCNN [NeurIPS 2016]

Diffusion Models

  1. DDPM [NeurIPS 2020]
  2. LatentDiffusion [CVPR 2022]
  3. Distillation [ICLR 2022]
  4. SDE Explanation [ICLR 2021]
  5. Classifier-Free Guidance [NeurIPS 2021 Workshop]

Diffusion Models Manipulation

  1. DreamBooth [CVPR 2023]
  2. Null-Text Inversion [CVPR 2023]
  3. InstructPix2Pix [CVPR 2023]
  4. TextDeformer [SIGGRAPH 2023]

Neural Radiance Fields

  1. NeRF [ECCV 2020]
  2. TensoRF [ECCV 2022]
  3. Instant-NGP [SIGGRAPH 2022]
  4. 3D Gaussian Splatting [ACM Transactions On Graphics 2023]

3D Reconstruction

  1. ORB-SLAM2 [2017]
  2. Colmap [2016]
  3. Photo Tourism [ACM Transactions On Graphics 2006]
  4. Shape and Spatially-Varying BRDFs from Photometric Stereo

Implicit Representations

  1. DeepSDF [CVPR 2019]
  2. BACON [CVPR 2022]
  3. SIREN [NeurIPS 2020]
  4. AtlasNet [CVPR 2018]
  5. Occupancy Networks

3D Generative Models

  1. EG3D [CVPR 2022]
  2. DreamFusion [ICLR 2023]
  3. Get3D [NeurIPS 2022]

3D Scene Generation

  1. GenVS
  2. CC3D [ICCV 2023]
  3. 3DiM [ICLR 2023]
  4. LEGO-NET [CVPR 2023]
  5. Zero-123

SLAM

  1. DTAM [ICCV 2011]
  2. DynamicFusion [CVPR 2015]
  3. KinectFusion
  4. LSD-SLAM [ECCV 2014]

Dynamic Reconstruction

  1. HyperNeRF [SIGGRAPH Asia 2021]
  2. Dynamic 3D Gaussians
  3. Monocular Dynamic View Synthesis: A Reality Check [NeurIPS 2022]
  4. Neural Jacobian Fields [ACM Transactions On Graphics 2022]

Motion Generation

  1. Motion Diffusion Models [ICLR 2023]
  2. Synthesizing Physical Character-Scene Interactions
  3. Video Diffusion Models [2022]

Correspondences

  1. RAFT [ECCV 2020]
  2. PIPs [ECCV 2022]
  3. [SIFT [IJCV 2004]](https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf)
  4. LoFTR [CVPR 2021]
  5. DKM [CVPR 2023]

Learning from Videos

  1. SlowFast Networks [CVPR 2019]
  2. Object Landmarks [NeurIPS 2018]
  3. Watching Frozen People [CVPR 2019]
  4. AMD [NeurIPS 2021]

Internal Learning

  1. SinGAN [ICCV 2019]
  2. Drop The GAN [CVPR 2022]
  3. Zero Shot Super-Resolution [CVPR 2018]
  4. Learn From A Single Image [ICLR 2020]
  5. SinGRAF [CVPR 2023]
  6. SinFusion [ICML 2023]

Category and Pose

  1. Congealing
  2. Category-Viewpoint Combinations [ICLR 2021]
  3. Neural Congealing
  4. Correspondence From Image Diffusion [2023]

Parts and Wholes

  1. Detect What You Can [CVPR 2014]
  2. Attentional Constellation Nets [ICLR 2021]
  3. Semantic Understanding Of Scenes [IJCV 2018]
  4. Hedging Your Bets [CVPR 2012]

Fine-grained Recognition

  1. Between-Class Attribute Transfer [CVPR 2009]
  2. Attributes As Operators [ECCV 2018]
  3. Semantic Output Codes [NeurIPS 2009]
  4. INaturalist [CVPR 2018]

Few-shot Learning

  1. Prototypical Networks [NeurIPS 2017]
  2. Flamingo [NeurIPS 2022]
  3. Matching Net [NeurIPS 2016]
  4. Conditional Prompt Learning For Vision-Language Models [CVPR 2022]

Continual Learning

  1. Overcoming Catastrophic Forgetting [PNAS 2017]
  2. Robust Fine-Tuning Of Zero-Shot Models [CVPR 2022]
  3. Variational Continual Learning [ICLR 2018]
  4. Gradient Projection Memory [ICLR 2021]

Representation Learning

  1. CPC [NeurIPS 2018]
  2. Continual Learners [CVPR 2022]
  3. NPID [CVPR 2018]
  4. BYOL [NeurIPS 2020]