Days 1-10:
- AlexNet
- VGGNet
- GoogLeNet (Inception)
- ResNet
- DenseNet
Days 11-20:
- SqueezeNet
- MobileNet
- MobileNetV2
- ShuffleNet
- EfficientNet
Days 21-30:
- Capsule Networks
- NASNet
- Mask R-CNN
- YOLO (You Only Look Once)
- FCN (Fully Convolutional Networks)
Days 31-40:
- U-Net
- WaveNet
- Tacotron
- BERT (Bidirectional Encoder Representations from Transformers)
- GPT (Generative Pre-trained Transformer)
Days 41-50:
- Transformer-XL
- RoBERTa
- ALBERT
- T5 (Text-To-Text Transfer Transformer)
- Vision Transformer (ViT)
Days 51-60:
- DeiT (Data-efficient Image Transformer)
- ResNeSt
- Swin Transformer
- DALL-E
- CLIP
Days 61-70:
- BigGAN
- StyleGAN
- PointNet
- PointNet++
- Graph Convolutional Networks (GCNs)
Days 71-80:
- Graph Attention Networks (GAT)
- DeepMind’s MuZero
- NVIDIA’s Megatron
- Microsoft’s Turing-NLG
Days 81-90:
- GPT-3 (Generative Pre-trained Transformer 3)
- ViT-GAN (Vision Transformer GAN)
- LSTM (Long Short-Term Memory)
- GRU (Gated Recurrent Unit)
- NTM (Neural Turing Machine)
Days 91-100:
- SRGAN (Super-Resolution Generative Adversarial Network)
- CycleGAN
- DCGAN (Deep Convolutional Generative Adversarial Network)
- WGAN (Wasserstein Generative Adversarial Network)
- Pix2Pix
Days 101-110:
- PPO (Proximal Policy Optimization)
- A3C (Asynchronous Advantage Actor-Critic)
- DDPG (Deep Deterministic Policy Gradient)
- SAC (Soft Actor-Critic)
- TRPO (Trust Region Policy Optimization)
Days 111-120:
- AlphaGo
- AlphaZero
- AlphaFold
- DALL-E 2
- Flamingo
Days 121-130:
- PaLM
- GPT-4
DeepLearning
Embarking on a journey through state-of-the-art (SOTA) deep learning models can be both exciting and challenging. Here’s a roadmap of key papers and advancements in deep learning through 2024, organized chronologically and by subfield to give a well-rounded view of the major paradigms.
1. Foundational Papers (Pre-2012)
- Artificial Neural Networks:
  - McCulloch, W. S., & Pitts, W. (1943). “A logical calculus of the ideas immanent in nervous activity.”
  - Rosenblatt, F. (1958). “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.”
- Backpropagation:
  - Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). “Learning representations by back-propagating errors.”
2. Deep Learning Renaissance (2012-2014)
- AlexNet:
  - Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). “ImageNet Classification with Deep Convolutional Neural Networks.”
- VGGNet:
  - Simonyan, K., & Zisserman, A. (2014). “Very Deep Convolutional Networks for Large-Scale Image Recognition.”
- GoogLeNet/Inception:
  - Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … & Rabinovich, A. (2015). “Going Deeper with Convolutions.”
3. Advancements in CNN Architectures (2015-2017)
- ResNet:
  - He, K., Zhang, X., Ren, S., & Sun, J. (2016). “Deep Residual Learning for Image Recognition.”
- DenseNet:
  - Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). “Densely Connected Convolutional Networks.”
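To make “residual learning” concrete, here is a minimal PyTorch sketch of a basic residual block; the channel sizes and layer layout are illustrative rather than the exact ResNet-50 configuration:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the network learns F(x) and outputs F(x) + x."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                          # skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)      # add the shortcut before the final ReLU
```

DenseNet replaces the addition with channel-wise concatenation of all earlier feature maps, which is the key difference between the two papers.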
4. Recurrent Neural Networks and Attention Mechanisms (2014-2017)
- Sequence to Sequence:
  - Sutskever, I., Vinyals, O., & Le, Q. V. (2014). “Sequence to Sequence Learning with Neural Networks.”
- Attention Is All You Need:
  - Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). “Attention Is All You Need.”
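The core computation in “Attention Is All You Need” fits in a few lines. A minimal PyTorch sketch of scaled dot-product attention (multi-head attention runs this in parallel over projected subspaces):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)    # (..., seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)              # each row sums to 1
    return weights @ v
```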
5. Generative Models and Adversarial Networks (2013-2018)
- GANs:
  - Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). “Generative Adversarial Nets.”
- Wasserstein GAN:
  - Arjovsky, M., Chintala, S., & Bottou, L. (2017). “Wasserstein GAN.”
- VAE:
  - Kingma, D. P., & Welling, M. (2013). “Auto-Encoding Variational Bayes.”
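Of the three, the VAE is the easiest to condense into code. A minimal sketch of the reparameterization trick and the ELBO objective from Kingma & Welling, assuming a decoder with sigmoid outputs on binarized data:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """z = mu + sigma * eps with eps ~ N(0, I), so gradients flow through mu and logvar."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def vae_loss(recon_x, x, mu, logvar):
    """Negative ELBO: reconstruction term plus KL(q(z|x) || N(0, I)) in closed form."""
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```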
6. Reinforcement Learning and Deep RL (2013-2019)
- DQN:
  - Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … & Hassabis, D. (2015). “Human-level control through deep reinforcement learning.”
- AlphaGo:
  - Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., … & Hassabis, D. (2016). “Mastering the game of Go with deep neural networks and tree search.”
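The heart of DQN is its bootstrapped regression target. A minimal sketch, assuming PyTorch, a separate target network, and float `dones` flags; the replay buffer and training loop are omitted:

```python
import torch

def dqn_targets(rewards, next_states, dones, target_net, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), zeroed out at terminal states."""
    with torch.no_grad():                                # targets are not differentiated
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * next_q * (1.0 - dones)
```

The online network is then trained to minimize a Huber or MSE loss between Q(s, a) and these targets, with the target network updated only periodically.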
7. Transformers and NLP Revolution (2018-2020)
- BERT:
  - Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.”
- GPT-2 and GPT-3:
  - Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). “Language Models are Unsupervised Multitask Learners.”
  - Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). “Language Models are Few-Shot Learners.”
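BERT’s masked-language-model objective is easy to poke at interactively. A minimal sketch using the Hugging Face pipeline API (assuming the transformers library is installed; the checkpoint downloads on first use):

```python
from transformers import pipeline

# BERT predicts the [MASK] token from bidirectional context.
fill = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill("Deep learning models are trained with [MASK] descent."):
    print(candidate["token_str"], round(candidate["score"], 3))
```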
8. Vision Transformers and Advanced Architectures (2020-2022)
- Vision Transformer (ViT):
  - Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., … & Houlsby, N. (2021). “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.”
- Swin Transformer:
  - Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., … & Guo, B. (2021). “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.”
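ViT’s main departure from CNNs is the input stem: an image becomes a sequence of linearly embedded patches. A minimal PyTorch sketch (the strided convolution is a common implementation shortcut for patchify-then-project, not something the paper mandates):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into 16x16 patches and embed each one as a token."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.proj(x)                     # (B, 768, 14, 14)
        return x.flatten(2).transpose(1, 2)  # (B, 196, 768): a sequence for the Transformer
```

A class token and positional embeddings are then added, after which a standard Transformer encoder processes the sequence.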
9. Recent Trends and SOTA (2020-2024)
- Large Language Models (LLMs):
  - OpenAI (2023). “GPT-4 Technical Report.”
  - Google Research (2022). “PaLM: Scaling Language Modeling with Pathways.”
- Diffusion Models:
  - Ho, J., Jain, A., & Abbeel, P. (2020). “Denoising Diffusion Probabilistic Models.”
  - Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). “Score-Based Generative Modeling through Stochastic Differential Equations.”
- NeRF (Neural Radiance Fields):
  - Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R. (2020). “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.”
10. Special Topics and Applications (2020-2024)
- Graph Neural Networks (GNNs):
  - Kipf, T. N., & Welling, M. (2016). “Semi-Supervised Classification with Graph Convolutional Networks.”
  - Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., & Bengio, Y. (2018). “Graph Attention Networks.”
- Prompt Engineering:
  - Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2021). “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.”
- Federated Learning:
  - McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017). “Communication-Efficient Learning of Deep Networks from Decentralized Data.”
Study Tips:
- Read Key Papers: Focus on understanding the methodology, experiments, and results.
- Implement Models: Use frameworks like TensorFlow or PyTorch to replicate key models (a transfer-learning sketch follows this list).
- Explore Datasets: Practice with benchmark datasets (e.g., ImageNet, COCO, GLUE).
- Follow Conferences: Keep up with recent publications from top conferences (NeurIPS, ICML, CVPR, ACL).
- Engage with the Community: Participate in forums, workshops, and open-source projects.
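For the “Implement Models” tip, transfer learning from a pretrained checkpoint is a good first exercise. A minimal sketch assuming torchvision >= 0.13 (older releases use `pretrained=True` instead of the weights enum) and a hypothetical 10-class target task:

```python
import torch
from torchvision import models

# Load an ImageNet-pretrained ResNet-50 and swap the classifier head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # 10 classes is illustrative

x = torch.randn(1, 3, 224, 224)  # dummy batch stands in for a real dataloader
print(model(x).shape)            # torch.Size([1, 10])
```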
By following this roadmap, you’ll gain a comprehensive understanding of the evolution and current state of deep learning, equipping you with the knowledge to contribute to future advancements in the field.
recent trends, special topics, and applications in deep learning from 2020 to 2024
Sure! Here are some recent trends, special topics, and applications in deep learning from 2020 to 2024, with key papers highlighted for each area:
1. Large Language Models (LLMs)
- GPT-4:
  - OpenAI (2023). “GPT-4 Technical Report.”
- PaLM:
  - Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., … & Dean, J. (2022). “PaLM: Scaling Language Modeling with Pathways.”
2. Diffusion Models
- Denoising Diffusion Probabilistic Models:
  - Ho, J., Jain, A., & Abbeel, P. (2020). “Denoising Diffusion Probabilistic Models.”
- Score-Based Generative Modeling:
  - Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). “Score-Based Generative Modeling through Stochastic Differential Equations.”
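What makes DDPM training simple is that the forward (noising) process has a closed form, so any timestep can be sampled directly. A minimal sketch using the linear beta schedule from Ho et al.:

```python
import torch

betas = torch.linspace(1e-4, 0.02, 1000)            # linear schedule from the DDPM paper
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative product alpha_bar_t

def q_sample(x0, t, alphas_cumprod):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I) in one step."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over image dims
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise, noise
```

The denoiser is then trained with a plain MSE loss to predict `noise` from (x_t, t); sampling runs the learned reverse process step by step.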
3. Neural Radiance Fields (NeRF)
- NeRF:
  - Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R. (2020). “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.”
- Instant NGP:
  - Müller, T., Evans, A., Schied, C., & Keller, A. (2022). “Instant Neural Graphics Primitives with a Multiresolution Hash Encoding.”
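A small but important NeRF detail is the positional encoding gamma(p) applied to coordinates before the MLP, without which the network cannot represent high-frequency detail. A minimal sketch (num_freqs=10 matches the paper’s setting for spatial coordinates):

```python
import torch

def positional_encoding(x, num_freqs=10):
    """NeRF's gamma(p): append sin(2^k * pi * p) and cos(2^k * pi * p) for k < num_freqs."""
    features = [x]
    for k in range(num_freqs):
        features.append(torch.sin((2.0 ** k) * torch.pi * x))
        features.append(torch.cos((2.0 ** k) * torch.pi * x))
    return torch.cat(features, dim=-1)  # (..., dim * (2 * num_freqs + 1))
```

Instant NGP’s contribution is, in effect, replacing this fixed encoding with a learned multiresolution hash table, cutting training from hours to seconds.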
4. Graph Neural Networks (GNNs)
- Graph Convolutional Networks:
  - Kipf, T. N., & Welling, M. (2016). “Semi-Supervised Classification with Graph Convolutional Networks.”
- Graph Attention Networks:
  - Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., & Bengio, Y. (2018). “Graph Attention Networks.”
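Kipf & Welling’s propagation rule, H' = sigma(D^-1/2 (A + I) D^-1/2 H W), is compact enough to implement directly. A minimal dense-adjacency sketch (production code would use sparse ops, e.g., via PyTorch Geometric):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step over a dense adjacency matrix."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):                # h: (N, in_dim), adj: (N, N)
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)  # add self-loops
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        a_norm = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(a_norm @ self.linear(h))               # aggregate, then activate
```

GAT replaces the fixed normalization with learned attention coefficients over each node’s neighbors.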
5. Prompt Engineering
- Prompting Methods Survey:
  - Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2021). “Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.”
- AutoPrompt:
  - Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). “AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts.”
6. Federated Learning
- Federated Averaging (FedAvg):
  - McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017). “Communication-Efficient Learning of Deep Networks from Decentralized Data.”
- Advances and Open Problems:
  - Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., … & Zhao, S. (2019). “Advances and Open Problems in Federated Learning.”
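The FedAvg aggregation step itself is only a few lines: a dataset-size-weighted average of client weights. A minimal sketch (integer buffers such as BatchNorm’s batch counters would need special handling in a real system):

```python
import copy

def federated_average(client_state_dicts, client_sizes):
    """FedAvg: weight each client's parameters by its local dataset size."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_state_dicts[0])
    for key in avg:
        avg[key] = sum(
            sd[key] * (n / total) for sd, n in zip(client_state_dicts, client_sizes)
        )
    return avg
```

The research difficulty lies elsewhere: communication cost, client heterogeneity, and privacy, which is what the Kairouz et al. survey covers.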
7. Self-Supervised Learning
- SimCLR:
  - Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). “A Simple Framework for Contrastive Learning of Visual Representations.”
- BYOL:
  - Grill, J. B., Strub, F., Altché, F., Tallec, C., Richemond, P. H., Buchatskaya, E., … & Valko, M. (2020). “Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning.”
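SimCLR’s NT-Xent loss is the piece most worth writing by hand. A minimal sketch, assuming z1 and z2 are projection-head outputs for two augmented views of the same batch:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Pull the two views of each image together; push all other batch items apart."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d) unit vectors
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))                # a sample is not its own positive
    # The positive for row i is the other view of the same image: index (i + n) mod 2N.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

BYOL removes the negatives entirely, relying on an asymmetric predictor and a momentum-averaged target network instead.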
8. Transformers in Vision
- Vision Transformer (ViT):
  - Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., … & Houlsby, N. (2021). “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.”
- Swin Transformer:
  - Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., … & Guo, B. (2021). “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.”
9. Meta-Learning
- MAML:
  - Finn, C., Abbeel, P., & Levine, S. (2017). “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.”
- LEO:
  - Rusu, A. A., Rao, D., Sygnowski, J., Vinyals, O., Pascanu, R., Osindero, S., & Hadsell, R. (2019). “Meta-Learning with Latent Embedding Optimization.”
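MAML’s bi-level structure (adapt on a support set, evaluate on a query set, then differentiate through the adaptation) can be sketched with torch.func. This assumes PyTorch >= 2.0 and shows a single inner step for a single task:

```python
import torch
from torch.func import functional_call

def maml_task_loss(model, loss_fn, support, query, inner_lr=0.01):
    """Return the query loss after one differentiable inner-loop step on the support set."""
    names, params = zip(*model.named_parameters())
    xs, ys = support
    xq, yq = query

    # Inner loop: one SGD step; create_graph keeps it differentiable for the outer loop.
    inner_loss = loss_fn(functional_call(model, dict(zip(names, params)), (xs,)), ys)
    grads = torch.autograd.grad(inner_loss, params, create_graph=True)
    adapted = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}

    # Outer objective: performance of the adapted parameters on held-out query data.
    return loss_fn(functional_call(model, adapted, (xq,)), yq)
```

The meta-optimizer then averages this loss over a batch of tasks and backpropagates into the original parameters.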
10. Robustness and Fairness
- Adversarial Training:
  - Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). “Towards Deep Learning Models Resistant to Adversarial Attacks.”
- Algorithmic Fairness:
  - Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). “A Survey on Bias and Fairness in Machine Learning.”
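Adversarial examples are easiest to grasp through FGSM (Goodfellow et al.), the one-step perturbation that Madry et al.’s PGD applies iteratively; adversarial training simply mixes such examples into each batch. A minimal sketch assuming inputs scaled to [0, 1]:

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.03):
    """Perturb x along the sign of the input gradient to maximize the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # stay in the valid pixel range
```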
Study and Implementation Tips:
- Implement and Experiment: Utilize frameworks like TensorFlow and PyTorch to implement models from these papers and experiment with variations.
- Datasets: Practice with standard datasets (e.g., CIFAR-10/100, ImageNet, COCO, GLUE) and domain-specific datasets for targeted applications.
- Follow Conferences: Stay updated with papers from leading conferences (NeurIPS, ICML, CVPR, ICLR, ACL).
- Community Engagement: Participate in workshops, forums, and open-source projects to engage with the research community.
By focusing on these recent trends and special topics, you’ll gain a comprehensive understanding of cutting-edge research and applications in deep learning.
Multimodal deep learning is an exciting and rapidly evolving field focused on models that process and integrate information from multiple modalities such as text, images, audio, and video. Here are key papers and advancements in multimodal deep learning through 2024:
Foundational and Key Papers in Multimodal Learning
Early Multimodal Learning
- Deep Visual-Semantic Alignments:
  - Karpathy, A., & Fei-Fei, L. (2015). “Deep Visual-Semantic Alignments for Generating Image Descriptions.”
- Show and Tell:
  - Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). “Show and Tell: A Neural Image Caption Generator.”
Multimodal Representation Learning
- DeViSE:
  - Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Ranzato, M. A., & Mikolov, T. (2013). “DeViSE: A Deep Visual-Semantic Embedding Model.”
- VQA:
  - Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). “VQA: Visual Question Answering.”
Recent Advances in Multimodal Learning (2019-2024)
Vision and Language Models
- ViLBERT:
  - Lu, J., Batra, D., Parikh, D., & Lee, S. (2019). “ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks.”
- LXMERT:
  - Tan, H., & Bansal, M. (2019). “LXMERT: Learning Cross-Modality Encoder Representations from Transformers.”
- VisualBERT:
  - Li, L. H., Yatskar, M., Yin, D., Hsieh, C. J., & Chang, K. W. (2019). “VisualBERT: A Simple and Performant Baseline for Vision and Language.”
Unified Multimodal Models
- CLIP:
  - Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … & Sutskever, I. (2021). “Learning Transferable Visual Models From Natural Language Supervision.”
- ALIGN:
  - Jia, C., Yang, Y., Xia, Y., Chen, Y. T., Parekh, Z., Pham, H., … & Le, Q. V. (2021). “Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision.”
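CLIP’s zero-shot classification is easy to try with the public checkpoints on the Hugging Face hub. A minimal sketch; “cat.jpg” is a hypothetical local file and the two label prompts are illustrative:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # hypothetical local image
labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_image     # image-text similarity scores
print(dict(zip(labels, logits.softmax(dim=1)[0].tolist())))
```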
Multimodal Transformers
- M3P:
  - Ni, J., Li, J., Liang, Y., Wei, F., Zhang, M., Yang, L., … & Zhou, M. (2021). “M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training.”
- FLAVA:
  - Singh, A., Hu, R., Goswami, V., Couairon, G., Galuba, W., Rohrbach, M., & Kiela, D. (2022). “FLAVA: A Foundational Language and Vision Alignment Model.”
Advanced Applications
- DALL-E:
  - Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., … & Sutskever, I. (2021). “Zero-Shot Text-to-Image Generation.”
- VATT (Video-Audio-Text Transformer):
  - Akbari, H., Yuan, L., Qian, R., Chuang, W. H., Chang, S. F., Cui, Y., & Gong, B. (2021). “VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text.”
- Perceiver IO:
  - Jaegle, A., Borgeaud, S., Alayrac, J. B., Doersch, C., Ionescu, C., Ding, D., … & Carreira, J. (2021). “Perceiver IO: A General Architecture for Structured Inputs & Outputs.”
Cutting-Edge Multimodal Papers (2022-2024)
- Flamingo:
  - Alayrac, J. B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., … & Simonyan, K. (2022). “Flamingo: A Visual Language Model for Few-Shot Learning.”
- Gato:
  - Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S. G., Novikov, A., Rae, J. W., … & de Freitas, N. (2022). “A Generalist Agent.”
Study and Implementation Tips for Multimodal Learning:
- Implement Models: Use frameworks like PyTorch and TensorFlow to implement key multimodal models. Libraries such as Hugging Face’s Transformers can be very helpful (see the captioning sketch after this list).
- Datasets: Utilize benchmark datasets such as MS COCO, Visual Genome, VQA, and others for training and evaluating models.
- Read Tutorials and Blogs: Stay updated with tutorials, blog posts, and GitHub repositories that explain recent advancements and provide implementation details.
- Participate in Competitions: Engage in competitions on platforms like Kaggle or academic challenges related to multimodal learning tasks.
- Follow Conferences: Keep up with papers from top conferences such as CVPR, ICCV, ACL, NeurIPS, and ICLR to stay updated with the latest research.
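As a concrete entry point for the “Implement Models” tip above, a pretrained captioner can be wrapped in a few lines. A sketch assuming the Hugging Face transformers library; the BLIP checkpoint and the COCO image URL are public, but both are illustrative choices:

```python
from transformers import pipeline

# Caption an image with a pretrained BLIP model.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
print(captioner("http://images.cocodataset.org/val2017/000000039769.jpg"))
```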
By focusing on these recent and foundational papers, you’ll gain a comprehensive understanding of the state-of-the-art techniques and models in multimodal deep learning.