ch 08 Modern Convolutional Neural Networks

Modern CNN

AlexNet
VGG network
NiN (Network in Network)
GoogLeNet
ResNet
DenseNet

AlexNet (Deep Convolutional Neural Networks)

LeNet

CNNs did not immediately dominate the field after LeNet
LeNet achieved good results on early small datasets
- performance on larger, more realistic datasets had yet to be established
computers were not yet sufficiently powerful to train deep multichannel, multilayer CNNs
key training techniques were not yet available
- parameter initialization
- gradient descent variants
- regularization
instead of end-to-end training → classical pipelines
- obtain an interesting dataset
- preprocess the dataset
- feed the data through a standard set of feature extractors
- pass the resulting representations to a favorite classifier

AlexNet

8-layer CNN: 5 Conv layers (with 3 MaxPooling layers) + 2 FC + 1 output (FC)

much deeper than the comparatively small LeNet-5
ReLU instead of sigmoid as its activation function
1. mitigates the vanishing gradient problem
Convolutional Block
1. 11×11 stride=4 Conv
  1. a larger convolution window is needed to capture the object
2. 3×3 stride=2 MaxPool
3. 5×5 padding=2 Conv
4. 3×3 stride=2 MaxPool
5. 3×3 padding=1 Conv, three layers
6. 3×3 stride=2 MaxPool
adds max-pooling layers
ten times more convolution channels than LeNet
two huge fully connected layers with 4096 outputs each
1. the original model was split across two GPUs
2. nowadays we rarely need to break up models
dropout for regularization ↔ LeNet uses only weight decay
1. prevents overfitting
data augmentation
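
The convolutional block listed above can be checked with a little shape arithmetic. The following is a minimal sketch, not AlexNet's reference implementation: it assumes a 224×224 input (the notes do not state the input size), stride 1 where no stride is given, and padding 0 where no padding is given. The helper names `out_size` and `trace` are made up for this example.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding), in the order the notes list the layers.
# Assumption: unspecified stride is 1, unspecified padding is 0.
ALEXNET_FEATURES = [
    ("conv 11x11", 11, 4, 0),
    ("maxpool 3x3", 3, 2, 0),
    ("conv 5x5", 5, 1, 2),
    ("maxpool 3x3", 3, 2, 0),
    ("conv 3x3", 3, 1, 1),
    ("conv 3x3", 3, 1, 1),
    ("conv 3x3", 3, 1, 1),
    ("maxpool 3x3", 3, 2, 0),
]

def trace(size, layers):
    """Return the spatial size after every layer."""
    sizes = []
    for name, kernel, stride, padding in layers:
        size = out_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes

# Assumption: 224x224 input image.
alexnet_sizes = trace(224, ALEXNET_FEATURES)
for name, size in alexnet_sizes:
    print(f"{name:12s} -> {size}x{size}")
```

Under these assumptions the feature map shrinks from 224×224 to 5×5 before the fully connected layers, with most of the reduction coming from the stride-4 first convolution and the three pooling layers.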

VGG

Networks Using Blocks

block → repeating patterns of layers
- conv layer + maxpooling layer
Visual Geometry Group (VGG)

basic building block of a CNN
- a convolutional layer with padding to maintain the resolution
- nonlinearity → ReLU
- pooling layer → max-pooling
  - to reduce the resolution
→ spatial resolution decreases quite rapidly
VGG → uses multiple convolutions between downsampling steps (max-pooling), grouped into a block
two successive 3×3 convolutions touch the same pixels as a single 5×5 convolution does
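
The 3×3 versus 5×5 comparison can be made concrete by computing the receptive field and the weight count of both options. This is a small sketch assuming stride-1 convolutions with the same number of input and output channels. The channel count `c = 64` and the helper names are illustrative, not taken from the notes.

```python
def receptive_field(kernels):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf

def conv_weights(kernel, channels):
    """Weights (bias ignored) of a kernel x kernel conv with `channels` in and out."""
    return kernel * kernel * channels * channels

c = 64  # hypothetical channel count, chosen only for illustration

rf_two_3x3 = receptive_field([3, 3])      # two stacked 3x3 convolutions
rf_one_5x5 = receptive_field([5])         # one 5x5 convolution
params_two_3x3 = 2 * conv_weights(3, c)   # 2 * 9 * c^2
params_one_5x5 = conv_weights(5, c)       # 25 * c^2

print(rf_two_3x3, rf_one_5x5)
print(params_two_3x3, params_one_5x5)
```

Both options see a 5×5 window, but the stacked version uses 18c² weights instead of 25c² and has an extra nonlinearity between the two convolutions.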

Network in Network

LeNet, AlexNet, VGG → a common design pattern
- extract features exploiting spatial structure via a sequence of convolutions and pooling layers, then post-process the representations via fully connected layers
two major challenges
- the fully connected layers at the end of the architecture
  - consume tremendous numbers of parameters
- impossible to add fully connected layers earlier in the network to increase the degree of nonlinearity (doing so would destroy the spatial structure)
NiN (Network in Network)
- uses 1×1 convolutions to add local nonlinearities across the channel activations
- uses global average pooling to integrate across all locations in the last representation layer

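Looking at the structure, NiN was inspired by AlexNet, so its convolution window sizes match AlexNet's. There is one major difference, however: NiN uses no fully connected layers. Instead, the number of output channels of the final NiN block is set equal to the number of label classes, and a global average pooling layer follows it. The advantages of this design are a lower risk of overfitting and far fewer model parameters.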
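
To see how a 1×1 convolution followed by global average pooling can replace a fully connected classifier head, here is a toy sketch in plain Python. It treats a feature map as nested lists of shape `[channels][height][width]`. The function names, the tiny 2×2 input and the weight matrix are all made up for illustration.

```python
def conv1x1(x, weight):
    """1x1 convolution: the same linear map over channels at every pixel.

    x: [C_in][H][W], weight: [C_out][C_in] -> result: [C_out][H][W]
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(weight[o][i] * x[i][r][c] for i in range(c_in)) for c in range(w)]
         for r in range(h)]
        for o in range(len(weight))
    ]

def relu(x):
    """Elementwise ReLU on a [C][H][W] feature map."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]

def global_avg_pool(x):
    """[C][H][W] -> [C]: average each channel over all spatial locations."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

# Toy input: 2 channels, 2x2 pixels.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
# Hypothetical weights mapping 2 input channels to 3 "class" channels.
w = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

scores = global_avg_pool(relu(conv1x1(x, w)))
print(scores)  # one score per class channel
```

The output has one number per output channel, so setting the number of output channels to the number of classes yields class scores without any fully connected layer. The only parameters in this head are the `C_out × C_in` weights of the 1×1 convolution.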

GoogLeNet

NiN + block structure

Multi Branch Network

stem (data ingest)
- given by the first two or three convolutions that operate on the image
- extract low-level features from the underlying images
body (data processing)
- followed by a body of convolutional blocks
head (prediction)
- maps the features obtained so far to the required classification, segmentation, detection, or tracking problem at hand

Inception block: four parallel branches
- 1×1 Conv
- 1×1 Conv → 3×3 Conv (padding=1)
- 1×1 Conv → 5×5 Conv (padding=2)
- 3×3 MaxPool (padding=1) → 1×1 Conv
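
The paddings in the four branches are chosen so that every branch keeps the input's height and width, which is what lets the branch outputs be concatenated along the channel dimension. The sketch below checks this with shape arithmetic. It assumes stride 1 everywhere, including the max-pooling branch, and the channel numbers in the usage line are illustrative values rather than figures from the notes.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def inception_output_shape(size, c1, c2, c3, c4):
    """Return (channels, size) of an Inception block's concatenated output.

    c1: output channels of the 1x1 branch
    c2: (reduce, out) channels of the 1x1 -> 3x3 branch
    c3: (reduce, out) channels of the 1x1 -> 5x5 branch
    c4: output channels of the maxpool -> 1x1 branch
    Assumption: every layer uses stride 1.
    """
    b1 = out_size(size, 1)
    b2 = out_size(out_size(size, 1), 3, padding=1)
    b3 = out_size(out_size(size, 1), 5, padding=2)
    b4 = out_size(out_size(size, 3, padding=1), 1)
    if not (b1 == b2 == b3 == b4):
        raise ValueError("branches must agree on spatial size to be concatenated")
    # Concatenation along channels: only the final channel count of each branch matters.
    return c1 + c2[1] + c3[1] + c4, b1

# Illustrative channel settings on a 28x28 feature map.
channels, size = inception_output_shape(28, 64, (96, 128), (16, 32), 32)
print(channels, size)
```

The 1×1 convolutions at the start of the second and third branches only reduce the channel count that the more expensive 3×3 and 5×5 convolutions see. They do not affect the output shape.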

7×7 Conv
3×3 MaxPool
1×1 Conv
3×3 Conv
3×3 MaxPool
Inception × 2
3×3 MaxPool
Inception × 5
3×3 MaxPool
Inception × 2
Global AvgPool
FC

Batch Normalization

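Batch Norm vs. Layer Norm

Batch Norm: normalizes per channel, with statistics computed across the mini-batch
Layer Norm: normalizes per observation, with statistics computed across that observation's features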
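
The difference between the two comes down to which axis the mean and variance are taken over. Here is a minimal sketch on a small table with one row per observation and one column per feature or channel. It leaves out the learnable scale and shift parameters and the running statistics that real layers keep, and the function names are made up for this example.

```python
def normalize(values, eps=1e-5):
    """Shift to zero mean and scale to (roughly) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

def batch_norm(batch):
    """Normalize each feature (column) across the observations in the batch."""
    cols = [normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*cols)]

def layer_norm(batch):
    """Normalize each observation (row) across its own features."""
    return [normalize(row) for row in batch]

# Two observations, three features on very different scales.
batch = [[1.0, 10.0, 100.0],
         [3.0, 30.0, 300.0]]

bn = batch_norm(batch)
ln = layer_norm(batch)
print(bn)
print(ln)
```

With batch norm every column ends up with zero mean, so the result for one observation depends on the other observations in the batch. With layer norm every row ends up with zero mean, so each observation is normalized independently of the batch.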

ResNet & ResNeXt

Vanishing Gradient Problem

Function Class

Residual Block

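Input: x
Function: f(x)
In a regular block, the output of the block is f(x) directly. In a residual block, the result g(x) of the convolution operations is added to the original input x, so the block outputs f(x) = g(x) + x.
This can be understood as preserving the information x that has already been learned and learning g(x) on top of it. This g(x) is called the residual.
As the network gets deeper and more has been learned, x gets closer to the desired output f(x), so the additional amount to learn, g(x) = f(x) − x, should approach 0. Training can therefore be viewed as driving the residual toward 0.
During backpropagation, f(x) = g(x) + x is differentiated with respect to x. The derivative of the x term is always 1, so a constant 1 remains in the gradient however small the derivative of g(x) becomes, which helps prevent the vanishing gradient problem.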
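
The gradient argument can be written out explicitly. The derivation below is a sketch in the scalar case for readability. The layer index l, the per-block residual functions g_l and the loss symbol are notation introduced here, not taken from the notes.

```latex
% Single residual block
f(x) = g(x) + x
\quad\Longrightarrow\quad
\frac{\partial f}{\partial x} = \frac{\partial g}{\partial x} + 1

% Stack of residual blocks x_{l+1} = x_l + g_l(x_l), with loss \mathcal{L} computed at depth L
\frac{\partial \mathcal{L}}{\partial x_l}
  = \frac{\partial \mathcal{L}}{\partial x_L}
    \prod_{i=l}^{L-1} \left( 1 + \frac{\partial g_i}{\partial x_i} \right)
```

Each factor in the product has the form 1 plus a derivative of g. In a plain stack without skip connections each factor would be a layer derivative alone, so many small factors would multiply toward zero. With the added 1, small derivatives of g leave each factor close to 1 instead.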

The first two layers have the same structure as in GoogLeNet. The difference is that batch normalization is added in between.
After that, whereas GoogLeNet uses four Inception modules, ResNet uses residual blocks.

ResNeXt

One of the challenges one encounters in the design of ResNet is the trade-off between nonlinearity and dimensionality within a given block.

DenseNet

Looking at ResNet again, each block computes f(x) = x + g(x), the sum of the linear term x and the nonlinear term g(x). DenseNet differs in that it concatenates the two terms instead of adding them.
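
One visible consequence of concatenating instead of adding is how the channel count evolves with depth. The sketch below tracks only channel counts. It assumes every DenseNet layer contributes a fixed number of new channels, called `growth_rate` here, and the function names and the numbers 64 and 32 are illustrative choices.

```python
def resnet_channels(c_in, num_blocks):
    """Addition keeps the channel count: x + g(x) has the same shape as x."""
    return [c_in for _ in range(num_blocks + 1)]

def densenet_channels(c_in, growth_rate, num_layers):
    """Concatenation [x, g(x)] appends growth_rate new channels per layer."""
    channels = [c_in]
    for _ in range(num_layers):
        channels.append(channels[-1] + growth_rate)
    return channels

# Hypothetical setting: 64 input channels, 4 blocks/layers, 32 new channels per layer.
res = resnet_channels(64, 4)
dense = densenet_channels(64, 32, 4)
print(res)
print(dense)
```

Because the channel count grows linearly with depth under concatenation, a dense design needs some mechanism to bring the channel count back down between groups of layers.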

🪴 Jihee's Blog
