🧐Homework Answers 11.5. Self-Attention and Positional Encoding

1. Suppose that we design a deep architecture to represent a sequence by stacking self-attention layers with positional encoding. What could the possible issues be?

  1. Excessive computational complexity: the computational cost of self-attention is quadratic in the sequence length, so the amount of computation rises sharply as sequences get longer. In practice, overly long sequences make computation inefficient (see the sketch after this list).
  1. Training difficulty: because the stacked model is deep, training a deep self-attention network may run into vanishing/exploding gradients, which can also hurt convergence speed and final performance.
  1. Weak generalization: an overly complex architecture may overfit and fail to generalize well to new data, degrading performance.
  1. Poor interpretability: it is hard to explain what each stacked attention layer contributes.
  1. Parameter redundancy: stacking too many self-attention layers introduces a large number of parameters, some of which may be redundant.
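
To make the quadratic cost in the first item concrete, here is a minimal PyTorch sketch (the function name `attention_scores` is hypothetical, not from the original post): the score matrix for a length-$n$ sequence has $n^2$ entries, so compute and memory scale as $O(n^2 d)$.

```python
import torch

def attention_scores(X):
    # X: (n, d) token representations; queries and keys are taken as X for simplicity
    d = X.shape[-1]
    scores = X @ X.transpose(-1, -2) / d ** 0.5  # shape (n, n): O(n^2 * d) work
    return scores.softmax(dim=-1)

for n in (128, 512, 2048):
    X = torch.randn(n, 64)
    A = attention_scores(X)
    print(n, A.shape, A.numel())  # number of score entries grows as n^2
```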

2. Can you design a learnable positional encoding method?

Learnable positional encoding method

Let $\mathbf{X} \in \mathbb{R}^{n \times d}$ be the input sequence of length $n$, and $d$ be the dimension of the input embeddings. The learnable positional encoding can be defined as

$$\mathbf{X} + \mathbf{P},$$

where the rows $\mathbf{p}_i \in \mathbb{R}^{d}$ of $\mathbf{P} \in \mathbb{R}^{n \times d}$ are learnable parameters, and the index $i$ represents the position in the sequence.
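
A minimal PyTorch sketch of this idea (the module name `LearnablePositionalEncoding` and the `max_len` argument are assumptions for illustration, not from the original post):

```python
import torch
from torch import nn

class LearnablePositionalEncoding(nn.Module):
    """Adds a learned position vector p_i to the embedding at position i."""
    def __init__(self, max_len, d_model, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        # P has shape (1, max_len, d_model); each row is a learnable parameter
        self.P = nn.Parameter(torch.randn(1, max_len, d_model) * 0.02)

    def forward(self, X):
        # X: (batch_size, seq_len, d_model) with seq_len <= max_len
        X = X + self.P[:, :X.shape[1], :]
        return self.dropout(X)

# Usage: encode a batch of 2 sequences of length 10 with 32-dim embeddings
pos_enc = LearnablePositionalEncoding(max_len=1000, d_model=32)
X = torch.randn(2, 10, 32)
print(pos_enc(X).shape)  # torch.Size([2, 10, 32])
```

Unlike the fixed sinusoidal encoding, these position vectors are updated by gradient descent during training, but they cannot extrapolate to sequences longer than `max_len`.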