🧐 Homework Answers 11.4. Multi-Head Attention
2024-6-26
2024-6-27

1. Visualize attention weights of multiple heads in this experiment.

2. Suppose that we have a trained model based on multi-head attention and we want to prune less important attention heads to increase the prediction speed. How can we design experiments to measure the importance of an attention head?
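For exercise 1, here is a minimal sketch. It assumes a small NumPy re-implementation rather than the book's own `MultiHeadAttention` class. The function runs scaled dot-product attention separately for each head and returns the weights with shape (num_heads, num_queries, num_keys), so each head can be plotted on its own. The random projection matrices and the names `multi_head_attention_weights` and `print_heatmap` are hypothetical stand-ins for a trained model's parameters and a plotting library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention_weights(queries, keys, W_q, W_k, num_heads):
    """Return per-head attention weights with shape (num_heads, n_q, n_k).

    queries: (n_q, d_model), keys: (n_k, d_model)
    W_q, W_k: (d_model, num_hiddens), hypothetical stand-ins for trained weights.
    """
    num_hiddens = W_q.shape[1]
    assert num_hiddens % num_heads == 0
    head_dim = num_hiddens // num_heads
    # Project, then split the hidden dimension into num_heads equal chunks
    q = (queries @ W_q).reshape(queries.shape[0], num_heads, head_dim)
    k = (keys @ W_k).reshape(keys.shape[0], num_heads, head_dim)
    # Put the head axis first: (num_heads, n, head_dim)
    q = q.transpose(1, 0, 2)
    k = k.transpose(1, 0, 2)
    # Scaled dot-product scores for every head at once: (num_heads, n_q, n_k)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    return softmax(scores, axis=-1)

def print_heatmap(weights):
    # Coarse text rendering, one block per head: denser characters mean larger weights
    shades = " .:-=+*#%@"
    for h, head in enumerate(weights):
        print(f"head {h}")
        for row in head:
            print("".join(shades[min(int(w * len(shades)), len(shades) - 1)] for w in row))

rng = np.random.default_rng(0)
n_q, n_k, d_model, num_hiddens, num_heads = 4, 6, 8, 16, 4
X_q = rng.normal(size=(n_q, d_model))
X_k = rng.normal(size=(n_k, d_model))
W_q = rng.normal(size=(d_model, num_hiddens))
W_k = rng.normal(size=(d_model, num_hiddens))

weights = multi_head_attention_weights(X_q, X_k, W_q, W_k, num_heads)
print(weights.shape)
print_heatmap(weights)
```

For a real plot, pass each `weights[h]` to a heatmap function such as matplotlib's `imshow`, with one subplot per head. If you use the d2l PyTorch class instead, its inner `DotProductAttention` should keep the most recent weights in `attention_weights`. There the heads are folded into the batch axis, with shape `(batch_size * num_heads, n_q, n_k)`. Reshaping to `(batch_size, num_heads, n_q, n_k)` gives one matrix per head, in the 4-D layout that `d2l.show_heatmaps` expects.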
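For exercise 2, one common experimental design is ablation. You silence one head at a time and measure how much a validation metric degrades. Heads whose removal barely changes the metric are the pruning candidates.

The sketch below makes several assumptions. It uses a toy single-layer NumPy self-attention with a hypothetical `head_mask` argument and MSE as the metric. It has no trained model or labeled validation set, so it uses the unpruned model's own outputs as targets. The score therefore measures how far the output moves when a head is removed. Head 2's value projection is scaled up on purpose so the expected ranking is easy to check.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mha_forward(X, params, num_heads, head_mask):
    """Toy self-attention forward pass; head_mask[h] = 0 silences head h."""
    W_q, W_k, W_v, W_o = params
    n = X.shape[0]
    head_dim = W_q.shape[1] // num_heads

    def split(Z):
        # (n, num_hiddens) -> (num_heads, n, head_dim)
        return Z.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(X @ W_q), split(X @ W_k), split(X @ W_v)
    A = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim))
    heads = A @ v                                # (num_heads, n, head_dim)
    heads = heads * head_mask[:, None, None]     # zero out masked heads
    concat = heads.transpose(1, 0, 2).reshape(n, -1)
    return concat @ W_o

def head_importance(X_val, Y_val, params, num_heads):
    """Importance of head h = validation loss with h masked minus the baseline loss."""
    def loss(mask):
        return np.mean((mha_forward(X_val, params, num_heads, mask) - Y_val) ** 2)

    full = np.ones(num_heads)
    baseline = loss(full)
    scores = np.empty(num_heads)
    for h in range(num_heads):
        mask = full.copy()
        mask[h] = 0.0
        scores[h] = loss(mask) - baseline
    return scores

rng = np.random.default_rng(0)
n, d_model, num_hiddens, num_heads = 10, 8, 16, 4
head_dim = num_hiddens // num_heads
params = [rng.normal(size=(d_model, num_hiddens)) for _ in range(3)]  # W_q, W_k, W_v
params.append(rng.normal(size=(num_hiddens, d_model)))                # W_o
# Hypothetical setup: enlarge head 2's value projection so it dominates the output
params[2][:, 2 * head_dim:3 * head_dim] *= 10.0

X_val = rng.normal(size=(n, d_model))
# Stand-in for a trained model: targets are the unpruned model's own outputs
Y_val = mha_forward(X_val, params, num_heads, np.ones(num_heads))

scores = head_importance(X_val, Y_val, params, num_heads)
print(np.round(scores, 3))
print("pruning order (least important first):", np.argsort(scores))
```

Single-head ablation ignores interactions between heads. A fuller experiment therefore prunes heads greedily in the order given by `np.argsort(scores)`. After each removal it re-evaluates the real validation metric and times inference, and it stops once the metric drops past a tolerance you choose. Masking only zeroes a head's output. To actually gain speed, you would also delete the pruned head's columns in `W_q`, `W_k`, `W_v` and its rows in `W_o`.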

