[Misc] Scraps from the GEMMA on FPGA Project
· 2 min read
Some discarded byproducts from the GEMMA on FPGA project...
Rapid coding experiments are trendy these days, so I've been trying them out here and there. After actually using them, though, I feel that automated code output still has a lot of shortcomings. Relying entirely on model output for all the logic is not the way to go.
This text contains the core concepts and mathematical principles of the Transformer model architecture.
These notes organize the architecture and training process of the GPT-1 paper, combining mathematical definitions with intuitive interpretations.
The book that helped me most when I first started learning CUDA programming.