The MAMBA model transformer with a language modeling head on top (a linear layer with weights tied to the input embeddings).
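The weight tying can be illustrated with a minimal pure-Python sketch. It assumes a hypothetical `TiedLMHead` class and omits the Mamba backbone entirely, so the hidden states are simply the looked-up embeddings. It is not the Hugging Face implementation. It only shows that the output projection (a bias-free linear layer) reuses the input embedding matrix E, so that logits = hidden · Eᵀ.

```python
import random
from typing import List


class TiedLMHead:
    """Hypothetical sketch: one matrix serves as both input embedding and LM head.

    The Mamba mixer layers are deliberately left out; hidden states are
    taken to be the embeddings themselves.
    """

    def __init__(self, vocab_size: int, d_model: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Shared matrix E of shape (vocab_size, d_model): row v embeds token v.
        self.embedding: List[List[float]] = [
            [rng.uniform(-0.1, 0.1) for _ in range(d_model)]
            for _ in range(vocab_size)
        ]

    def embed(self, token_ids: List[int]) -> List[List[float]]:
        # Input side: look up (copies of) rows of the shared matrix.
        return [list(self.embedding[t]) for t in token_ids]

    def logits(self, hidden: List[List[float]]) -> List[List[float]]:
        # Output side: bias-free linear layer whose weight IS the embedding
        # matrix, i.e. logits = hidden @ E^T (one score per vocabulary entry).
        return [
            [sum(h_i * e_i for h_i, e_i in zip(h, row)) for row in self.embedding]
            for h in hidden
        ]


model = TiedLMHead(vocab_size=8, d_model=4)
hidden = model.embed([3, 5])
scores = model.logits(hidden)
print(len(scores), len(scores[0]))
```

Because both sides read the same `embedding` list, any update to a row of E changes the input embedding and the corresponding output logit at once, and no separate output weight matrix has to be stored.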
An enormous body of research has appeared on more efficient variants of attention to overcome its drawbacks.