Dai, Zihang, et al. “Transformer-XL: Attentive language models beyond a fixed-length context.” arXiv preprint arXiv:1901.02860 (2019).
Zhou, Allan, Tom Knowles, and Chelsea Finn. “Meta-learning symmetries by reparameterization.” arXiv preprint arXiv:2007.02933 (2020).
Wilson, James T., et al. “Efficiently sampling functions from Gaussian process posteriors.” arXiv preprint arXiv:2002.09309 (2020).
Baevski, Alexei, Steffen Schneider, and Michael Auli. “vq-wav2vec: Self-supervised learning of discrete speech representations.” arXiv preprint arXiv:1910.05...