Levine, Sergey. “Reinforcement learning and control as probabilistic inference: Tutorial and review.” arXiv preprint arXiv:1805.00909 (2018).
Lee, Lisa, et al. “Efficient exploration via state marginal matching.” arXiv preprint arXiv:1906.05274 (2019).
Haarnoja, Tuomas, et al. “Soft actor-critic algorithms and applications.” arXiv preprint arXiv:1812.05905 (2018).
Fujimoto, Scott, Herke Van Hoof, and David Meger. “Addressing function approximation error in actor-critic methods.” arXiv preprint arXiv:1802.09477 (2018).
Silver, David, et al. “Deterministic policy gradient algorithms.” 2014.