Research

Google Scholar profile

2017

  • Neural Episodic Control.
    Pritzel Alexander, Uria Benigno, Srinivasan Sriram, Puigdomenech Adria, Vinyals Oriol, Hassabis Demis, Wierstra Daan, Blundell Charles.
    Proceedings of the 34th International Conference on Machine Learning (ICML 2017).

    Deep reinforcement learning methods attain super-human performance in a wide range of environments. Such methods are grossly inefficient, often taking orders of magnitude more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general purpose deep reinforcement learning agents.
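
    The semi-tabular value estimate can be sketched as a kernel-weighted lookup over the experience buffer (a toy sketch; the function and parameter names are ours, and the paper's differentiable dictionary also learns the key embeddings end-to-end):

```python
import numpy as np

def episodic_q(query_key, stored_keys, stored_values, delta=1e-3):
    """Inverse-distance kernel lookup over the episodic buffer.

    Q(s, a) is a weighted average of the values stored with keys
    near the query embedding, so a single good experience can
    influence the estimate immediately.
    """
    dists = np.sum((stored_keys - query_key) ** 2, axis=1)
    k = 1.0 / (dists + delta)        # kernel weights; delta avoids division by zero
    w = k / k.sum()                  # normalize to a convex combination
    return float(w @ stored_values)  # weighted average of stored Q-values
```

    Because the weights concentrate on the nearest stored keys, a query that exactly matches a stored state essentially reads back that state's stored value.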

  • Comparison of Maximum Likelihood and GAN-based training of Real NVPs.
    Danihelka Ivo, Lakshminarayanan Balaji, Uria Benigno, Wierstra Daan, Dayan Peter.
    arXiv preprint.

    We train a generator by maximum likelihood and we also train the same generator architecture by Wasserstein GAN. We then compare the generated samples, exact log-probability densities and approximate Wasserstein distances. We show that an independent critic trained to approximate Wasserstein distance between the validation set and the generator distribution helps detect overfitting. Finally, we use ideas from the one-shot learning literature to develop a novel fast learning critic.

2016

  • Early Visual Concept Learning with Unsupervised Deep Learning.
    Higgins Irina, Matthey Loic, Glorot Xavier, Pal Arka, Uria Benigno, Blundell Charles, Mohamed Shakir, Lerchner Alexander.
    arXiv preprint.

  • Model-Free Episodic Control.
    Blundell Charles, Uria Benigno, Pritzel Alexander, Li Yazhe, Ruderman Avraham, Leibo Joel, Rae Jack, Wierstra Daan, Hassabis Demis.
    arXiv preprint.

    State of the art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance. Humans, on the other hand, can very quickly exploit highly rewarding nuances of an environment upon first discovery. In the brain, such rapid learning is thought to depend on the hippocampus and its capacity for episodic memory. Here we investigate whether a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks. We demonstrate that it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.
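
    The episodic control model described above can be sketched as a growing table of best-observed returns (a simplification under our own naming; the paper additionally generalizes to novel states via k-nearest-neighbour lookups over learned embeddings):

```python
from collections import defaultdict

class EpisodicController:
    """Tabular model-free episodic control (toy sketch)."""

    def __init__(self, actions):
        self.actions = actions
        self.q_ec = defaultdict(float)  # (state, action) -> highest return seen

    def update(self, state, action, episodic_return):
        # Keep the best return ever obtained from this state-action pair,
        # so a single highly rewarding episode is exploited immediately.
        key = (state, action)
        self.q_ec[key] = max(self.q_ec[key], episodic_return)

    def act(self, state):
        # Act greedily with respect to the stored best returns.
        return max(self.actions, key=lambda a: self.q_ec[(state, a)])
```

    This rapid, one-shot exploitation of good episodes is what the abstract contrasts with the millions of interactions needed by parametric deep RL agents.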

  • Associative long short-term memory.
    Danihelka Ivo, Wayne Greg, Uria Benigno, Kalchbrenner Nal, Graves Alex.
    Proceedings of the 33rd International Conference on Machine Learning (ICML 2016).

    We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters. The system has an associative memory based on complex-valued vectors and is closely related to Holographic Reduced Representations and Long Short-Term Memory networks. Holographic Reduced Representations have limited capacity: as they store more information, each retrieval becomes noisier due to interference. Our system in contrast creates redundant copies of stored information, which enables retrieval with reduced noise. Experiments demonstrate faster learning on multiple memorization tasks.
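
    The complex-valued binding and the effect of redundant copies can be sketched with plain arrays (a toy illustration under our own naming, not the paper's LSTM cell):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_key(n):
    """Unit-modulus complex key: binding with it is undone by conjugation."""
    return np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n))

n, copies = 64, 8
value = rng.standard_normal(n)       # item we want to store and retrieve
distractor = rng.standard_normal(n)  # another item sharing the memory

retrievals = []
for _ in range(copies):
    k_value, k_distractor = random_key(n), random_key(n)
    # Superpose the two key*item bindings in one memory trace.
    memory = k_value * value + k_distractor * distractor
    # Unbinding with conj(k_value) returns `value` plus interference noise
    # from the distractor's binding.
    retrievals.append((np.conj(k_value) * memory).real)

# The interference noise is independent across copies, so averaging
# the redundant retrievals damps it, as in the paper's memory.
estimate = np.mean(retrievals, axis=0)
```

    A single retrieval already contains the stored item exactly, but buried in interference; the redundant copies are what bring the noise down.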

  • Neural autoregressive distribution estimation.
    Uria Benigno, Cote Marc-Alexandre, Gregor Karol, Murray Iain, Larochelle Hugo.
    Journal of Machine Learning Research.

    We present Neural Autoregressive Distribution Estimation (NADE) models, which are neural network architectures applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a weight sharing scheme inspired by restricted Boltzmann machines, to yield an estimator that is both tractable and has good generalization performance. We discuss how they achieve competitive performance in modeling both binary and real-valued observations. We also present how deep NADE models can be trained to be agnostic to the ordering of input dimensions used by the autoregressive product rule decomposition. Finally, we also show how to exploit the topological structure of pixels in images using a deep convolutional architecture for NADE.

2015

  • Connectionist multivariate density-estimation and its application to speech synthesis.
    Uria Benigno.
    The University of Edinburgh, School of Informatics, PhD dissertation (2015).

    Autoregressive models factorize a multivariate joint probability distribution into a product of one-dimensional conditional distributions. The variables are assigned an ordering, and the conditional distribution of each variable is modelled using all variables preceding it in that ordering as predictors.

    Calculating normalized probabilities and sampling both have polynomial computational complexity under autoregressive models. Moreover, binary autoregressive models based on neural networks obtain statistical performance similar to that of some intractable models, like restricted Boltzmann machines, on several datasets.

    The use of autoregressive probability density estimators based on neural networks to model real-valued data, while proposed before, has never been properly investigated and reported. In this thesis we extend the formulation of neural autoregressive distribution estimators (NADE) to real-valued data; a model we call the real-valued neural autoregressive density estimator (RNADE). Its statistical performance on several datasets, including visual and auditory data, is reported and compared to that of other models. RNADE obtained higher test likelihoods than other tractable models, while retaining all the attractive computational properties of autoregressive models.

    However, autoregressive models are limited by the ordering of the variables inherent to their formulation. Marginalization and imputation tasks can only be solved analytically if the missing variables are at the end of the ordering. We present a new training technique that obtains a set of parameters that can be used for any ordering of the variables. By choosing a model with a convenient ordering of the dimensions at test time, it is possible to solve any marginalization and imputation tasks analytically.

    The same training procedure also makes it practical to train NADEs and RNADEs with several hidden layers. The resulting deep and tractable models display higher test likelihoods than the equivalent one-hidden-layer models for all the datasets tested.

    Ensembles of NADEs or RNADEs can be created inexpensively by combining models that share their parameters but differ in the ordering of the variables. These ensembles of autoregressive models obtain state-of-the-art statistical performance on several datasets.

    Finally, we demonstrate the application of RNADE to speech synthesis, and confirm that capturing the phone-conditional dependencies of acoustic features improves the quality of synthetic speech. Our model generates synthetic speech that was judged by naive listeners as being of higher quality than that generated by mixture density networks, which are considered a state-of-the-art synthesis technique.

  • Modelling Acoustic-Feature Dependencies with Artificial Neural-Networks: Trajectory-RNADE.
    Uria Benigno, Murray Iain, Renals Steve, Valentini-Botinhao Cassia, Bridle John.
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015).

    Given a transcription, sampling from a good model of acoustic feature trajectories should result in plausible realizations of an utterance. However, samples from current probabilistic speech synthesis systems result in low quality synthetic speech. Henter et al. have demonstrated the need to capture the dependencies between acoustic features conditioned on the phonetic labels in order to obtain high quality synthetic speech. These dependencies are often ignored in neural network based acoustic models. We tackle this deficiency by introducing a probabilistic neural network model of acoustic trajectories, trajectory RNADE, able to capture these dependencies.

2014

  • A Deep and Tractable Density Estimator.
    Uria Benigno, Murray Iain, Larochelle Hugo.
    Proceedings of the 31st International Conference on Machine Learning (ICML 2014).

    The Neural Autoregressive Distribution Estimator (NADE) and its real-valued version RNADE are competitive density models of multidimensional data across a variety of domains. These models use a fixed, arbitrary ordering of the data dimensions. One can easily condition on variables at the beginning of the ordering, and marginalize out variables at the end of the ordering; however, other inference tasks require approximate inference. In this work we introduce an efficient procedure to simultaneously train a NADE model for each possible ordering of the variables, by sharing parameters across all these models. We can thus use the most convenient model for each inference task at hand, and ensembles of such models with different orderings are immediately available. Moreover, unlike the original NADE, our training procedure scales to deep models. Empirically, ensembles of Deep NADE models obtain state-of-the-art density estimation performance.
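
    The order-agnostic training procedure can be sketched as random input masking (a minimal sketch under our own naming; the actual model also receives the mask as an extra input and is trained to predict the hidden dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_training_example(x):
    """Build one training input for order-agnostic NADE (toy sketch).

    Sample an ordering, pick a cut point d, and treat the first d
    dimensions of that ordering as observed. A single network with
    shared parameters, trained on such random masks, is simultaneously
    a model for every ordering of the variables.
    """
    D = len(x)
    ordering = rng.permutation(D)
    d = int(rng.integers(0, D))   # how many dimensions count as observed
    mask = np.zeros(D)
    mask[ordering[:d]] = 1.0      # x_<d in this ordering is visible
    # The network would receive (x * mask, mask) and be trained to
    # predict the dimensions where mask == 0.
    return x * mask, mask
```

    At test time one then simply picks a mask whose observed set matches the conditioning variables of the inference task at hand, which is what makes arbitrary marginalization and imputation analytic.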

2013

  • RNADE: The real-valued neural autoregressive density-estimator.
    Uria Benigno, Murray Iain, Larochelle Hugo.
    Advances in Neural Information Processing Systems 26 (NIPS 2013).

    We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case.
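
    The factorization can be sketched directly (a toy sketch with hypothetical, directly supplied mixture parameters; in RNADE the weights, means and standard deviations of each conditional are computed from the preceding dimensions by a neural network with shared parameters):

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def rnade_density(x, conditional_params):
    """p(x) = prod_d p(x_d | x_<d), each conditional a mixture of Gaussians.

    conditional_params[d] -> (weights, means, stds) for dimension d,
    supplied directly here for illustration.
    """
    density = 1.0
    for x_d, (w, mu, sigma) in zip(x, conditional_params):
        density *= np.sum(w * gaussian_pdf(x_d, mu, sigma))
    return density
```

    Because each factor is a normalized one-dimensional mixture, the product is itself a normalized, tractable joint density, which is what allows the direct likelihood comparisons mentioned above.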

2012

  • Deep Architectures for Articulatory Inversion.
    Uria Benigno, Murray Iain, Renals Steve, Richmond Korin.
    Proc. Interspeech, Portland, Oregon, USA.

    We implement two deep architectures for the acoustic-articulatory inversion mapping problem: a deep neural network and a deep trajectory mixture density network. We find that in both cases, deep architectures produce more accurate predictions than shallow architectures and that this is due to the higher expressive capability of a deep model and not a consequence of adding more adjustable parameters. We also find that a deep trajectory mixture density network is able to obtain better inversion accuracies than smoothing the results of a deep neural network. Our best model obtained an average root mean square error of 0.885 mm on the MNGU0 test dataset.

2011

  • A deep belief network for the acoustic-articulatory inversion mapping problem.
    Uria Benigno.
    The University of Edinburgh, School of Informatics, MSc dissertation.

    In this work, we implement a deep belief network for the acoustic-articulatory inversion mapping problem. We find that adding up to three hidden layers improves inversion accuracy, and we show this is due to the higher expressive capability of a deep model rather than a consequence of adding more adjustable parameters. We also show that unsupervised pretraining improves performance in all cases, even for a one-hidden-layer model. Our implementation obtained an average root mean square error of 0.95 mm on the MNGU0 test dataset, beating all previously published results.