Information for prospective applicants

For the projects below, all scientific backgrounds will be considered.

However, most (if not all) of the projects I propose involve some analysis and modeling of astronomical data, with a focus on either cosmology or Galactic astrophysics. Therefore, some experience in scientific programming and data analysis is very strongly recommended (for example in Python, which is currently the language of choice in astrophysics). No specific experience is required in astronomy, cosmology, theoretical physics, statistics, or machine learning, but some experience or interest in at least some of those fields is strongly recommended.

Please also carefully read this page on what to expect when working with me.

PhD students

Do not hesitate to reach out if you would like to discuss projects and opportunities. However, for more efficient discussions, please first research the PhD admission process in the Astrophysics group, as well as possible funding sources (STFC, President’s scholarship, etc.). I am not describing specific projects here, but they will most likely be in observational cosmology and involve the analysis of real data with sophisticated statistical techniques and theoretical modelling. For 2021 the list of available projects in the department is available here.

MSc/MSci students

Every year I offer projects that last between 3 and 9 months, depending on the degree, and can be carried out alone or in pairs (but you will be responsible for splitting the work). All degrees are welcome. Typically, I expect the projects to attract data-minded students in the following MSc/MSci programs: Physics, Data Science, Computer Science, Artificial Intelligence, Electrical Engineering, etc. For the requirements, see the message at the top of this page.

List of possible projects (last update Jan 2021):

Some of the projects below are complementary, so that we may have more vibrant and collaborative weekly group meetings, and students may help each other more, both directly and indirectly.

  • Systematics in galaxy clustering: implement the systematics model of this paper, to learn a flexible relationship between the observed sky density of galaxies and nuisance observational systematics (Galactic foregrounds, atmospheric conditions, number of exposures, etc.). Such density-systematics correlations are major obstacles in cosmological analyses and must be removed. This project will focus on implementing the simplest multivariate linear model with Self-Organised Maps (and potentially other unsupervised machine learning methods), possibly on the same data (KiDS DR4, or any other easily accessible public cosmology survey), but will not necessarily go all the way to final diagnostics with correlation functions and “random” synthetic catalogs (if anything, we would use angular power spectra instead).
  • Injecting simulated galaxies in photometric images: review and play with one or multiple codes to perform galaxy injections and extractions (for example the public DECALS or LSST pipelines; this review paper is a good start). Injection simulations are critical for accurately characterizing the image analysis pipelines that measure the properties of millions of galaxies (for testing models of cosmology and galaxy formation). This project would include a brief review of photometric image simulations: how synthetic galaxies are simulated (morphologies, colors, etc), how they are injected in photometric images (real or also synthetic), and how the galaxy properties of interest are measured/recovered by a real analysis pipeline in the presence of noise, artefacts, bright stars, other nearby galaxies, etc.
  • Measuring the acceleration of the solar system with Gaia EDR3 data: repeat the analysis of this paper as closely as possible, and compare with a more conventional Bayesian version of the inference.
  • Quasars in Gaia EDR3 data: reproduce the most recent Gaia quasar sample (EDR3 paper to come soon; in the meantime the DR2 paper is a good start), investigate the level of stellar contamination, and construct a new catalog of quasar candidates that is not based on cross-matching (for example using a machine learning method or a simple density model, potentially following this paper).
  • Testing synthetic stellar population models on Lyman Break galaxy spectra: apply the FSPS and Prospector codes to spectroscopy from a few dropout / Lyman Break galaxies (possibly from that data) to extract physical properties of interest and compare with other methods.
  • Normalizing flows on quasars, stars, galaxies: reproduce some of the results of this paper (classifying galaxies, stars, and quasars from the Gaia data) using more sophisticated Bayesian machine learning techniques, such as normalising flows.
  • Comparison of Bayesian neural network methods: review Bayesian neural network approaches (see this recent review), in particular deep ensembles and the recent work on neural tangent kernels. Apply to some simple astronomical problems of interest.
  • Solving astrophysics PDEs with machine learning: use the Fourier Neural Operator (a revolutionary machine learning technique for solving PDEs) on simple applications in astrophysics and cosmology, to study its applicability.
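
To give a flavour of the first project above (systematics in galaxy clustering), here is a minimal sketch of a multivariate linear fit of galaxy density against systematics templates. It is not the method of the referenced paper: it leaves out the Self-Organised Map step entirely and runs on toy data. The function name `fit_linear_systematics`, the pixel count, the mean density, and the toy coefficients are all assumptions made up for illustration.

```python
import numpy as np


def fit_linear_systematics(density, templates):
    """Least-squares fit of the density contrast against systematics templates.

    density:   (n_pix,) galaxy counts per sky pixel
    templates: (n_pix, n_sys) systematics values per pixel (hypothetical maps)
    Returns the fitted coefficients (intercept first) and the predicted contrast.
    """
    # Density contrast: fractional fluctuation around the mean density
    delta = density / density.mean() - 1.0
    # Standardise each template so that the coefficients are comparable
    scaled = (templates - templates.mean(axis=0)) / templates.std(axis=0)
    # Design matrix with an intercept column
    design = np.column_stack([np.ones(len(delta)), scaled])
    coeffs, *_ = np.linalg.lstsq(design, delta, rcond=None)
    return coeffs, design @ coeffs


# Toy data (assumed values): 3 systematics maps, 2 of which modulate the density
rng = np.random.default_rng(0)
n_pix, n_sys = 5000, 3
templates = rng.normal(size=(n_pix, n_sys))
true_coeffs = np.array([0.05, -0.02, 0.0])
mean_density = 20.0
expected = mean_density * (1.0 + templates @ true_coeffs)
density = rng.poisson(expected).astype(float)

coeffs, predicted = fit_linear_systematics(density, templates)

# Divide out the fitted modulation to obtain a corrected density map
corrected = density / (1.0 + predicted)
print("fitted coefficients (intercept first):", coeffs)
```

In a real analysis the templates would be maps of quantities such as Galactic foregrounds or number of exposures, and the corrected map would then be checked with angular power spectra rather than with this toy comparison.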


Our group can support your application for an externally funded fellowship with us. Examples include the Imperial JRF, Royal Society URF, STFC JRF, etc. Simply contact any of the staff members (including myself).