Project proposal details

Please look carefully through the proposal details below. If you are interested in the project, contact the supervisor, explaining why you are interested and what background makes you a good fit for the project.

If this is an external project, the lead supervisor may have suggested someone at Imperial College or the NHM who could act as an internal supervisor; if so, you should contact them as well. If the project is external and no internal supervisor has been proposed, you must find one before starting the project.

Please pay close attention to any extra notes on requirements (such as being able to drive or to speak particular languages) or on the application process. There may be specific limitations on the project's availability; if so, they will be clearly shown further down the page.

Project title
Detecting selection in worldwide and regional SARS-CoV-2 genomic data: a tool for surveying the emergence of more transmissible strains and of resistance to vaccine and antiviral treatments.
Contact name
Bhavin Khatri
Project based at
Silwood Park (Imperial)
Project description
Tracking new genomic variants of SARS-CoV-2 as they arise, and understanding their evolutionary significance, is important for predicting the course of this pandemic. The key is to distinguish sites that are evolving by natural selection from sites that are changing by random genetic drift. Sites under selection *could* be increasing transmissibility, or evolving resistance to future vaccine and antiviral treatments. The difficulty is that variants can rise to high frequency even if they are neutral. Using a new method that determines the likelihood that a change in the frequency of a polymorphism is due to selection rather than random genetic drift, we can detect where significant evolutionary change is happening.
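The proposal does not describe the new method in detail, so the sketch below is not that method. It is a minimal, hedged illustration of the general idea of comparing a selection likelihood against a drift likelihood at a single site, using a textbook Gaussian approximation to haploid Wright-Fisher drift plus binomial sampling noise. The effective population size `ne`, the number of `generations` between samples, the variant counts, and all function names are hypothetical assumptions made for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def logistic_trajectory(p0: float, s: float, t: float) -> float:
    """Deterministic variant frequency after t generations with selection coefficient s."""
    growth = math.exp(s * t)
    return p0 * growth / (1.0 - p0 + p0 * growth)


def gaussian_loglik(x: float, mean: float, var: float) -> float:
    """Log-density of a normal distribution N(mean, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def selection_vs_drift(k0: int, n0: int, k1: int, n1: int,
                       generations: float, ne: float) -> tuple[float, float, float]:
    """Hypothetical likelihood-ratio test of selection vs neutral drift at one site.

    k0 of n0 sampled genomes carry the variant at the first time point and
    k1 of n1 at the second, `generations` later. `ne` is an assumed
    effective population size. Returns (s_hat, LR statistic, p-value).
    """
    if not (0 < k0 < n0 and 0 < k1 < n1):
        raise ValueError("variant must be polymorphic at both time points")
    p0, p1 = k0 / n0, k1 / n1
    # Variance of the observed change under drift: Wright-Fisher drift
    # (short-time approximation, generations << ne) plus binomial sampling
    # noise at both time points, all evaluated at the starting frequency p0.
    var = p0 * (1.0 - p0) * (generations / ne + 1.0 / n0 + 1.0 / n1)
    # H0 (drift only): expected frequency is unchanged.
    ll_drift = gaussian_loglik(p1, p0, var)
    # H1 (selection): the logistic mean can pass through p1 exactly,
    # so the maximum-likelihood selection coefficient solves that equation.
    s_hat = (logit(p1) - logit(p0)) / generations
    ll_sel = gaussian_loglik(p1, logistic_trajectory(p0, s_hat, generations), var)
    # 2 * log-likelihood ratio compared with a chi-squared distribution, 1 df.
    stat = max(0.0, 2.0 * (ll_sel - ll_drift))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-squared(1) upper tail
    return s_hat, stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: a variant rising from 5% to 40% of sampled sequences.
    print(selection_vs_drift(50, 1000, 400, 1000, generations=30, ne=10_000))
```

Because the variance is held fixed across both hypotheses, the statistic here reduces to the squared frequency change divided by its drift-plus-sampling variance; a realistic method would model the full trajectory and the drift variance far more carefully.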

The project will apply this method, previously used to detect variants under selection in the genomes of HIV and norovirus, first to find sites in the SARS-CoV-2 genome that show strong signals of selection on transmission globally, to catalogue them, and then to explore regional differences. This will include assessing the validity of the claim that the D614G mutation in the spike protein of SARS-CoV-2 is under positive selection (Korber et al., 2020, Cell 182, 812–827). Sites found to be under selection will be assessed for their biological and epidemiological significance. The genomic data will come from the GISAID repository, which currently holds ≈160,000 SARS-CoV-2 sequences.

This project will particularly suit anyone with a good quantitative, computational or bioinformatics background (or, for longer projects, a willingness to learn), and an interest in practical applications of statistical inference and multiple hypothesis testing in a genomic context. There is also scope for those with a strong mathematical or theory background to extend the method to simple multi-site comparisons.
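Testing every polymorphic site for selection means running many hypothesis tests at once, so some correction for multiple testing is needed. The proposal does not say which correction the project will use; as one standard option, here is a minimal sketch of the Benjamini-Hochberg procedure, which controls the false discovery rate across sites. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float], fdr: float = 0.05) -> list[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure, which controls the false discovery rate at level `fdr`
    (for independent or positively dependent tests)."""
    m = len(p_values)
    # Sort site indices from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * fdr / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])


if __name__ == "__main__":
    # Hypothetical per-site p-values, e.g. from a selection-vs-drift test.
    site_p = [0.001, 0.8, 0.015, 0.04, 0.3]
    print(benjamini_hochberg(site_p))  # [0, 2]
```

Because the procedure is step-up, a site whose p-value fails its own rank's threshold can still be rejected if a larger p-value further down the ranking passes.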
Additional requirements
Good grasp of basic programming skills and paradigms (if/while statements, for loops). Experience with MATLAB or Julia is preferred but not critical. Experience in handling large genomic datasets, or an interest in learning how to do so.
Date uploaded