Project proposal details
Please look carefully through the proposal details below. If you are interested in the project then contact the supervisor, explaining why you are interested and any background which makes you a good fit for the project.
If this is an external project, the lead supervisor may have suggested someone at Imperial College or the NHM who could act as an internal supervisor, and you should also contact them. If the project is external and no internal supervisor has been proposed, then you must find one before starting the project.
Please pay close attention to any extra notes on requirements (such as being able to drive or to speak particular languages) or the application process. There may be specific limitations on the project availability: if there are then they will be clearly shown further down the page.
Automated profiling of copy number alterations from shallow whole-genome sequencing data of tumour samples
Project based at
Silwood Park (Imperial)
Copy number alterations (CNAs) are genetic variations that cause an abnormal number of copies of genomic regions, and they represent one of the most important classes of somatic aberration in cancers. Ovarian carcinomas exhibit profound genomic instability, with more than 95% of subtypes labelled as C class (“primarily with copy number alterations”) (Ciriello et al., Nat Genet 2013). Only 20% of ovarian cancers are diagnosed early, and patients with a late diagnosis have a survival rate of ~25%. High-grade serous ovarian carcinoma (HGSOC), the most common form, exhibits high levels of tumour heterogeneity and variable clinical outcome. The detection of somatic CNAs from HGSOC genomic data can help develop effective and early molecular classifications of HGSOC subtypes and inform personalized treatment. However, accurate quantification of CNA breakpoints and allele-specific variation requires expensive deep whole-genome sequencing (WGS) of individual tumour samples.
The aim of this project is to develop a new cost-effective protocol for molecular classification and prognosis of ovarian carcinoma from signatures of copy number alterations in tumour genomes.
This aim will be achieved through three specific objectives:
1. Development and implementation of a novel computational method to accurately infer copy number alterations from shallow, low-cost DNA sequencing data
2. Application of this method to existing large-scale genomic data sets of high-grade serous ovarian carcinoma tumours
3. Evaluation of the results to (i) improve the accuracy with which patient survival and relapse probability are predicted and (ii) design a cost-effective genomic screening protocol
Our team has developed and implemented computational methods to analyse low-cost, short-read shallow WGS data for population genetics (Fumagalli et al., Bioinformatics 2014). This framework integrates the statistical uncertainty of sequencing data into the estimation of metrics of genetic variability (Fumagalli et al., Genetics 2013), and it is therefore widely applied in the study of non-model systems, ancient samples, and aneuploid species with chromosomal aberrations. We propose that this framework is well suited to detecting CNAs in tumour genomes, as it maximizes the sample size (and thus statistical power) while limiting both the experimental cost (it does not require deep sequencing of individual samples) and the data storage required.
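As a purely illustrative sketch of what carrying sequencing uncertainty through to an estimate can mean for CNA detection, the Python snippet below computes posterior probabilities over a few copy-number states for a single genomic bin, rather than making a hard call. The Poisson read-count model, the uniform prior, the fixed set of states and the function name copy_number_posterior are all assumptions made for this example. They are not the published framework, and effects such as tumour purity, GC content and mappability are ignored.

```python
import math


def copy_number_posterior(observed_reads, diploid_mean, states=(1, 2, 3, 4)):
    """Posterior probability of each copy-number state for one genomic bin.

    Assumptions (illustrative only): read counts are Poisson distributed,
    the expected count scales linearly with copy number, and the prior over
    states is uniform. State 0 is left out because a Poisson mean of zero
    would need an explicit background/contamination term.
    """
    log_likelihoods = []
    for copies in states:
        expected = diploid_mean * copies / 2.0
        # Poisson log-probability of the observed count given the expected count.
        log_likelihoods.append(
            observed_reads * math.log(expected)
            - expected
            - math.lgamma(observed_reads + 1)
        )
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(log_likelihoods)
    weights = [math.exp(value - largest) for value in log_likelihoods]
    total = sum(weights)
    return {copies: weight / total for copies, weight in zip(states, weights)}


if __name__ == "__main__":
    # Hypothetical bin: about 67 reads expected at two copies, 100 observed.
    for copies, probability in copy_number_posterior(100, 67.0).items():
        print(f"copy number {copies}: posterior {probability:.3f}")
```

Keeping the full set of probabilities, rather than only the most likely state, is what would allow the uncertainty at very low depth to be passed on to downstream analyses.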
This project will analyse both synthetic and real WGS data. Synthetic whole-genome data simulating human cancers will be used to benchmark the performance of CNA quantification under different experimental scenarios (e.g. depth of coverage, read length). To assess the applicability of the proposed methodology to real data, we will analyse shallow WGS data (average depth 0.1X) from 300 tumour samples from 142 patients with recurrent ovarian high-grade serous or grade 3 endometrioid carcinoma, recruited by the BriTROC-1 translational research collaborative consortium (Goranova et al., Br J Cancer 2017). This data set includes pathological data for each patient.
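To give a feel for why depth of coverage and read length matter in such a benchmark, the minimal Python sketch below computes the expected number of reads per genomic bin and the relative Poisson noise on that count. It assumes uniform coverage and Poisson sampling; the 100 kb bin size, the 150 bp read length and the function names are hypothetical choices for illustration and are not taken from the BriTROC-1 protocol.

```python
import math


def expected_reads_per_bin(depth, bin_size_bp, read_length_bp,
                           copy_number=2, ploidy=2):
    """Expected read count in a bin, assuming uniform coverage.

    depth is the genome-wide average depth (e.g. 0.1 for 0.1X); the count is
    scaled by copy_number / ploidy to model a gained or lost region.
    """
    return depth * bin_size_bp / read_length_bp * copy_number / ploidy


def relative_noise(mean_reads):
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(mean_reads)


if __name__ == "__main__":
    # Hypothetical settings: 100 kb bins and 150 bp reads.
    for depth in (0.1, 1.0, 30.0):
        mean = expected_reads_per_bin(depth, 100_000, 150)
        print(f"depth {depth:>4}X: {mean:9.1f} reads/bin, "
              f"relative noise {relative_noise(mean):.3f}")
```

Under these assumptions the relative noise shrinks with the square root of the reads per bin, so at very shallow depth larger bins trade breakpoint resolution for a more stable copy-number estimate.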
The project will be co-supervised by Dr Nadia Guerra (Life Sciences) and Professor Iain McNeish (Medicine).
Good programming skills/attitude (Linux/Bash and R, Python or Julia); comfortable with statistics
Additional training in bioinformatics will be provided
Selection and eligibility
The student will benefit from some knowledge of cancer biology