Malaria is an illness caused by mosquito-borne Plasmodium parasites, the most dangerous of which is P. falciparum, responsible for more than 600,000 childhood deaths each year. Data on the natural diversity of P. falciparum in different geographical regions is crucial for many aspects of malaria biology and disease control. Collecting such data depends on clinical and epidemiological studies that often require a long-term investment to achieve their scientific objectives.
The CGGH has been supporting MalariaGEN in establishing the P. falciparum Community Project, whose aim is to enable different research groups to share and release data on parasite polymorphism without compromising their ability to analyse and publish their own findings.
“Essentially, we’re trying to build the value that any individual researcher couldn’t achieve by themselves. Investigators tend to be geographically limited but in order to study genomic diversity, in order to study epidemiology, it’s advantageous to get a global view—by putting things together, we get more than the sum of the parts,” explains Dr Olivo Miotto, CGGH Senior Informatics Fellow, who is based at the Mahidol-Oxford University Tropical Medicine Research Unit (MORU) in Bangkok, Thailand.
Despite the strong scientific motivation, those coordinating the Community Project have always recognised the need for a win-win proposition. “Samples collected in the field are precious, and the data that they generate is really valuable to the researchers who are leading their own scientific study. Asking a community to share their data requires a lot of trust and mutual respect,” says Dr Bronwyn MacInnis, Senior Scientific Programme Manager for the Malaria Programme at the Wellcome Trust Sanger Institute and CGGH member.
Because researchers still work within a competitive model of scientific research, the collaborative framework also needed mechanisms to ensure scientific sustainability. Dominic Kwiatkowski, CGGH Director, says that recognition and respect play a key role. “If we want large-scale projects where researchers are motivated to contribute samples and data that they have worked hard to generate, they need recognition, not least because they need to sustain their research funding.”
The Community Project’s commitment to collaboration is reflected in its model: a diverse community of researchers collects parasite samples in different locations around the world. These samples are sequenced through collaborations with the Wellcome Trust Sanger Institute and MalariaGEN, and the resulting data are returned to the contributing researchers and used by the MalariaGEN P. falciparum Community Project in a number of population-level analyses.
Connecting a research community to the infrastructure to support large-scale genomics
A key strength of the Community Project is that it connects a diverse, global community of researchers to a sophisticated DNA sequencing and analysis pipeline, a massive infrastructure that took considerable effort to develop. “Many of the tools that we need to analyse DNA in parasites are similar to those used in humans but it’s not identical. We often have to go through different lab processes and consider different analytical problems, such as the need for methods of deep sequencing from small clinical samples,” says Kwiatkowski.
Members of the CGGH team at Oxford, Sanger and Bangkok are working together closely on this project. “The large-scale data analysis pipelines developed by Jim Stalker and Magnus Manske at Sanger is vital, as is the population genetic analysis done by Olivo Miotto in Bangkok and the web application development by Paul Vauterin and Ben Jeffrey in Oxford—it’s very much a team effort to solve all of these scientific, technical and practical challenges,” says Kwiatkowski.
When Miotto first joined MORU in Bangkok, he had just completed a PhD in Bioinformatics after spending 15 years working in software development. His initial task was to develop a web application for graphical browsing of P. falciparum genome variation data, a project that served as the prototype for the current MalariaGEN P. falciparum web application. By 2010, he had begun focusing on the population genetic analysis of the deep sequencing data generated by the MalariaGEN P. falciparum Community Project, a role that involves daily interactions with team members in Oxford, Sanger and Bangkok.
Now well established, this MalariaGEN sequencing and analysis pipeline is pumping genotype data back to partners and feeding into the population-level analyses performed by the Community Project. “If you look at the amount of data coming off the pipeline, even just in terms of terabytes, the growth over the last two years has been massive,” says Jim Stalker, Analysis Pipelines Manager for the Sanger Malaria Programme and CGGH member.
Building a collective resource for understanding genetic diversity
While Kwiatkowski acknowledges that his team could have taken a different tack—focused on specific biological questions, aimed for frequent publication—they’ve deliberately chosen to invest in building a research community.
A clear outcome of this investment is a large, global data resource that could not have been generated without the underpinning collaboration. Analysis of the rich data set, which includes whole genome sequences for thousands of Plasmodium parasites collected in 20 countries, is yielding deep insights into parasite genetic diversity.
“Now, we can see how the long-term strategy of building research communities can impact on the science. We’re getting to understand things about the diversity and evolution of the parasites and the mosquitoes that we couldn’t have got at before,” says Kwiatkowski. “The really exciting thing is that genetic diversity is increasingly of interest both to basic biologists and to people who want to control the disease. We’re aligning science with the real world.”