The MalariaGEN P. falciparum Community Project has sequenced thousands of samples from more than 20 countries around the world. To maximise the value of this big data resource, the Community Project needed an accessible tool that could both provide an overview of the global diversity in parasite populations under study and allow researchers to explore this complex data set in more detail.
To meet this need, a plan was formed for a web application with several discrete functions, including the ability to browse and query a catalogue of genetic variations and their allele frequencies in different geographical populations.
Informed by previous software development projects, for example ExplorerCat and MapSeq, the plan also included a few ideas about how to better communicate the challenges in generating and interpreting Plasmodium sequence data, which would eventually translate into several tracks on the application’s genome browser.
And in one very important way, the new web application differed from existing tools: it would connect the data to the researchers who generated it.
Putting in place a dedicated development team
Building the development team was the first step and the timing, as it turns out, was fortuitous. “This was going to be a big undertaking. We needed the right composition of people, and we had just hired in what we thought would make a pretty good team,” explains Kwiatkowski.
The development team was led by Dr Paul Vauterin, CGGH Principal Scientific Software Engineer, who had joined the CGGH only a few months prior and had a strong background in commercial software development, including experience developing applications used by the US Centers for Disease Control and Prevention. Another key developer on the project, Dr Ben Jeffery, had also worked in a commercial environment and, like Vauterin, had previously trained as a physicist. The final member of the core team was Rachel Giacomantonio, who had previous experience working in multi-disciplinary teams to develop web-based tools for data-sharing communities.
The first step for Vauterin was to draft a specification for guiding discussion amongst the development team and key members of the MalariaGEN P. falciparum Community Project. The team then worked to define key audiences and develop user journeys—considering how people would use the application.
Designing an intuitive structure for the application
A key early design decision was the shift from a tab-based format, similar to the conventional navigation in a website, to a series of interconnected views—a structure more akin to a mobile application.
The brainchild of chief architect Vauterin, this structural change was an intuitive leap from the original vision, but it allowed the team to build a narrative around the key functions of the application: sharing population-level genetic data; helping users understand the challenges involved in sequencing Plasmodium genomes and, by extension, how much confidence to place in the resulting data and SNP discovery process; and introducing the community of scientists participating in the Community Project.
In the new architecture, each of these functions was embodied by a particular view. Vauterin purposefully built connectivity between the various views to support rich, exploratory pathways through the data.
“From the introduction page, you can grasp the concepts and get the whole thing. The biggest success for the P. falciparum web application is that I think that we managed to find that sweet spot between offering functionality but not offering so much that people can’t grasp the concepts in a very easy way,” says Vauterin.
Avoiding “featuritis,” the unchecked proliferation of new features and functionality, is a tricky but important feat because it decreases the cognitive load for users. The team was helped in this regard by the decision to support tablet devices, like the iPad.
The initial intention was to make the data accessible to busy scientists, who are often on the move, but this decision also helped keep the design lean. New and existing features were evaluated based on how well they worked on tablets—a hurdle Vauterin nicknamed the “iPad test.”
Focusing the application on well-defined needs is also a way of connecting to users. “Users are people, and emotional response is an important factor in an application’s success. If I use a tool and think, ‘Wow, this really solves my problem’ or ‘This just made my life easier,’ then I have a positive emotional response. If I see a lot of functionality but don’t understand how it relates to me and what I’m trying to accomplish, then I don’t have the same reaction,” says Giacomantonio.
In the case of the P. falciparum web application, the level of care invested in the architecture and navigation is indicative of the investment made in preparing the information about the underlying scientific community.
Integrating information about the scientific community
“At a very early stage for the web application, an important criterion was not just having the wonderful technology and data, but also representing the community because that is essential for getting participation,” says Dr Dominic Kwiatkowski, who’s been involved with large multi-centre studies of malaria since the mid-1990s. This experience has taught him that representing the scientific community is a crucial element in ensuring scientific sustainability.
“I’ve realised that in large collaborations you do need to find ways that you can attribute the ways people have contributed. If we really do want large-scale projects where people want to continue to collect new data from the field and contribute samples, they have to feel they get some recognition, for the simple reason they too need to sustain their research funding,” he adds.
While the development team was fully committed to this vision, gathering, repurposing and finding appropriate ways to share information about the researchers involved in the Community Project was a key challenge. “It’s hard by nature because you’re dealing with soft information. We had to materialise the concept of how the project works into something really very concrete,” recalls Vauterin.
As they did continually throughout the project, they referred back to the set of user journeys they’d developed early on, and pared the community information back to a few key functions: to communicate the collaborative nature of the project, to introduce the partner studies generating the data, and to facilitate direct communication with these groups through a key contact.
By setting a limited scope the team was able to set a clear specification for the community information. “I actually spent quite a lot of time thinking about ways that we might collaborate better and bring all the community information together. It’s a much more massive problem than maybe I realised at the time,” explains Jeffery.
The community information, for example partner study titles and descriptions, was coming from different sources, sometimes internal documents and sometimes public-facing resources. The team enlisted the help of Ian Wright, MalariaGEN’s resident expert in the collaborative software Alfresco, to help build a custom interface to store and serve this data to the web application. “After that was done, I built a space that allowed users to browse that information as part of the web application,” says Jeffery, describing the application’s Partner Studies view.
The next challenge was preparing the information for public consumption. There were varying degrees of detail available for different projects, but most had not been signed off for external publication. In a few cases, the information was out of date. Roles within Partner Studies needed to be fairly represented and to reflect the Community Project’s data release policy, where a contact person takes responsibility for fielding questions regarding release of genotypes and possible collaboration. And this information had to fit into the application in a user-friendly way.
Armed with a clear specification for the type and tone for the community information, the team enlisted the help of Dr Bronwyn MacInnis, Senior Scientific Programme Manager with the Malaria Programme at the Wellcome Trust Sanger Institute, to pull together the information and make sure that it was right.
“I’d forgive someone for looking at the Partner Studies view and thinking that this would be a simple thing to build but, in truth, it posed one of the biggest challenges. And, if we had given up, if we had decided it was too hard, then the web application would be a fundamentally different product. It wouldn’t serve the same purpose,” reflects Giacomantonio.
Vauterin likens the application’s purpose to a shop window, giving users a peek into the Community Project itself. “I often compare the application to a poster, where you get a summary of certain important aspects of the Community Project. It sets us apart from how you would look at a piece of software where you have functionality to achieve well-defined goals because basically the goal is to highlight how the community project works,” he adds, “And we’ve done that.”
A little technical wizardry never hurts
While representing the community posed one set of challenges, there were also numerous technical challenges to developing a web application to deliver this type of genetic data, on this scale and in the browser.
“Compared to a desktop application, there are two limitations. One is that desktop applications can have all the data sitting in memory quite easily. They don’t have to get it over the internet. So, one challenge is about developing methods to get a big chunk of memory from the server to the client. The other part was finding ways to draw things fast enough,” says Jeffery.
To solve these technical challenges, the team employed a thin client-server architecture and clever tricks to improve usability. The genotype browser, which is still under development, is a case in point. Jeffery recalls starting the development with Scalable Vector Graphics (SVG) before eventually opting to use Canvas. “If you’ve got one pixel per genotype per sample, you’ve got this grid of thousands by thousands and you end up having to draw millions of genotypes. Even Canvas gets very slow there, so the new genotype browser has a way of pre-drawing an image that’s cached, and stretched and zoomed as needed.”
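The pre-drawing idea Jeffery describes can be illustrated with a minimal sketch. This is not the application's code: the class name `CachedGenotypeImage`, the colour palette and the nearest-neighbour stretching are all assumptions made for illustration, and the sketch uses plain Python lists rather than a browser Canvas. The point is only that the expensive step (one pixel per genotype per sample) runs once, and every later zoom or pan resamples the cached image instead of redrawing each genotype.

```python
# Hypothetical colour codes for genotype calls (illustrative only).
PALETTE = {0: (255, 255, 255), 1: (0, 0, 255), 2: (255, 0, 0)}


def render_full_image(genotypes):
    """Expensive step: draw one pixel per genotype per sample."""
    return [[PALETTE[call] for call in row] for row in genotypes]


class CachedGenotypeImage:
    """Pre-draws the genotype grid once, then serves stretched views of it."""

    def __init__(self, genotypes):
        self._genotypes = genotypes
        self._image = None      # filled in lazily on first use
        self.render_count = 0   # how many times the full grid was drawn

    def _cached_image(self):
        if self._image is None:
            self._image = render_full_image(self._genotypes)
            self.render_count += 1
        return self._image

    def view(self, x0, y0, width, height, out_w, out_h):
        """Stretch the region (x0, y0, width, height) of the cached image
        to out_w x out_h pixels using nearest-neighbour sampling."""
        image = self._cached_image()
        return [
            [
                image[y0 + (j * height) // out_h][x0 + (i * width) // out_w]
                for i in range(out_w)
            ]
            for j in range(out_h)
        ]


# Two samples (rows) by two variants (columns), as a toy example.
grid = CachedGenotypeImage([[0, 1], [2, 0]])
zoomed_out = grid.view(0, 0, 2, 2, 4, 4)   # whole grid, stretched to 4x4
zoomed_in = grid.view(1, 0, 1, 1, 2, 2)    # a single genotype, stretched to 2x2
```

Both views above reuse the same cached image, so `grid.render_count` stays at one however many times the user zooms. A browser implementation would presumably hold the cache in an off-screen canvas and let the drawing API do the scaling, but that detail is an assumption rather than something the team describes.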
With millions of pixels on the screen, how does the team stay connected to the data? “That is the remarkable thing: holding the reality in your head that the squares that you’re seeing on the screen relate to someone’s health, at the end of the day,” says Jeffery.
Vauterin’s relationship to the data is also guiding his work, and he is quick to point out that developing the P. falciparum web application has “sharpened his appetite for the richness of the data.”
Adding depth and complexity to the next generation of tools
When Vauterin joined the CGGH, he was new to malaria research, but he has since built a deep appreciation for the richness of this data, a perspective that’s been helped by sharing office space with Dr Roberto Amato and Jacob Almagro-Garcia, who are both deeply involved with the population-level genetic analyses of the P. falciparum Community Project data.
“The application only touches the surface of what you can do with the data. We’ve been using the word atlas to describe the application and it’s really like that. It doesn’t give the details, rather it tells us some very basic things about the genomic landscape. This is extremely fascinating because one of the challenges in genomics is telling the summary in a way that can be understood,” says Vauterin.
The summary provided by the P. falciparum web application serves a purpose—both scientific and social—but in many ways this application is just the beginning.
“We’ve made something that shows the basics of the basics. The question now is: Can we take this one step further? Can we show some more detailed things like signals of selection and GWAS analysis? How will this work? These are fascinating questions,” says Vauterin, who is already looking ahead.
“If you look at what is going on now, based on the same data, there is a tremendously rich set of things that we can learn from this and I think that we will never actually reach the bottom of this,” he adds.
Creating a generic tool to support the broader research community
Since completing the beta version of the P. falciparum web application, which was released online in the fall of 2013, Vauterin and Jeffery have been hard at work building a powerful generic tool for working with genomic data. The application is called Panoptes, after the mythical Greek giant Argus Panoptes, who had a hundred eyes and was considered an excellent watchman.
The idea is to create a powerful and flexible software package that researchers can use to build custom instances that suit their needs, allowing them to investigate as many aspects of their data as possible.
Panoptes software is already powering several web applications for MalariaGEN, alongside the P. falciparum web application. The intention is to make the software available as open source to empower researchers tackling a broad range of genomics challenges.
“Anything’s possible given enough time and coffee,” says Jeffery, as he considers the future with his characteristically unflappable optimism.