Working with leading malaria researchers, clinicians and public health practitioners, the WorldWide Antimalarial Resistance Network (WWARN) is collecting, standardising and quality-assuring data on several indicators of malaria drug resistance.
“When we started out, there was a multitude of data already available that could be used to better understand and more accurately track drug resistance in malaria parasites, if only it could be pooled for analysis in a meaningful way,” explains Dr Philippe Guérin, WWARN Executive Director. “Primary data sets were isolated—locked away in labs or clinicians’ files. Even data shared through peer-reviewed publications were often collected using different methods, making it difficult to compare results from similar studies.”
Bringing this data together posed numerous technical and social challenges. With its proven experience building data-sharing communities, the MRC Centre for Genomics and Global Health (CGGH) was a natural partner for WWARN, and helped the network to specify and develop solutions during WWARN’s formative stages.
“One element of this work initially was to build a central, online data submission system that people could upload data to—any data, any format—and complete a form to collect study metadata,” recalls Alistair Miles, CGGH Head of Epidemiological Informatics and a key architect of the early data submission system.
To encourage data contributions, the team sought to apply as few restrictions as possible. “The first step was about getting data in the door. To accomplish this we made an early decision to build a system that was agnostic about the data—it took in files, but didn’t look inside them,” explains Miles.
This decision enabled WWARN to meet the broad requirement to accept "any data, any format" but did not yet address the complex process of data curation.
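The idea of a system that "took in files, but didn’t look inside them" can be illustrated with a short sketch. This is not WWARN’s implementation: the function name `archive_submission`, the `payload.bin` and `record.json` file names and the metadata fields are all hypothetical, chosen only to show that an upload can be stored as opaque bytes alongside a separate study metadata record.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def archive_submission(archive_dir: Path, filename: str,
                       content: bytes, metadata: dict) -> dict:
    """Store an uploaded file as opaque bytes plus a metadata record.

    The content is never parsed, so any file format is accepted.
    (Hypothetical helper, for illustration only.)
    """
    # A checksum identifies the upload without inspecting its format.
    digest = hashlib.sha256(content).hexdigest()
    target = archive_dir / digest
    target.mkdir(parents=True, exist_ok=True)

    # The bytes are written exactly as received.
    (target / "payload.bin").write_bytes(content)

    # Everything the system "knows" comes from the form, not the file.
    record = {
        "filename": Path(filename).name,
        "sha256": digest,
        "size_bytes": len(content),
        "metadata": metadata,
    }
    (target / "record.json").write_text(json.dumps(record, indent=2),
                                        encoding="utf-8")
    return record


# Usage with made-up content and metadata.
with tempfile.TemporaryDirectory() as tmp:
    record = archive_submission(
        Path(tmp),
        "study_results.xls",
        b"\x00\x01 any bytes at all",
        {"study_title": "Hypothetical efficacy study"},
    )
    stored = (Path(tmp) / record["sha256"] / "payload.bin").read_bytes()
```

Because nothing in the sketch depends on the file’s structure, interpreting the contents is left entirely to a later curation stage.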
Essentially, curation covers the management of data from the moment it is first submitted to WWARN, through its transformation into standardised variables, to its presentation on public-facing tools such as WWARN Explorer.
Building tools to perform data curation was outside the scope of work for the initial data submission system; WWARN would complete this critical next phase in subsequent years. Even so, this early system still needed to support the curation process.
"Implementation is never the challenge; it’s always deciding what to implement"
“What’s the workflow? That was the one thing that really got everyone stuck. It was a real chicken and egg sort of thing. We wanted a system to support the workflow, but we hadn’t yet defined what that was,” Miles recalls. “Ideally, you write the software second and you get the process first.”
However, WWARN was essentially doing both tasks simultaneously. Miles worked closely with WWARN to refine its data curation process: “I started by drafting documents to capture the central process; worked to define user roles, basically who does what, how do people organise themselves. We thought a lot about how to design a process whereby many people can collaborate to effectively manage multiple data sets. Together, we generated a lot of ideas and a set of clear specification documents.”
These documents helped the team to flesh out details, determine what needed to be captured by the system and work through compromises when there were conflicting requirements. The development was iterative and involved months of coordinated meetings, regular discussions, dozens of mock-ups—some on paper, some in pixels—and several functioning prototypes.
Initially, Miles worked with WWARN’s leadership and their Clinical Scientific Group. Eventually, a small cross-cutting team was formed, including scientists, developers, communications representatives and project managers. This team tested out the initial ideas. Miles reflects on the collective push: “We had a series of informal meetings. Then we had a monster session. With paper prototypes plastered to the walls, we stepped through the entire system. We focused on the tough questions and agreed solutions. By the end, we could see what was needed to get the first release out the door.”
Reflecting on the process, Miles adds, “Implementation is never the challenge; it’s always deciding what to implement.” From that point on, the build accelerated. While still iterative, the software development was grounded by a number of crucial decisions.
Reflecting the data-sharing model in the online submission system
While Miles and the team were busy designing the online data submission system, WWARN was developing a simple agreement to govern data-sharing. The resulting document, WWARN’s Terms of Submission, explains how contributed data is secured, stored and used.
Split into five steps, the data submission system is designed to formalise the partnerships necessary for the underlying collaborations. It begins by collecting information about data contributors, logging their acceptance of WWARN’s data-sharing agreement, and allowing users to control who can access their data. Once this information is collected, data contributors can upload their data to the system for secure storage.
The final step in the online submission process is a straightforward online form, called the Study Site Questionnaire. WWARN’s Scientific Groups (Clinical, Molecular, Pharmacology and In vitro) developed these detailed questionnaires to enable WWARN and partner research teams to work together to transform submitted data into standardised variables.
WWARN launches their online data submission system
Within a year, WWARN successfully launched their online data submission system through their website. With the release of a robust yet flexible data submission system, WWARN was able to accept data sets online and securely archive them.
This initial release of the data submission system was the first step in a phased approach to software development. Since the first version launched in 2010, WWARN Informatics has developed tools to support automated data curation, including standardisation and analysis, and has continued to improve the WWARN Data Repository. This central data repository contains and links the standardised, quality-assured data generated through the curation and pooled analysis processes.
Working closely with more than 200 data contributors, WWARN is able to use this powerful resource to perform pooled analyses that uncover new insights into the emergence and spread of malaria drug resistance, including evaluating the efficacy of artemisinin-based combination therapies (ACTs), and examining whether known molecular markers can predict clinical outcomes.
The next phase of development on the data submission system is already underway, led by Richard Cooksey in the WWARN Informatics team. “Our experience working on the first release of the data submission system helped us think a great deal about the user experience of the data contributor and curator,” explains Cooksey. “We have learned many lessons from this initial work which are now being applied to allow better integration with other systems and to improve the value of WWARN’s offering to the malaria community.”
Learn more about sharing data with WWARN: www.wwarn.org/partnerships/data
Spin-off data analysis tool: Petl
Through this process, Miles came to an interesting conclusion: “You’ll never solve a complex data standardisation problem with a single piece of software.”
He likens the process of standardising datasets to creating a unique piece of art: “Someone needs to be there with a hammer and chisel, chiselling the data into the right shape. Every data set will be different and we need a brain to think out how to do the transformation.”
Miles saw a niche for a tool that helped strike the balance between an intuitive ‘point and click’ data curation tool and the bespoke programming required to transform very complex datasets.
He started with a question: “What if there was a tool that was one step up from the programming language? Not a full language, but a library of routines or functions that was geared towards specific transformations.”
So he went away and wrote something that fits that bill: a library of easy-to-use data transformation routines. Miles explains, “It’s a set of atomic operations written in a way that becomes almost human readable. Rather than having to code each transformation, it’s just a set of statements.”
“I use it every day; dealing with heterogeneous data is such a ubiquitous problem,” Miles says. He’s not the only one grappling with this problem. Miles has discovered that people working in a variety of fields—from science to finance to government—are using Petl too. “It’s out there open source. I get a really positive message every couple of weeks from someone that’s found Petl and is using it to solve their data transformation problem.”
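The "set of statements" style Miles describes can be sketched with the Python standard library alone. The helpers below (`rename_field`, `select_rows`, `convert_field`) are hypothetical and are not Petl’s API; they only mimic the general idea of small, composable table operations, each of which reads almost like a sentence. The sample rows are invented for illustration.

```python
from typing import Callable, Iterable, Iterator, Sequence

Row = Sequence[object]
Table = Iterable[Row]  # convention assumed here: the first row is the header


def rename_field(table: Table, old: str, new: str) -> Iterator[Row]:
    """Rename one header field; data rows pass through unchanged."""
    rows = iter(table)
    header = next(rows)
    yield tuple(new if name == old else name for name in header)
    for row in rows:
        yield tuple(row)


def select_rows(table: Table, field: str,
                keep: Callable[[object], bool]) -> Iterator[Row]:
    """Keep only the rows whose value in `field` satisfies `keep`."""
    rows = iter(table)
    header = tuple(next(rows))
    idx = header.index(field)
    yield header
    for row in rows:
        if keep(row[idx]):
            yield tuple(row)


def convert_field(table: Table, field: str,
                  func: Callable[[object], object]) -> Iterator[Row]:
    """Apply `func` to every value in `field`."""
    rows = iter(table)
    header = tuple(next(rows))
    idx = header.index(field)
    yield header
    for row in rows:
        values = list(row)
        values[idx] = func(values[idx])
        yield tuple(values)


# Invented raw data with an inconsistent header and a missing value.
raw = [
    ("PatientID", "age_yrs", "parasites_per_ul"),
    ("A1", "4", "12000"),
    ("A2", "31", ""),
    ("A3", "7", "560"),
]

# Each line is one atomic transformation applied to the previous result.
step1 = rename_field(raw, "PatientID", "patient_id")
step2 = select_rows(step1, "parasites_per_ul", lambda v: v != "")
step3 = convert_field(step2, "age_yrs", int)
step4 = convert_field(step3, "parasites_per_ul", int)

standardised = list(step4)
```

Each step is a generator, so no work is done until the final `list(...)` call pulls rows through the whole chain. A person still decides which steps a given dataset needs, which is the "hammer and chisel" part; the library only makes each step short to write.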
For more information about Petl, see: http://petl.readthedocs.org/en/latest/
The WorldWide Antimalarial Resistance Network (WWARN) is a multidisciplinary, scientifically independent network working to guide malaria elimination through high-quality analysis, customised research tools and services, and a global platform for exchanging scientific and public health information.