PhyML, a software program for tracking coronaviruses
As a scientific discipline, phylogeny is the study of evolutionary relationships between species and, more generally, between living organisms. By providing a methodological framework for inferring and analyzing these relationships, it offers a global view of biodiversity and helps us better understand how biodiversity varies with environmental change.
PhyML was used to reconstruct the phylogenetic tree of the first genomes of the SARS-CoV-2 virus.
The software was born in the early 2000s, thanks to the thesis work of Stéphane Guindon (supervised by Olivier Gascuel), a research fellow at the Montpellier laboratory of computer science, robotics and microelectronics, attached to the CNRS and the university.
PhyML compares DNA sequences to infer phylogenetic trees that describe the evolution leading to those sequences. Applied to viruses, it can be used to trace their genealogy and determine, for example, whether the strains circulating in a country result from a single transmission or from multiple introductions of the virus.
PhyML was not the first software for reconstructing phylogenetic trees, but it was the first able to process datasets of several thousand sequences and still reconstruct reliable phylogenies.
Using the algorithms implemented in the software, the analysis evaluates the probability of observing the sampled sequences under a candidate evolutionary tree, and searches for the tree that makes this probability highest.
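To give a feel for what "evaluating the probability of observing sequences" means, here is a minimal sketch in Python. It is not PhyML's algorithm: it handles only two aligned sequences rather than a full tree, and it assumes the simple Jukes-Cantor substitution model, which the article does not mention. The sequences and the function name `jc69_log_likelihood` are made up for illustration.

```python
import math

def jc69_log_likelihood(a: str, b: str, d: float) -> float:
    """Log-probability of two aligned sequences separated by an evolutionary
    distance d (expected substitutions per site), under the Jukes-Cantor model."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if d <= 0:
        raise ValueError("distance must be positive")
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site ends on the same base
    p_diff = 0.25 - 0.25 * decay   # probability it ends on one given other base
    log_lik = 0.0
    for x, y in zip(a, b):
        # 0.25 is the assumed probability of the starting base at each site.
        log_lik += math.log(0.25) + math.log(p_same if x == y else p_diff)
    return log_lik

# Hypothetical toy sequences (20 sites, 2 differences), not real viral data.
seq_x = "ACGTACGTACGTACGTACGT"
seq_y = "ACGTACGTACGAACGTACGC"

# Try many candidate distances and keep the one with the highest likelihood.
candidates = [i / 1000 for i in range(1, 1001)]
best_d = max(candidates, key=lambda d: jc69_log_likelihood(seq_x, seq_y, d))

# Known closed-form answer for this model, used here as a sanity check.
p = sum(x != y for x, y in zip(seq_x, seq_y)) / len(seq_x)
analytic_d = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The grid search stands in for the far more sophisticated optimization a real program performs over tree shapes, branch lengths and model parameters at once.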
“The differences we observe, on portions of the same gene or chromosome, result from the accumulation of DNA mutations over the course of evolution. We then reconstruct the evolutionary tree, or phylogenetic tree, based on the idea that the more similar the sequences, the less ancient their common ancestor,” explains Stéphane Guindon.
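The idea that more similar sequences share a more recent common ancestor can be illustrated with a short Python sketch. It only counts the proportion of differing sites between aligned sequences, a much cruder measure than the one PhyML relies on, and the strain names and sequences are invented for the example.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical toy alignment (made-up sequences, not real viral data).
alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAT",
    "strain_C": "ACGAACTTAT",
}

# Distance for every pair of strains.
distances = {
    (m, n): p_distance(alignment[m], alignment[n])
    for m, n in combinations(alignment, 2)
}

# The pair with the smallest distance is the first candidate for grouping
# under a recent common ancestor.
closest_pair = min(distances, key=distances.get)
```

In this toy alignment, strain_A and strain_B differ at a single site, so they would be grouped together first, with strain_C branching off earlier.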
Today, PhyML comprises 100,000 lines of code and is sustained by a sizeable community that contributes to it day by day.
What does the future hold for PhyML?
One of the objectives for the coming years is to integrate PhyML into a “dashboard” for monitoring epidemics. The aim is to visualize phylogenetic trees dynamically and to combine them with geographical information and other available data about an epidemic.