Working with massive data requires both the compilation and curation of datasets and the algorithmic and analytical tools needed to make sense of them. I work on both aspects.
On the dataset compilation side, I am heavily involved in the sPlot project, which has the ambitious aim of building a massive global database of vegetation plots with representation from all major regions. The group was founded in 2013 at a meeting at sDiv in Leipzig. We are currently soliciting contributions to the dataset; please contact me if you are interested! The dataset already contains more than 1,000,000 plots from all continents, and we have a series of papers planned for submission in 2015.
My algorithmic work involves extensive collaboration with researchers at MADALGO, particularly Constantinos Tsirogiannis. One set of projects involves ecologically relevant computations on massive phylogenies. In our first algorithmic paper on the topic, we developed approaches to rapidly calculate MPD, PD, NRI and PDI, four common metrics used to describe phylogenetic community structure. Since then, we have extended our approach to several other diversity measures, including beta diversity measures such as PhyloSor and UniFrac. These are implemented in the open-source software package PhyloMeasures, which includes an R package.
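For readers less familiar with these metrics, the following is a minimal Python sketch of the textbook definitions of MPD (mean pairwise distance), PD (Faith's phylogenetic diversity) and NRI (net relatedness index) on a toy tree. It is not the PhyloMeasures implementation and does not reproduce its fast algorithms. The child-to-parent tree encoding, the function names, the toy tree and the null model used for NRI (communities of the same size drawn uniformly at random from the species pool) are all illustrative assumptions. PD here counts branches up to the root, which is one common convention. PDI, PhyloSor and UniFrac are not covered.

```python
import itertools
import random
import statistics

# Hypothetical toy tree ((A:1,B:1)X:2,C:3)root, stored as child -> parent
# and child -> length of the branch directly above that child.
PARENT = {"A": "X", "B": "X", "X": "root", "C": "root"}
BRANCH_LENGTH = {"A": 1.0, "B": 1.0, "X": 2.0, "C": 3.0}


def distance(parent, branch_length, a, b):
    """Patristic distance: sum of branch lengths on the path between a and b."""
    # Cumulative distance from a to each of its ancestors (including a itself).
    up_from_a = {a: 0.0}
    node, d = a, 0.0
    while node in parent:
        d += branch_length[node]
        node = parent[node]
        up_from_a[node] = d
    # Walk up from b until reaching the lowest common ancestor of a and b.
    node, d = b, 0.0
    while node not in up_from_a:
        d += branch_length[node]
        node = parent[node]
    return up_from_a[node] + d


def mpd(parent, branch_length, community):
    """Mean pairwise distance over all unordered species pairs (naive, quadratic)."""
    pairs = list(itertools.combinations(community, 2))
    if not pairs:
        return 0.0
    return sum(distance(parent, branch_length, a, b) for a, b in pairs) / len(pairs)


def pd(parent, branch_length, community):
    """Faith's PD: total length of the branches linking the species to the root."""
    edges = set()  # each branch is identified by its child node
    for node in community:
        # Stop early once we reach a branch already counted for another species.
        while node in parent and node not in edges:
            edges.add(node)
            node = parent[node]
    return sum(branch_length[n] for n in edges)


def nri(parent, branch_length, community, pool, n_null=999, seed=0):
    """NRI = -(MPD_obs - mean MPD_null) / sd MPD_null, with null communities of
    the same size drawn uniformly at random (without replacement) from the pool."""
    rng = random.Random(seed)
    observed = mpd(parent, branch_length, community)
    null = [mpd(parent, branch_length, rng.sample(pool, len(community)))
            for _ in range(n_null)]
    return -(observed - statistics.mean(null)) / statistics.stdev(null)


if __name__ == "__main__":
    species = ["A", "B", "C"]
    print(mpd(PARENT, BRANCH_LENGTH, species))
    print(pd(PARENT, BRANCH_LENGTH, ["A", "B"]))
    print(nri(PARENT, BRANCH_LENGTH, ["A", "B"], species))
```

A positive NRI indicates that the community's species are more closely related than expected under the null model. In this naive form, each MPD evaluation is quadratic in species richness and the NRI estimate repeats it for every random community, which becomes expensive on phylogenies with many thousands of tips; that cost is what faster algorithms for these measures aim to avoid.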
In another project, I worked with the TRY plant trait data to estimate how strongly biased the set of species with trait measurements is relative to the species without them. As many have speculated, the bias appears to be rather large, so being able to correct for it will be crucial for any study in which not all species have been measured.