Agricultural research data are a potentially valuable source of new research information, particularly when they are made Findable, Accessible, Interoperable, and Reusable (FAIR).
In recent years, funding mandates have required that data be publicly accessible at the end of a project. Reuse is often difficult, however, because variables were not clearly defined, units of measurement were not recorded, or information that was obvious to the original researcher was not documented. Even well-organized, well-documented datasets can be difficult to reuse. These legacy datasets are potentially valuable resources that could improve research efficiency if they could be interpreted by automated processes. The Agricultural Research Data Network (ARDN) provides a means to annotate datasets so that they can be interpreted and combined with other datasets across multiple environments, management practices, and genetics, generating new research products.
The project is modeled after the data interoperability efforts made by the medical research community over the last few decades. Because data collected in clinical trials use a common vocabulary and similar formats, they can be aggregated across research projects to create entirely new data products. This has facilitated advances in medical research and provided additional data for multi-factor analytics.
The goal of ARDN is to create a distributed network of harmonized crop systems research data and to make these data available through the USDA National Agricultural Library's data portal, Ag Data Commons (ADC). In addition to the core metadata required by ADC, datasets that qualify for ARDN will be annotated with machine-readable instructions for converting them to a format developed by the Agricultural Model Intercomparison and Improvement Project (AgMIP). The raw datasets can be left in their original form; the supplementary annotation allows a subset of the data to be interpreted and reused for modeling, data analytics, and other quantitative analyses. Tools are being developed at the University of Florida (UF) that allow end users to obtain the data in various formats, including crop model-specific formats. UF researchers are also developing tools that make it easy for data providers to annotate their data for inclusion in ARDN.
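The idea of leaving a raw dataset untouched while a separate annotation tells software how to read a subset of it can be illustrated with a small sketch. The example below is not the ARDN or AgMIP annotation format; the annotation structure, the column names, and the harmonized target names (`plot_id`, `grain_yield_kg_ha`, `tmax_c`) are hypothetical, chosen only to show the general pattern of renaming variables and converting units from a sidecar description.

```python
import csv
import io

# Hypothetical sidecar annotation (NOT the real ARDN/AgMIP schema).
# Each entry maps an original column header to a harmonized variable name
# and an optional linear unit conversion: value * factor + offset.
# Columns that are not listed (e.g. free-text notes) are left out of the
# harmonized subset; the raw CSV text itself is never modified.
ANNOTATION = {
    "Plot": {"target": "plot_id", "factor": None, "offset": None},
    "GrainYield_t_per_ha": {"target": "grain_yield_kg_ha", "factor": 1000.0, "offset": 0.0},
    # Fahrenheit to Celsius: (F - 32) * 5/9 == F * 5/9 - 160/9
    "MaxTemp_F": {"target": "tmax_c", "factor": 5.0 / 9.0, "offset": -160.0 / 9.0},
}


def harmonize(raw_csv: str, annotation: dict) -> list:
    """Return the annotated subset of raw_csv with renamed, unit-converted values."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        out = {}
        for column, rule in annotation.items():
            if column not in record:
                continue  # the annotation may describe columns this file lacks
            value = record[column]
            if rule["factor"] is not None:
                # Numeric column: parse and apply the linear conversion.
                value = float(value) * rule["factor"] + rule["offset"]
            out[rule["target"]] = value
        rows.append(out)
    return rows


# Hypothetical raw dataset, kept exactly as the provider recorded it.
RAW = """Plot,GrainYield_t_per_ha,MaxTemp_F,Notes
A1,3.2,86,lodging observed
A2,4.5,95,
"""

if __name__ == "__main__":
    for row in harmonize(RAW, ANNOTATION):
        print(row)
```

Keeping the conversion rules outside the data file mirrors the approach described above: the provider's original file stays authoritative, and only the annotation needs to be written or corrected to make a subset of it machine-interpretable.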
Researchers, data scientists and programmers at the University of Florida and the National Agricultural Library are developing the tools and protocols that form the basic ARDN infrastructure.
Similar agricultural research data networks, using protocols that are compatible with ARDN, are being developed for the Platform for Big Data in Agriculture and the International Fertilizer Development Center (IFDC), increasing the supply of data from international research that can contribute to ARDN.
A widely adopted ARDN will increase research efficiency by reducing the need for new field experiments while giving credit to researchers who contribute data to the network. Data-intensive research using data mining and AI techniques will become possible because the network provides a data source with a consistent vocabulary, consistent formats, and unambiguous meanings. This new source of data will facilitate model improvement, including advances in gene-based model development.
Funding sources: This effort was funded by grants from