EVE: making Earth observation knowledge accessible to everyone

Despite the wealth of Earth observation and Earth sciences knowledge, much of it is scattered across many different sources and formats and accessible only to experts. EVE (Earth Virtual Expert) is a new intelligent companion for exploring the world of Earth observation and Earth sciences. It can explain both beginner and advanced concepts, guide users to trusted sources, summarise scientific documents and deliver insights on trends and tools, acting as a centralised platform for Earth observation and Earth sciences insights.

Anyone who has ever had to compile information for a report knows how difficult it is to include all the pertinent data and cross-check references. This is the reality in several domains – and Earth observation and Earth sciences are no exception.

Earth observation and Earth science research generates a lot of high-value knowledge, but this knowledge is scattered across many different sources and formats. Accessing this information usually requires deep expertise, limiting comprehensive understanding.

All of this creates a significant entry barrier for many potential users, like domain practitioners and decision-makers who need transparent, trusted and scientifically robust information – something that traditional systems struggle to provide.

As environmental decisions and interventions rely more and more on Earth observation, there is the need for systems that not only retrieve information but also interpret and reason across heterogeneous sources. Recent advances have been made in the field of large language models (LLMs), but general-purpose models lack the domain specificity and rigorous evaluation needed for reliable Earth Intelligence applications.

Meet EVE (Earth Virtual Expert), an Earth observation and Earth science-specialised LLM. Funded by ESA Φ-lab and built by Pi School, EVE was developed in partnership with Imperative Space and Mistral AI to close the gap between Earth sciences and decision-making.

EVE-Instruct, the core 24B LLM for Earth Intelligence behind EVE’s chat platform, was built on Mistral’s Small 3.2 model and further optimised for reasoning and question answering. As corpus design and domain-adaptive pre-training are central to the performance of a specialised LLM, the team curated a large-scale Earth observation and Earth sciences corpus, manually selecting 172 sources across 22 trusted publishing institutions. The corpus includes open-access, private and proprietary collections, the latter made available through a partnership agreement with Wiley.

Adapting an instruction-tuned LLM to a target domain may come at the expense of the model’s ability to follow instructions, its conversational stability or its tool-use behaviour. The team therefore implemented a fine-tuning strategy that interleaves instruction-tuning data and long-form text, mixing general-domain replay data with synthetic Earth observation and Earth sciences text. As a result, EVE-Instruct is more stable and follows instructions better than its parent model.
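As a rough illustration, an interleaved data mixture of this kind might be assembled along the following lines. The pool names, sampling weights and interleaving scheme here are illustrative assumptions, not the team’s published recipe:

```python
import random

def build_mixture(instruction_data, longform_text, replay_data, synthetic_eo_text,
                  weights=(0.4, 0.3, 0.15, 0.15), seed=0):
    """Interleave instruction-tuning samples with long-form text, mixing
    general-domain replay data and synthetic EO text into one stream.
    Weights are hypothetical, chosen only for the sketch."""
    rng = random.Random(seed)
    pools = [list(instruction_data), list(longform_text),
             list(replay_data), list(synthetic_eo_text)]
    mixture = []
    while any(pools):
        # Sample a pool proportionally to its weight, skipping exhausted pools
        active = [(p, w) for p, w in zip(pools, weights) if p]
        pool = rng.choices([p for p, _ in active],
                           weights=[w for _, w in active])[0]
        mixture.append(pool.pop(rng.randrange(len(pool))))
    return mixture
```

Interleaving general-domain replay data with domain text in this way is a common safeguard against catastrophic forgetting during domain adaptation.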

Due to a lack of standardised benchmarks for dialogue and natural language processing capabilities applied to Earth observation and Earth sciences, the team also curated an evaluation set targeting domain-relevant tasks such as multiple-choice question answering (MCQA), hallucination detection and open-ended question answering (QA), constituting the first systematic language-modelling benchmarks for Earth observation and Earth sciences.
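For concreteness, an MCQA benchmark of this kind can be scored as a simple exact-match accuracy. The record schema and sample question below are hypothetical; the actual benchmark format may differ:

```python
def mcqa_accuracy(examples, predict):
    """Score a model's letter predictions against gold answers.
    `predict` maps a question dict to a choice letter, e.g. 'B'."""
    correct = sum(1 for ex in examples if predict(ex) == ex["answer"])
    return correct / len(examples)

# Illustrative record, not taken from the EVE benchmark
sample = [
    {"question": "Which ESA mission carries a C-band SAR instrument?",
     "choices": {"A": "Sentinel-1", "B": "Sentinel-2", "C": "Sentinel-5P"},
     "answer": "A"},
]
```

Hallucination detection and open-ended QA need richer scoring (binary labels with justifications, or judge-based comparisons) than this exact-match scheme.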

EVE-Instruct was evaluated using these benchmarks alongside general-domain benchmarks to assess domain gains and the preservation of its general capabilities. It was compared against its parent model, Mistral Small 3.2, and three additional LLMs of comparable scale: Gemma3, Qwen3 and Llama4 Scout.

EVE-Instruct achieved the highest performance across MCQA benchmarks, indicating effective incorporation of Earth observation and Earth science knowledge during the fine-tuning step. It also leads competing models on open-ended QA without context under both the ‘LLM-as-a-judge’ and ‘Win Rate’ evaluations.

To address the issue of factual hallucinations and to extend EVE’s knowledge beyond the training data, the team developed a Retrieval-Augmented Generation pipeline that grounds EVE’s answers in relevant documents from team-curated Earth observation and Earth sciences knowledge bases.
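A toy sketch of the grounding step is shown below, using a simple bag-of-words retriever as a stand-in for whatever retrieval machinery the actual pipeline uses (which the post does not specify):

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Rank knowledge-base passages by similarity to the query.
    A real pipeline would use dense embeddings; this is a toy stand-in."""
    q = Counter(query.lower().split())
    scored = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]

def grounded_prompt(query, docs, k=2):
    """Assemble a prompt that grounds the model's answer in retrieved passages."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"
```

The key idea is that the model is asked to answer from retrieved passages rather than from parametric memory alone, which both reduces hallucination and lets the knowledge base be updated without retraining.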

For hallucination detection, after answering a query EVE-Instruct runs a first fact-checking stage in which it acts as an evaluator, producing a binary hallucination label and a justification for that label. If a hallucination is detected, the query is reformulated using the justification to address the identified issues, and with newly retrieved information the model generates a revised, more grounded response.

EVE-Instruct can also critique the original answer using both prior and newly retrieved evidence and then produce a revised answer. Finally, the model ranks the original and revised outputs, selecting the most evidence-supported, reliable response.
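The detect–reformulate–revise–rank loop described above can be sketched as a single control-flow function. All five callables here are hypothetical stand-ins for model and retrieval calls, not the actual EVE interfaces:

```python
def self_correct(query, answer, generate, fact_check, reformulate, retrieve, rank):
    """One pass of a detect-reformulate-revise-rank loop.

    fact_check  -> (bool hallucinated, str justification)
    reformulate -> query rewritten to address the justification
    retrieve    -> fresh evidence documents for the new query
    generate    -> grounded answer from query plus evidence
    rank        -> most evidence-supported of the candidate answers
    """
    hallucinated, justification = fact_check(query, answer)
    if not hallucinated:
        return answer                                  # original answer passes
    new_query = reformulate(query, justification)      # address identified issues
    evidence = retrieve(new_query)                     # fetch fresh context
    revised = generate(new_query, evidence)            # grounded second attempt
    return rank([answer, revised], evidence)           # keep best-supported answer
```

In the non-hallucinating case the original answer is returned untouched, so the extra model calls are only paid when the fact-checking stage flags a problem.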

Beyond offline evaluation, a six-month pilot stage for EVE’s chat platform was carried out starting in September 2025 with the help of 350 users, through a graphical user interface and an application programming interface. Interested parties can read more about the development of EVE in this technical paper and in this one-pager.

The models, code, curated corpus, benchmarks and a subset of the synthetically generated fine-tuning dataset used to create EVE-Instruct are now available on EVE-ESA’s Hugging Face and GitHub.

By using EVE’s chat platform, anyone – regardless of scientific background or level of expertise – can explore and ask Earth observation and Earth science-related questions using natural language. The platform can explain both beginner and advanced concepts, guide users to trusted sources, summarise scientific documents and deliver insights on trends and tools.

An operational version of the platform is undergoing its final stages of development and will be available soon. Registrations for its public launch are now open here.

For now, EVE is text-only and does not reason directly over Earth observation and Earth sciences imagery or structured geospatial data. However, the team aims to expand it into a multimodal, agentic platform capable of reasoning over imagery and geospatial data, supporting multi-step scientific workflows for large-scale Earth observation and Earth sciences analyses and data-driven inference.

To facilitate this transition, the next steps have already been prepared: EVE natively operates using the standard Model Context Protocol (MCP), enabling seamless connectivity with a wide range of external geospatial tools, services, and processing backends. This design choice ensures that multimodal and agentic capabilities can be integrated incrementally, allowing EVE to orchestrate and interact with geospatial computation resources as they are plugged into the MCP ecosystem.
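Since MCP frames requests as JSON-RPC 2.0 messages, a tool invocation of the kind EVE could issue to an external geospatial service can be sketched as below. The tool name and arguments are hypothetical examples, not part of EVE’s actual tool catalogue:

```python
import json
import itertools

_ids = itertools.count(1)  # JSON-RPC requests need unique ids

def mcp_tool_call(tool_name, arguments):
    """Build an MCP `tools/call` request (MCP uses JSON-RPC 2.0 framing)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical geospatial tool exposed by an MCP server
request = mcp_tool_call("search_eo_catalogue",
                        {"query": "Sentinel-2 L2A over the Po Valley", "limit": 5})
print(json.dumps(request, indent=2))
```

Because every tool behind an MCP server is invoked through this same request shape, new geospatial backends can be plugged in without changing how the model issues calls.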

Discover more about EVE and register for EVE’s public opening on the EVE website.

To learn more: ESA Φ-lab, Pi School, Imperative Space, Mistral AI, Wiley

Photo courtesy of Unsplash/Vimal S.
