From atomate to atomate2: Navigating complexity and scalability in computational materials research
Over the past decade, computational materials science has advanced significantly, driven by developments in computational algorithms and the increased availability of computing resources. Central to this progress is density functional theory (DFT), a computational approach that allows researchers to calculate materials properties from fundamental physical principles. The expanded use of DFT has enabled larger-scale and more precise simulations of materials, facilitating extensive screening and systematic studies of large numbers of candidate materials. This evolution has been supported by enhanced computational infrastructure and improved data management practices.
A critical factor in this advancement has been the development of workflow automation frameworks designed to streamline and manage the complex computational tasks associated with DFT simulations. A landmark contribution in this domain was atomate (Mathew et al., 2017), initially developed to automate routine computational workflows. Its successor, atomate2 (Ganose et al., 2025), which has just been published, significantly expands the original framework's capabilities and flexibility. These platforms have enhanced the accessibility, reproducibility, and scalability of computational approaches within materials science. In this post, we reflect on the historical origins, design philosophies, and key technical advancements characterizing the evolution from atomate to atomate2.
Atomate: laying the groundwork
When atomate first emerged in 2017 (Mathew et al.), its mission was clear: streamline and simplify computational workflows for materials science by automating the most common yet complex tasks associated with DFT calculations. Built upon three core Python libraries—pymatgen, FireWorks, and custodian—atomate aimed to abstract away routine processes, allowing researchers to focus on scientific interpretation rather than workflow mechanics.
Pymatgen served as the foundational tool for managing and manipulating materials structures, generating input files, and parsing simulation outputs. It provided robust, standardized routines for converting structures between various file formats, enabling systematic and consistent materials analyses.
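To make this concrete, a few lines of pymatgen are enough to read a structure and generate a standard VASP relaxation input set. The snippet below is an illustrative sketch rather than an excerpt from the paper; the file and directory names are placeholders.

```python
from pymatgen.core import Structure
from pymatgen.io.vasp.sets import MPRelaxSet

# Read a crystal structure from file; pymatgen infers the format (CIF, POSCAR, ...).
structure = Structure.from_file("my_structure.cif")  # placeholder file name

# Build a standardized VASP input set for a structural relaxation and write
# INCAR, KPOINTS, POSCAR, and POTCAR to a run directory.
# (Writing POTCAR requires locally configured VASP pseudopotentials.)
relax_set = MPRelaxSet(structure)
relax_set.write_input("relax_run")
```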
FireWorks acted as the central workflow management system, enabling workflows to be built from discrete steps known as Fireworks, each bundling one or more Firetasks. Fireworks could be chained into sophisticated workflows, allowing sequential and parallel execution of computational steps and supporting dynamic modification of workflows based on runtime conditions. FireWorks also provided extensive job-tracking features, giving users real-time visibility into workflow progress, failures, and performance metrics, which was crucial for managing large computational campaigns.
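A minimal FireWorks sketch, assuming a locally configured LaunchPad (the MongoDB connection described by my_launchpad.yaml), shows how two dependent steps are chained and registered; the shell commands are placeholders.

```python
from fireworks import Firework, LaunchPad, ScriptTask, Workflow

# Two Fireworks chained so that the second only runs after the first succeeds.
fw1 = Firework(ScriptTask.from_str('echo "prepare inputs"'), name="step_1")
fw2 = Firework(ScriptTask.from_str('echo "run analysis"'), name="step_2", parents=[fw1])

# Bundle them into a Workflow and register it with the MongoDB-backed LaunchPad.
wf = Workflow([fw1, fw2], name="two_step_demo")
launchpad = LaunchPad.auto_load()  # reads the local LaunchPad configuration file
launchpad.add_wf(wf)
```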
Custodian addressed error recovery by automatically detecting and correcting common calculation errors, such as convergence failures or simulation crashes, without manual intervention. This significantly improved the reliability and efficiency of computational workflows.
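As a hedged illustration of that recovery loop, custodian can wrap a VASP run roughly as follows; the launch command is a placeholder for a given HPC setup.

```python
from custodian import Custodian
from custodian.vasp.handlers import UnconvergedErrorHandler, VaspErrorHandler
from custodian.vasp.jobs import VaspJob

# Handlers watch the running calculation for known failure signatures
# (e.g., electronic convergence problems) and apply corrective actions.
handlers = [VaspErrorHandler(), UnconvergedErrorHandler()]

# The job describes how VASP is launched; adjust the command to your cluster.
jobs = [VaspJob(vasp_cmd=["mpirun", "vasp_std"])]

# Custodian runs the job, retrying with fixes until it succeeds or max_errors is hit.
c = Custodian(handlers, jobs, max_errors=5)
c.run()
```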
Atomate's modular design focused on creating reusable workflow components. Each computational step, termed a Firework, typically comprised a series of tasks, including:
Writing input files tailored to specific computational methods and settings.
Executing simulation runs, ensuring that calculations were properly submitted and managed within high-performance computing (HPC) environments.
Parsing and extracting relevant outputs such as structural parameters, energies, electronic band structures, and vibrational modes.
Storing results systematically in MongoDB databases, enabling efficient querying and data management.
This modular structure enabled researchers to easily recombine Fireworks into diverse and complex workflows tailored for specific scientific investigations. Atomate supported a wide range of simulation types, from fundamental calculations such as structural relaxations and electronic band-structure analyses to more advanced, specialized simulations like elastic tensor determination, dielectric constant evaluations, phonon spectra, and even simulations of spectroscopic properties (e.g., Raman spectra and X-ray absorption spectra).
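In practice, launching one of atomate's preset workflows could look roughly like the sketch below, assuming atomate's 1.x module layout and a configured LaunchPad; the input file name is a placeholder.

```python
from fireworks import LaunchPad
from pymatgen.core import Structure
from atomate.vasp.workflows.presets.core import wf_bandstructure

structure = Structure.from_file("my_structure.cif")  # placeholder input structure

# Preset workflow: structure optimization -> static run -> band structure Fireworks,
# each of which writes inputs, runs VASP under custodian, parses outputs, and
# stores the resulting task document in MongoDB.
wf = wf_bandstructure(structure)

launchpad = LaunchPad.auto_load()
launchpad.add_wf(wf)
```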
Moreover, atomate provided robust provenance tracking through its MongoDB database, capturing comprehensive metadata about calculation settings, input structures, simulation parameters, execution details, and computational environments. This rich provenance information was essential for reproducibility, validation, and comparative analyses, particularly important in high-throughput computational studies and large-scale materials screening initiatives.
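Querying that provenance store is ordinary MongoDB usage; in the sketch below the database, collection, and field names are illustrative rather than a guaranteed schema.

```python
from pymongo import MongoClient

# Connection details and field names are illustrative; the exact task-document
# schema depends on the atomate version and settings in use.
client = MongoClient("mongodb://localhost:27017")
tasks = client["materials_db"]["tasks"]

# Example query: formula and final energy per atom for completed relaxations.
for doc in tasks.find(
    {"task_label": "structure optimization", "state": "successful"},
    {"formula_pretty": 1, "output.energy_per_atom": 1},
):
    print(doc)
```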

Growing beyond atomate
As atomate was adopted widely, particularly through initiatives such as the Materials Project, it became evident that computational demands and research requirements were rapidly evolving. The broader scientific community increasingly sought to integrate new computational methodologies, address more complex materials problems, and improve computational efficiency and accuracy. In particular, the rapid development of machine learning interatomic potentials (MLIPs) opened a route to dramatically faster simulations, especially for large systems and long-timescale processes. At the same time, the growing popularity and capabilities of DFT codes beyond VASP, such as FHI-aims, ABINIT, and CP2K, presented researchers with expanded choices tailored to specific computational needs and material systems.
These advancements highlighted several key limitations in atomate's initial design:
Code specificity: atomate was originally built primarily around the VASP code, with limited integration of Q-Chem. This specialization constrained its flexibility and restricted researchers who wished to utilize or incorporate alternative DFT software packages with distinct features, computational efficiencies, or methodological strengths tailored to different scientific applications.
Limited flexibility: The original architecture of atomate was not designed to easily accommodate workflows involving multiple computational methods or hybrid simulation approaches. For instance, employing ML-driven initial structure relaxations combined with subsequent high-accuracy DFT refinements required manual intervention and significant customization, reducing the efficiency and practicality of high-throughput computational investigations.
Provenance bottlenecks: With extensive high-throughput campaigns generating vast amounts of simulation data, the original atomate framework struggled to manage, query, and analyze results efficiently. The MongoDB-based provenance tracking, sufficient for modest-sized datasets, became increasingly cumbersome and performance-limited as simulation campaigns scaled up, hampering efficient data retrieval and analysis.
Collectively, these limitations motivated a comprehensive rethinking and redesign of the original atomate framework, ultimately resulting in the development and release of atomate2, specifically tailored to address these evolving computational demands and emerging research needs.
Atomate2: addressing new challenges
In response to these limitations, atomate2 emerged as a comprehensive rethinking of the original framework, placing modularity, interoperability, and flexibility at its core. Published recently, atomate2 is designed explicitly to handle heterogeneous computational workflows involving diverse electronic-structure codes, including VASP, FHI-aims, ABINIT, CP2K, JDFTx, and Q-Chem, as well as modern MLIPs.
Atomate2’s foundational architecture introduced several key innovations:
Calculator-agnostic approach: A unified application programming interface (API) allows users to switch computational backends seamlessly, enabling hybrid workflows where computationally inexpensive preliminary calculations (e.g., ML-driven relaxations) precede accurate DFT refinements; a sketch of such a flow follows this list.
Jobflow engine: The transition from FireWorks to the more flexible jobflow library provides a streamlined user experience, enabling dynamic and nested workflows. This facilitates complex tasks such as automatic convergence checks and iterative refinements, significantly reducing manual effort.
Flexible storage options: Enhanced database management, including support for MongoDB, Amazon S3, and Azure Blob storage, facilitates efficient handling of extensive simulation data; a storage-configuration sketch also appears below.
Generalizable workflows: Abstract workflow definitions now allow straightforward adaptation across different computational tools without substantial rewriting.
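To illustrate the calculator-agnostic, jobflow-based design, here is a hedged sketch of a hybrid flow in which an MLIP pre-relaxation feeds a VASP relaxation. The maker names, the force_field_name parameter, and the output reference path follow atomate2's documented layout but should be treated as assumptions that may differ between versions.

```python
from jobflow import Flow
from pymatgen.core import Structure
from atomate2.forcefields.jobs import ForceFieldRelaxMaker  # MLIP relaxation maker
from atomate2.vasp.jobs.core import RelaxMaker               # DFT relaxation maker

structure = Structure.from_file("my_structure.cif")  # placeholder input structure

# Cheap MLIP pre-relaxation; the force field name is illustrative, use one
# installed locally.
ml_relax = ForceFieldRelaxMaker(force_field_name="CHGNet").make(structure)

# High-accuracy DFT refinement starting from the MLIP-relaxed structure.
# The exact output reference path may vary with the task-document schema.
dft_relax = RelaxMaker().make(ml_relax.output.structure)

flow = Flow([ml_relax, dft_relax], name="mlip_then_dft_relax")
```

Such a flow can then be executed locally (for example with jobflow's run_locally) or dispatched through a workflow manager. On the storage side, a hedged sketch of a jobflow JobStore that keeps task documents in MongoDB while offloading large outputs to S3 might look as follows; the connection details, database, and key names are placeholders.

```python
from jobflow import JobStore
from maggma.stores import MongoStore, S3Store

# Primary store for task documents (placeholder database/collection names).
docs_store = MongoStore("materials_db", "outputs")

# S3 bucket for large blobs (e.g., charge densities), indexed by a small Mongo collection.
index_store = MongoStore("materials_db", "outputs_blobs_index", key="blob_uuid")
blob_store = S3Store(index_store, "my-bucket", key="blob_uuid")

store = JobStore(docs_store, additional_stores={"data": blob_store})
```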
The breadth of workflows expanded significantly, now covering scenarios from defect calculations and electronic transport to machine-learned molecular dynamics and anharmonic phonon calculations, reflecting atomate2's broader computational capability.
Comparing atomate and atomate2: core improvements
The evolution from atomate to atomate2 brought improvements on several fronts, reflecting broader trends in computational materials science towards greater modularity, flexibility, and usability. Drawing the threads above together, the key differences are:
Code support: atomate centered on VASP, with limited Q-Chem integration; atomate2 ships workflows for VASP, FHI-aims, ABINIT, CP2K, JDFTx, and Q-Chem, alongside MLIPs.
Workflow engine: atomate built workflows with FireWorks; atomate2 builds on jobflow, enabling dynamic and nested workflows such as automatic convergence checks and iterative refinements.
Hybrid workflows: combining ML-driven pre-relaxations with DFT refinements required manual customization in atomate; atomate2's calculator-agnostic API supports such combinations directly.
Data management: atomate stored results and provenance exclusively in MongoDB; atomate2 additionally supports Amazon S3 and Azure Blob storage.
Workflow coverage: atomate covered relaxations, band structures, elastic tensors, dielectric constants, phonons, and spectroscopic properties; atomate2 extends this to defect calculations, electronic transport, MLIP-driven molecular dynamics, and anharmonic phonon calculations.
Reflections on the evolution
The transition from atomate to atomate2 reflects broader trends in computational materials science—namely, increasing demands for flexibility, interoperability, and robust data management. This shift highlights the need for computational frameworks to not only manage larger, more complex datasets but also to accommodate emerging computational methods seamlessly. Atomate2 doesn’t merely provide technical improvements; it represents a philosophical shift towards modularity, extensibility, and user empowerment. By adopting a calculator-agnostic approach and enabling dynamic workflows through the jobflow engine, atomate2 significantly lowers barriers to entry, allowing researchers greater freedom to customize and innovate.
Moreover, the atomate2 design philosophy explicitly encourages contributions from the broader scientific community, fostering a sustainable ecosystem where new methods, computational packages, and innovative ideas can be easily integrated. This open and inclusive approach not only promotes collaborative development but also ensures that atomate2 can continually adapt and evolve in response to future scientific and computational advancements. By prioritizing interoperability and modularity, atomate2 positions itself effectively to address current and emerging challenges in computational materials research.
Conclusion and future directions
The journey from atomate to atomate2 underscores an essential narrative in modern computational science: adaptability and openness to change are critical for sustained relevance and continued innovation. As computational power and the scale of materials databases expand dramatically, computational frameworks must evolve rapidly to effectively accommodate and leverage new methods, tools, and computational strategies. The transition to atomate2 highlights the importance of modular design, interoperability, and comprehensive data management in meeting these evolving demands.
Looking ahead, future developments will likely focus on further enhancing dynamic workflow capabilities, allowing workflows to adapt even more intelligently based on computational outcomes and iterative feedback loops. Integrating methods built natively for graphics processing units (GPUs) will also become increasingly important as GPUs take on a larger share of scientific computing workloads. Moreover, advances in machine learning and artificial intelligence will likely reshape workflow automation, enabling predictive modeling, active learning, and real-time adaptive workflows.
Another key area of future focus will be refining and optimizing exascale data management strategies, ensuring efficient storage, retrieval, and analysis of the increasingly massive datasets generated by large-scale high-throughput simulations. Improvements in data visualization and interpretation tools will also be essential to help researchers effectively navigate and extract insights from vast and complex datasets.
In essence, atomate2's flexibility, extensive modularity, and forward-looking design provide a robust and adaptable foundation upon which future innovations in computational materials science can be built. By continuing to foster community-driven development and supporting open integration of emerging computational tools and methodologies, atomate2 is well-positioned to shape and support the computational materials science landscape for many years to come.