Computational science, machine learning, and data analysis constantly advance and accelerate scientific discovery. Scientific simulations, models, and analyses for such advances usually depend on parameters whose exact values require tuning. A key challenge in such tuning is understanding how the outcome of a model changes as parameters vary and how sensitive that outcome is to each parameter. This research aims to establish a novel paradigm to help visualize and understand the outcomes of parameter changes through multidimensional parameter-space feature tracking. A generalization of traditional spacetime feature tracking, parameter-space feature tracking associates essential features such as extrema, vortex core lines, and boundary surfaces across multiple data instances induced by different input parameters.
Specifically, we will research the tracking, analysis, and visualization of features across multidimensional parameter spaces. First, we will research mesh- and density-based methods for feature tracking in continuous and discrete parameter spaces. Second, we will research analysis algorithms to derive insights directly from multidimensional features. Third, we will investigate visualization methods to help enable parameter space exploration and understand multidimensional features. The proposed methodology will be applied to help understand ensemble simulations and machine learning models in Earth systems, fusion energy, and X-ray tomography applications.
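As a toy illustration of associating features across data instances induced by different parameters, the sketch below links local minima of a 1D field across a discrete parameter sweep by nearest-position matching. It is only a minimal sketch under stated assumptions, not the mesh- or density-based methods proposed here: the function `field`, the grid, and the `max_shift` threshold are all hypothetical, and the code handles neither newly appearing minima nor merges.

```python
def local_minima(values):
    """Indices whose value is strictly below both neighbors."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]


def field(x, p):
    # Hypothetical model output: a double well with minima at x = -p and x = +p.
    return (x * x - p * p) ** 2


def track_minima(params, xs, max_shift=3):
    """Link each minimum at params[k] to the nearest minimum at params[k + 1].

    A track ends when no minimum lies within max_shift grid cells.
    Minima that first appear later in the sweep are ignored (assumption).
    """
    minima = [local_minima([field(x, p) for x in xs]) for p in params]
    tracks = [[i] for i in minima[0]]
    open_tracks = list(tracks)
    for candidates in minima[1:]:
        still_open = []
        for track in open_tracks:
            near = [c for c in candidates if abs(c - track[-1]) <= max_shift]
            if near:
                track.append(min(near, key=lambda c: abs(c - track[-1])))
                still_open.append(track)
        open_tracks = still_open
    return tracks


xs = [i / 20 for i in range(-40, 41)]       # grid on [-2, 2]
params = [k / 10 for k in range(5, 16)]     # sweep p from 0.5 to 1.5
tracks = track_minima(params, xs)           # one list of grid indices per minimum
```

Each returned track is the sequence of grid indices that one minimum occupies as the parameter changes, which is the discrete analogue of a feature trajectory in parameter space.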
Data reduction is necessary for many scientific domains because large-scale numerical simulation codes and instruments produce massive datasets, often at high rates, making it challenging to move and store the data they produce. Among reduction techniques, lossy compression for scientific data is important because it complements sampling, filtering, and dimensionality reduction: it preserves all data points and reduces dataset sizes only by leveraging redundancies and tuning accuracy. However, current techniques for designing lossy compression methods suffer from three critical limitations: lack of theoretical support, lack of direct error controls for spatial quantities of interest (QoIs), and lack of support for the structured and unstructured grids used by some of these instruments and codes. For example, lossy compressors for scientific data are currently designed without a notion of optimality, resulting in incremental progress. As in other research domains that rely on theory to guide algorithm design, a rigorous theoretical framework is needed to guide lossy compressor design.
To address the three limitations, we propose developing ZF, a novel framework for reasoning about, designing, and building lossy compressors approaching practical lossy compressibility limits. We will develop ZF from three connected research thrusts: 1) innovative techniques to compute practical compressibility limits for any scientific dataset given quantity of interest (QoI) preservation constraints, 2) physics-based QoI preservation techniques for structured and unstructured grids and their integration into compressibility limit formulations, and 3) efficient approximation methods to approach practical compressibility limits. We will use ZF to design and implement novel lossy compressors for three DOE applications (ARM observatory, turbulent flows in complex geometry, and high-energy coherent diffraction imaging), representing three domains (climate, combustion, and light sources), and three scientific modi operandi (observation, simulation, and experiment).
The ZF project will produce new knowledge and understanding, QoI preservation techniques, decorrelation methods, and high-performance compressor implementations generalizable to a broad range of mission-critical DOE applications that depend on compression. Thrust 1 is the first attempt to adapt fundamental rate-distortion theory to scientific data. The resulting formulations will inform the effectiveness of existing lossy compressors and provide the community with a solid guide to designing new compression schemes for scientific data. Thrust 2 addresses the unexplored preservation of QoIs with three novel techniques depending on tractability: symbolic derivation of QoI preservation, co-design of QoI preservation and analysis, and iterative and selective QoI preservation. Thrust 3 develops innovative, effective, and fast compression algorithms based on multivariate prediction using approximated covariance matrices, randomized low-rank approximations, and graph neural networks.
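To make the ideas of error control, QoI preservation, and rate concrete, the following minimal sketch applies uniform scalar quantization with a pointwise absolute error bound, then measures the error in one simple QoI (the mean) and the empirical entropy of the quantization codes. This is an illustrative assumption rather than the ZF formulation: the quantizer, the choice of the mean as the QoI, and the zeroth-order entropy as a crude rate estimate are all stand-ins introduced here.

```python
import math
from collections import Counter


def quantize(data, eb):
    """Map each value to an integer code; bin width 2*eb bounds the error by eb."""
    step = 2.0 * eb
    return [round(v / step) for v in data]


def reconstruct(codes, eb):
    """Invert quantize() up to the error bound."""
    step = 2.0 * eb
    return [c * step for c in codes]


def entropy_bits(codes):
    """Zeroth-order empirical entropy of the codes, in bits per value."""
    n = len(codes)
    return -sum((k / n) * math.log2(k / n) for k in Counter(codes).values())


def mean(values):
    return sum(values) / len(values)


data = [math.sin(0.1 * i) for i in range(200)]   # hypothetical smooth signal
eb = 0.01                                        # pointwise absolute error bound
codes = quantize(data, eb)
recon = reconstruct(codes, eb)

max_err = max(abs(a - b) for a, b in zip(data, recon))   # pointwise distortion
qoi_err = abs(mean(data) - mean(recon))                  # error in the mean (QoI)
rate = entropy_bits(codes)                               # crude rate estimate
```

For a linear QoI such as the mean, the pointwise bound carries over by the triangle inequality; nonlinear or spatial QoIs do not inherit the bound so simply, which is the gap that direct QoI error control is meant to close.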
Data is the fourth pillar of the science methodology. However, rapidly expanding volumes and velocities of scientific data generated by simulation and instrument facilities present serious storage capacity, storage and network bandwidth, and data analysis challenges for many sciences. These challenges ultimately limit the research discoveries that promote prosperity and welfare. Many research groups are exploring the use of data reduction techniques to address these challenges because lossy compression for scientific data offers a reliable, high-speed, and high-fidelity solution. However, existing generic lossy compressors often do not meet the needs of user-specific applications, use cases, and requirements in terms of reduction, speed, and information preservation. Hence, many potential users of lossy compressors for scientific data develop their own specialized lossy compression software, an effort that requires tremendous collaboration between compressor experts and domain scientists, demands extensive coding to optimize performance on multiple platforms, and often leads to redundant research and development efforts. This project aims to create a framework, called FZ, that revolutionizes the development of specialized lossy compressors by providing a comprehensive ecosystem to enable scientific users to intuitively research, compose, implement, and test specialized lossy compressors from a library of pre-developed, high-performance data reduction modules optimized for heterogeneous platforms. This project also contributes to the education and training of undergraduate and graduate students by enhancing the quality of computing-related curricula in scientific data management, compression, and visualization and through outreach activities at four universities.
This project builds FZ, an intuitive cyberinfrastructure for the composition of specialized lossy compressors, by adapting, combining, and extending multiple existing capabilities from the SZ lossy compressor, the LibPressio unifying compression interface, the OptZConfig optimizer of compressor configurations, the Z-checker and QCAT compression quality analysis tools, and the ParaView and VTK visualization tools. The project has three thrusts: (1) It develops programming interfaces and a compressor generator to create new compressors from high-level languages such as Python and optimize their execution. (2) It refactors the SZ lossy compressor infrastructure to enable fine-grained composability of a large diversity of data transformation modules and to integrate non-uniform compression capabilities and new preprocessing, decorrelation, approximation, and entropy coding data transformation modules to produce specialized lossy compressors. (3) It provides interactive visualization, quality assessment, and graphical user interface (GUI) tools that adapt and extend existing capabilities to automatically search optimized lossy compression module compositions and to identify relevant compression ratio, speed, and quality trade-offs for users' use cases.
Today's large-scale simulations are producing vast amounts of data that are revolutionizing scientific thinking and practices. For instance, a fusion simulation can produce 200 petabytes of data in a single run, while a climate simulation can generate 260 terabytes of data every 16 seconds with a 1 square kilometer resolution. As the disparity between data generation rates and available I/O bandwidths continues to grow, data storage and movement are becoming significant bottlenecks for extreme-scale scientific simulations in terms of in situ and post hoc analysis and visualization. The disparity necessitates data compression, which compresses large-scale simulation data in situ and decompresses it in situ and/or post hoc for analysis and exploration. On the other hand, a critical step in extracting insight from large-scale simulations involves the definition, extraction, and evaluation of features of interest. Topological data analysis has provided powerful tools to capture features from scientific data in turbulent combustion, astronomy, climate science, computational physics and chemistry, and ecology. While lossy compression is leveraged to address the big data challenges, most existing lossy compressors are agnostic of topological features and thus fail to preserve those that are essential to scientific discoveries. This project aims to research and develop advanced lossy compression techniques and software that preserve topological features in data for in situ and post hoc analysis and visualization at extreme scales. The success of this project will promote scientific research on driving applications in cosmology, climate, and fusion by enabling efficient and effective compression for scientific data, and the impact scales to other science and engineering disciplines.
Furthermore, the research products of this project will be integrated into visualization and parallel processing curricula, disseminated via research and training workshops, and used to attract underrepresented students for broadening participation in computing.
This project tackles the data compression, analysis, and visualization needs in extreme-scale scientific simulations by developing a suite of topology-aware data compression algorithms for scalar field and vector field data. Such algorithms effectively reduce the size of data while preserving critical features defined by topological notions. This project will define and enforce topology-aware constraints over advanced lossy compression algorithms. Such capabilities have not been studied systematically within today’s data compression paradigm. This project will impact specific fields, including computational science, data analysis, data compression, and visualization, and the broader scientific community. The research products of this project will be delivered as publicly available software to significantly advance the research cyberinfrastructure for current and upcoming exascale systems. This project will foster novel discoveries in multiple scientific disciplines beyond cosmology, climate, and fusion by enabling efficient and effective compression on a wide range of platforms.
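The following sketch illustrates why a topology-agnostic compressor can lose features: it compares the local extrema of a 1D scalar field before and after an error-bounded lossy round trip. It is a minimal sketch under stated assumptions, not the project's algorithms: the uniform quantizer in `lossy_roundtrip`, the strict-extremum test, and the sample `signal` are hypothetical stand-ins.

```python
def critical_points(values):
    """Strict local maxima and minima of a 1D field as (index, kind) pairs."""
    points = []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            points.append((i, "max"))
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            points.append((i, "min"))
    return points


def lossy_roundtrip(values, eb):
    """Quantize to bins of width 2*eb and reconstruct (stand-in compressor)."""
    step = 2.0 * eb
    return [round(v / step) * step for v in values]


def preserves_critical_points(values, eb):
    """Topology-aware constraint: extrema must match exactly after the round trip."""
    return critical_points(values) == critical_points(lossy_roundtrip(values, eb))


# A rising ramp with a small bump: a maximum at index 3 and a minimum at index 4.
signal = [0.0, 0.2, 0.4, 0.43, 0.41, 0.6, 0.8]

coarse_ok = preserves_critical_points(signal, eb=0.05)   # bump is flattened
fine_ok = preserves_critical_points(signal, eb=0.001)    # bump survives
```

A topology-aware compressor would use a check of this kind as a constraint, for example by tightening the error bound locally wherever the extrema would otherwise change.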
Ensemble simulations are needed in DOE science applications to understand model sensitivities and to quantify model uncertainties. Ensembles are a sparse sample of possible outcomes of simulation parameters, and the high volumes, velocities, and varieties of ensemble simulation outputs are a grand challenge for the comprehensive understanding of simulation parameter spaces. Driven by three DOE applications (Earth systems, energy systems, and cosmology), our research will consist of three tightly integrated pillars to increase the understanding and utility of ensemble models: visualization surrogates, variation models, and actionable visualization, each outlined below.
Visualization surrogates will focus on creating AI models to relate the input parameter space to generated visualizations in image space, thus providing a comprehensive set of desirable future states. Such models will mitigate the high cost of executing full simulations for the visual analytics of parameter spaces. Specifically, we will research both image- and data-based surrogate models; the former directly predict 2D visualization images, and the latter synthesize 3D grid/particle data for scenarios that require 3D interactions and feature extraction. The outcome of the models will allow interactive exploration and enable efficient sampling of the parameter space for deriving statistics and supporting decision-making.
Variation models will derive principal characteristics of the simulation parameter spaces. The surrogate models will allow efficient sampling of the parameter space for characterizing and ranking the impact of individual parameters on user-specified objectives. Methodologies developed herein will suggest highly sensitive parameter(s), the direction(s) for parameter tuning, and the confidence of the action based on existing samples.
Actionable visualization will leverage surrogate and variation models to establish an actionable paradigm for human-in-the-loop decision-making for optimizing input parameters. Specifically, it will bridge parameter space and decision space through the visualizations and features derived using visualization surrogates and variation modeling by translating principal characteristics into interactive visual metaphors. Rather than computing expensive inverse models from decision space back to parameter space, we will store the mapping between parameter space and decision space and use human-guided interactive visual exploration to enable understandable and actionable decision making.
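The parameter ranking described for variation models can be sketched with central finite differences on a surrogate. This is a minimal sketch, assuming a cheap-to-evaluate surrogate is already available: the `surrogate` function below is a hypothetical closed-form stand-in for a trained model, and local derivatives are only one possible sensitivity measure, not necessarily the one this research will adopt.

```python
def surrogate(params):
    # Hypothetical stand-in for a trained surrogate of a simulation objective.
    return 3.0 * params["a"] + 0.1 * params["b"] ** 2 + 0.0 * params["c"]


def sensitivities(model, point, h=1e-4):
    """Central-difference derivative of the model with respect to each parameter."""
    result = {}
    for name in point:
        up = dict(point)
        up[name] += h
        down = dict(point)
        down[name] -= h
        result[name] = (model(up) - model(down)) / (2 * h)
    return result


def rank_parameters(sens):
    """Parameter names ordered from most to least sensitive (by magnitude)."""
    return sorted(sens, key=lambda name: abs(sens[name]), reverse=True)


point = {"a": 1.0, "b": 1.0, "c": 1.0}
sens = sensitivities(surrogate, point)
ranking = rank_parameters(sens)
```

The magnitude of each entry ranks the parameters, and its sign indicates the direction in which tuning that parameter increases the objective near `point`.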
The QUAntum chromodynamics Nuclear TOMography (QuantOm) Collaboration convenes domain scientists, applied mathematicians, and computational scientists to address the challenge of 3D imaging of quarks and gluons in nucleons and nuclei.
In 2022, DOE's Scientific Discovery through Advanced Computing (SciDAC) program, a partnership between DOE’s Advanced Scientific Computing Research (ASCR) and Biological and Environmental Research (BER) Offices, launched the Improving Projections of AMOC and Collapse Through Advanced Simulations (ImPACTS) project. The ImPACTS project has two main objectives: 1) increase our physical understanding of AMOC and how it is represented in Earth System Models (ESMs), and 2) develop advances in analyses, workflows, and eddy-resolving ESM initialization and efficiency to enable long-term simulations of AMOC and its stability. In the first objective, we will bring together analyses from the ESM and applied math communities to accelerate our analysis capability and transform our understanding of AMOC. While ML/AI analyses for ESM simulations have expanded greatly, none of these analyses have proven transformational. By leveraging the state of the science in AI from the RAPIDS2 institute, we will push the boundaries of AI for ESM analysis. In the second objective, we will utilize recent successes in AI to generate physics-constrained initial conditions at eddy-resolving resolution, allowing us to dramatically reduce the extreme times needed to achieve ocean equilibration. Improvements in MPAS-Ocean model throughput will increase E3SM exascale readiness and have the potential to more than double the performance of the model. Taken together, these improvements will allow us to simulate hundreds of years at eddy-resolving resolution. These two primary objectives will proceed in parallel through most of the project but will later combine to deeply probe AMOC strength and its stability across model resolutions, which will be informed by a novel simulation campaign.
As scientists anticipate the benefits of exascale computing, the lack of novel solutions to process data at scale and calibrate the simulation parameters has become a significant roadblock to further accelerating scientific discovery. The goal of this project is to develop a new end-to-end data analysis and feature extraction workflow based on deep neural networks to help computational scientists address three major challenges: (1) identify important simulation parameters and generate the essential data for analysis, (2) transform the simulation data to compact feature representations to convey the most insight, and (3) design scalable visualization algorithms coupled with large-scale simulations to glean insight into their scientific problems. Working with domain scientists in jet engine design, climate models, cardio/cerebrovascular flow, superconductivity, and fusion energy, the team will demonstrate how deep learning techniques can help extract features from vast amounts of simulation data and navigate in the huge simulation parameter space. Through summer internships and project collaborations, this research will create opportunities for graduate and undergraduate students, including students from underrepresented groups, to participate in key research initiatives with leading scientists. Through the planned annual summer school on "Deep Learning for Visualization," the research results will enable visualization researchers and a broader community to incorporate the principles and practice of deep learning techniques developed.
RAPIDS is a SciDAC computer science institute whose objective is to assist application teams in overcoming computer science and data challenges in the use of DOE supercomputing resources. We address computer science and data technical challenges for science teams, work directly with scientists and DOE facilities to adopt and support our technologies, and coordinate with other SciDAC Institutes and DOE computer science and applied mathematics activities to maximize the benefits to science.
The growing disparity between the size of simulation output and I/O rates makes it imperative for applications to be able to subset, compress, extract features, and analyze results at runtime. Hence, exascale systems will be increasingly used for computations that involve more than a single simulation, such as data assimilation, calibration, and uncertainty quantification. CODAR is a co-design center focused on online data analysis and reduction at the exascale that includes approximation, reduction, assimilation, calibration, data mining, and statistical analysis.
FTK is a library that provides building blocks for feature tracking algorithms in scientific datasets.
In the post-Moore era, as the gap between data production rates and available I/O bandwidth widens, online feature extraction makes it possible to derive insights during scientific simulations and thus reduce the data to be stored. The purpose of this study is to explore and develop feature extraction and tracking algorithms in the context of neuromorphic architectures, which are biologically inspired and superior to von Neumann architectures in energy efficiency, scalability, and fault tolerance.
Ensembles are collections of data produced by simulations or experiments conducted with different initial conditions, parameterizations, or phenomenological models. The goal of this project is to develop visual analytic techniques for large-scale scientific ensemble data sets. We are tackling the problem of large-scale ensemble data analysis and visualization from four perspectives: (1) exploration of local uncertainty with distributions, (2) exploration and tracking of ensemble features, (3) exploration of multivariate ensemble parameters, and (4) automation of in situ ensemble analytics.
In this project, we develop a distribution-based data analysis and visualization framework for optimizing in situ processing and analysis of extreme-scale scientific data. Our framework will consist of the following three components: (1) computation, representation, and indexing of distributions; (2) data summarization, reduction, and triage; and (3) distribution-based visual analytics.
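One simple way to picture the computation of distributions and their use for data summarization is a block-wise histogram: each block of raw values is replaced by bin counts over a fixed value range. This is only a minimal sketch under stated assumptions; the function `block_histograms`, its fixed-range binning, and the sample values are hypothetical and are not the framework's actual representation or indexing scheme.

```python
def block_histograms(values, block_size, bins, lo, hi):
    """Summarize each block of `values` by a histogram over [lo, hi].

    Values outside [lo, hi] are clamped into the first or last bin.
    """
    width = (hi - lo) / bins
    summaries = []
    for start in range(0, len(values), block_size):
        counts = [0] * bins
        for v in values[start:start + block_size]:
            b = int((v - lo) / width)
            counts[max(0, min(b, bins - 1))] += 1
        summaries.append(counts)
    return summaries


values = [0.0, 0.1, 0.2, 0.9, 1.0, 0.5, 0.55, 0.6]   # hypothetical field samples
summary = block_histograms(values, block_size=4, bins=2, lo=0.0, hi=1.0)
```

Here eight values are reduced to two histograms of two bins each; in an in situ setting, such per-block distributions could be stored in place of the raw data and queried later by distribution-based visual analytics.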
Global climate models require significant mathematical, algorithmic, and software advances to enable efficient use of next-generation computer platforms. This project investigates an integrated approach to develop new coupler services and to develop and execute in situ data analytics in the coupled framework.
This project conducts an in-depth characterization and analysis of the errors, failures, and faults in large-scale supercomputing environments.