What’s New: At SC23, Intel showcased AI-accelerated high performance computing (HPC) with leadership performance for HPC and AI workloads across Intel® Data Center GPU Max Series, Intel® Gaudi®2 AI accelerators and Intel® Xeon® processors. In partnership with Argonne National Laboratory, Intel shared progress on the Aurora generative AI (genAI) project, including an update on the 1 trillion parameter GPT-3 LLM on the Aurora supercomputer that is made possible by the unique architecture of the Max Series GPU and the system capabilities of the Aurora supercomputer. Intel and Argonne demonstrated the acceleration of science with applications from the Aurora Early Science Program (ESP) and the Exascale Computing Project. The company also showed the path to Intel® Gaudi®3 AI accelerators and Falcon Shores.
“Intel has always been committed to delivering innovative technology solutions to meet the needs of the HPC and AI community. The strong performance of our Xeon CPUs, along with our Max Series GPUs and CPUs, helps propel research and science. That, coupled with our Gaudi accelerators, demonstrates our full breadth of technology to provide our customers with compelling choices to suit their diverse workloads.”
Why It Matters: Generative AI for science, along with the latest performance and benchmark results, underscores Intel’s ability to deliver tailored solutions that meet the specific needs of HPC and AI customers. Intel’s software-defined approach with oneAPI and its HPC- and AI-enhanced toolkits helps developers seamlessly port their code across architectures to accelerate scientific research. Additionally, Max Series GPUs and CPUs will be deployed in multiple supercomputers that are coming online.
About Generative AI for Science: Argonne National Laboratory shared progress on its genAI for science initiatives with the Aurora supercomputer. The Aurora genAI project is a collaboration with Argonne, Intel and partners to create state-of-the-art foundational AI models for science. The models will be trained on scientific texts, code and science datasets at scales of more than 1 trillion parameters from diverse scientific domains. Using the foundational technologies of Megatron with DeepSpeed, the genAI project will service multiple scientific disciplines, including biology, cancer research, climate science, cosmology and materials science.
The distinctive Intel Max Series GPU architecture and the Aurora supercomputer’s system capabilities can efficiently handle 1 trillion-parameter models with just 64 nodes, far fewer than typically required. Argonne National Laboratory ran four instances on 256 nodes, demonstrating the ability to run multiple instances in parallel on Aurora and paving the path to more quickly scale the training of trillion-parameter models with trillions of tokens on more than 10,000 nodes.
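For a sense of scale, a back-of-envelope sketch shows why aggregate accelerator memory governs how few nodes can hold a trillion-parameter model. The bytes-per-parameter breakdown and the per-node GPU and HBM figures below are illustrative assumptions, not published Aurora specifications:

```python
# Back-of-envelope: memory needed to hold a 1-trillion-parameter model
# in mixed-precision training with an Adam-style optimizer.
# All figures are illustrative assumptions, not Aurora specifications.

PARAMS = 1e12                 # 1 trillion parameters
BYTES_PER_PARAM = 16          # 2 (BF16 weights) + 2 (BF16 gradients)
                              # + 12 (FP32 master copy + two Adam moments)

model_state_tb = PARAMS * BYTES_PER_PARAM / 1e12
print(f"Model + optimizer state: ~{model_state_tb:.0f} TB")

# Hypothetical node: 6 GPUs with 128 GB of HBM each
gpus_per_node = 6
hbm_per_gpu_gb = 128
nodes = 64

aggregate_hbm_tb = nodes * gpus_per_node * hbm_per_gpu_gb / 1000
print(f"Aggregate HBM across {nodes} nodes: ~{aggregate_hbm_tb:.0f} TB")
```

Under these assumptions, the model and optimizer state fit comfortably within the aggregate HBM of 64 such nodes, leaving headroom for activations, which is the intuition behind running trillion-parameter training on so few nodes.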
About Intel and Argonne National Laboratory: Intel and Argonne National Laboratory demonstrated the acceleration of science at scale enabled by the system capabilities and software stack on Aurora.1 Workload examples include:
- Brain connectome reconstruction is enabled at scale with Connectomics ML, showing competitive inference throughput on more than 500 Aurora nodes.
- General Atomic and Molecular Electronic Structure System (GAMESS) showed more than 2x the performance on the Intel Max Series GPU compared with the Nvidia A100. This enables the modeling of complicated chemical processes in drug and catalyst design, unlocking the secrets of molecular science with the Aurora supercomputer.
- Hardware/Hybrid Accelerated Cosmology Code (HACC) has demonstrated runs on more than 1,500 Aurora nodes, enabling the visualization and understanding of the physics and evolution of the universe.
- The drug-screening AI inference application, part of the Aurora Drug Discovery Early Science Project (ESP), enables efficient screening of vast chemical datasets, processing more than 20 billion compounds on just 256 nodes.
Intel also showed new HPC and AI performance, as well as software optimizations across hardware and applications:
- Intel and Dell published results for STAC-A2, an independent benchmark suite based on real-world market risk analysis workloads, showing great performance for the financial industry. Compared to eight Nvidia H100 PCIe GPUs, four Intel® Data Center GPU Max 1550s had 26% higher warm Greeks 10-100k-1260 performance and 4.3x higher space efficiency.
- The Intel® Data Center GPU Max Series 1550 outperforms the Nvidia H100 PCIe card by an average of 36% (1.36x) on diverse HPC workloads.
- Intel Data Center GPU Max Series delivers improved support for AI models, including multiple large language models (LLMs) such as GPT-J and LLAMA2.
- Intel® Xeon® CPU Max Series, the only x86 processor with high bandwidth memory (HBM), delivered an average 19% more performance compared to the AMD Epyc Genoa processor.
- Last week, MLCommons2 published results of the industry standard MLPerf training v3.1 benchmark for training AI models. Intel Gaudi2 demonstrated a significant 2x performance leap with the implementation of the FP8 data type on the v3.1 training GPT-3 benchmark.
- Intel will introduce Intel Gaudi3 AI accelerators in 2024. The Gaudi3 AI accelerator will be based on the same high-performance architecture as Gaudi2 and is expected to deliver 4x the compute (BF16), double the networking bandwidth for greater scale-out performance, and 1.5x the onboard HBM, readily handling the growing compute demands of LLMs without performance degradation.
- 5th Gen Intel® Xeon® processors will deliver up to 1.4x higher performance gen-over-gen on HPC applications as demonstrated by LAMMPS-Copper.
- Granite Rapids, a future Intel Xeon processor, will deliver increased core counts, built-in acceleration with Intel® Advanced Matrix Extensions, and support for multiplexer combined ranks (MCR) DIMMs. Granite Rapids will deliver 2.9x better DeepMD+LAMMPS AI inference performance. MCR DIMMs, based on DDR5, achieve speeds of 8,800 megatransfers per second and enable more than 1.5 terabytes per second of memory bandwidth in a two-socket system, which is critical for feeding the fast-growing core counts of modern CPUs and enabling efficiency and flexibility.
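To see how a system-level bandwidth above 1.5 TB/s follows from an 8,800 MT/s transfer rate, here is a quick arithmetic sketch. The channel count per socket is an assumption for illustration and varies by platform; the transfer rate and socket count come from the text:

```python
# Back-of-envelope: DDR5/MCR memory bandwidth from transfer rate.
# Channels per socket is an assumed figure; it differs by platform.

mt_per_s = 8800           # megatransfers per second (MCR DIMM, from the text)
bytes_per_transfer = 8    # 64-bit-wide DDR channel moves 8 bytes per transfer

gb_per_s_per_channel = mt_per_s * bytes_per_transfer / 1000  # 70.4 GB/s

channels_per_socket = 12  # assumption for illustration
sockets = 2

total_tb_per_s = gb_per_s_per_channel * channels_per_socket * sockets / 1000
print(f"{gb_per_s_per_channel:.1f} GB/s per channel, "
      f"~{total_tb_per_s:.2f} TB/s in a two-socket system")
```

With these assumptions the two-socket total comes to roughly 1.69 TB/s, consistent with the "greater than 1.5 terabytes per second" figure cited above.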
About New Progress on oneAPI: Intel announced features for its 2024 software development tools that advance open software development powered by oneAPI multiarchitecture programming. The new tools help developers extend AI and HPC capabilities on Intel CPUs and GPUs with broader coverage, including faster performance and easier deployment using standard Python for numeric workloads, and compiler enhancements that deliver a near-complete SYCL 2020 implementation to improve productivity and code offload.
Additionally, Texas Advanced Computing Center (TACC) announced its oneAPI Center of Excellence will focus on projects that develop and optimize seismic imaging benchmark codes. Intel fosters an environment where software and hardware innovation and research advance the industry, with 32 oneAPI Centers of Excellence worldwide.
What’s Next: Intel emphasized its commitment to AI and HPC and highlighted market momentum. New supercomputer deployments with Intel Max Series GPU and CPU technologies include systems like Aurora, Dawn Phase 1, SuperMUC-NG Phase 2, Clementina XX1 and more. New systems featuring Intel Gaudi2 accelerators include a large AI supercomputer with Stability AI as the anchor customer.
This momentum will be foundational for Falcon Shores, Intel’s next-generation GPU for AI and HPC. Falcon Shores will leverage Intel Gaudi and Intel Xe intellectual property (IP) with a single GPU programming interface built on oneAPI. Applications built on Intel Gaudi AI accelerators and Intel Max Series GPUs today will be able to migrate easily to Falcon Shores in the future.