Contributed by Uber, Horovod makes distributed deep learning fast and easy to use
KubeCon + CloudNativeCon North America – The LF Deep Learning Foundation, a community umbrella project of The Linux Foundation that supports and sustains open source innovation in artificial intelligence, machine learning, and deep learning, today announced Horovod, started by Uber, as its newest project. Horovod, a distributed training framework for TensorFlow, Keras and PyTorch, improves speed, scale and resource allocation in machine learning training activities.
“The LF Deep Learning Foundation is focused on building an ecosystem of AI, deep learning and machine learning projects. Today’s announcement of Uber’s contribution of the Horovod project represents significant progress toward achieving this vision,” said Ibrahim Haddad, Linux Foundation Director of Research. “This project has proven highly effective in training machine learning models quickly and efficiently, and we look forward to working to further grow the Horovod community and encourage adoption of this exciting project.”
Horovod makes it easy to take a single-GPU TensorFlow program and train it faster on many GPUs. It has also shown significantly improved GPU resource utilization. The project uses advanced algorithms and leverages features of high-performance networks to give data scientists, researchers and AI developers tooling to scale their deep learning models with ease and high performance. In benchmarks against standard distributed TensorFlow, Uber has observed large improvements in the ability to scale, with Horovod running roughly twice as fast.
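This announcement does not name the algorithms involved. Horovod's public documentation describes ring-allreduce as the technique it uses to average gradients across workers, so the following is a minimal single-process sketch of that idea. It is a toy simulation for illustration only, not Horovod's implementation: the function name ring_allreduce_average and the list-of-lists representation of each worker's gradients are assumptions made for this example, and a real deployment would exchange the chunks over MPI or NCCL rather than in a Python loop.

```python
def ring_allreduce_average(worker_grads):
    """Simulate ring-allreduce over per-worker gradient vectors.

    worker_grads is a hypothetical list with one equal-length list of
    floats per worker. Returns one averaged vector per worker.
    """
    n = len(worker_grads)
    length = len(worker_grads[0])
    bufs = [list(g) for g in worker_grads]  # copy so inputs stay untouched
    # Split every vector into n contiguous chunks (some may be empty).
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]

    # Phase 1, scatter-reduce: after n - 1 steps, worker r holds the
    # complete sum for chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step are simultaneous.
        sends = []
        for r in range(n):
            c = (r - step) % n
            lo, hi = bounds[c]
            sends.append((c, bufs[r][lo:hi]))
        for r, (c, data) in enumerate(sends):
            dst = (r + 1) % n  # each worker only talks to its ring neighbor
            lo, _ = bounds[c]
            for i, v in enumerate(data):
                bufs[dst][lo + i] += v

    # Phase 2, allgather: pass the completed chunks around the ring so
    # every worker ends up with every summed chunk.
    for step in range(n - 1):
        sends = []
        for r in range(n):
            c = (r + 1 - step) % n
            lo, hi = bounds[c]
            sends.append((c, bufs[r][lo:hi]))
        for r, (c, data) in enumerate(sends):
            dst = (r + 1) % n
            lo, hi = bounds[c]
            bufs[dst][lo:hi] = data

    # Divide by the worker count to turn sums into averages.
    return [[v / n for v in b] for b in bufs]


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0],
             [3.0, 4.0, 5.0, 6.0],
             [5.0, 6.0, 7.0, 8.0]]
    for rank, averaged in enumerate(ring_allreduce_average(grads)):
        print(rank, averaged)
```

In this scheme each worker sends and receives only one chunk per step and communicates only with its neighbor, which is why the approach is generally considered well suited to high-bandwidth interconnects of the kind the announcement mentions.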
Uber has used Horovod to support real-world work including self-driving vehicles, fraud detection, and trip forecasting. It is also used by Alibaba, Amazon and NVIDIA. Contributors to the project outside Uber include Amazon, IBM, Intel and NVIDIA.
“Uber built Horovod to make deep learning model training faster and more intuitive for AI researchers across industries,” said Alex Sergeev, Horovod Project Lead. “In this spirit, we are honored to contribute Horovod to the deep learning community as the LF Deep Learning Foundation’s newest project. As Horovod continues to mature in its functionalities and applications, this collaboration will enable us to further scale its impact in the open source ecosystem for the advancement of AI.”
Horovod joins existing LF Deep Learning projects: Acumos AI, a platform and open source AI framework; Angel, a high-performance distributed machine learning platform based on Parameter Server; and EDL, an Elastic Deep Learning framework designed to help cloud service providers build cluster cloud services using deep learning frameworks. Horovod complements these existing projects, and future collaboration among them is anticipated.
Horovod Background
Contributed to the LF Deep Learning Foundation by Uber, the project currently has 175 commits from 26 committers, and is licensed under Apache-2.0.
Horovod, which has secured a Linux Foundation Core Infrastructure Initiative Best Practices Badge, is also included in deep learning distributions such as AWS Deep Learning AMI, Azure Data Science VM, Databricks Runtime, GCP Deep Learning VM, IBM FfDL, IBM Watson Studio and NVIDIA GPU Cloud. More information on Horovod can be found on the Uber Engineering blog and in a Q&A with Horovod creator Alex Sergeev.
Following recent news of Uber joining the Linux Foundation as a Gold member, Uber continues to deepen its contributions to open source technology. Another hallmark open source technology from Uber, Jaeger, is a Cloud Native Computing Foundation project.
Organizations and developers interested in contributing projects and learning more about the LF Deep Learning Foundation can visit www.deeplearningfoundation.org.