• cyrano@lemmy.dbzer0.comOP · 24 hours ago

DeepSeek continues its open-source releases.

DeepEP is a communication library tailored for Mixture-of-Experts (MoE) and expert parallelism (EP). It provides high-throughput and low-latency all-to-all GPU kernels, also known as MoE dispatch and combine. The library also supports low-precision operations, including FP8.
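
For readers new to the terminology, here is a minimal single-process sketch of what "dispatch" and "combine" mean. This is not DeepEP's API; it only illustrates the data movement the kernels perform, which DeepEP carries out all-to-all across GPUs (optionally in FP8):

```python
import torch

num_tokens, hidden, num_experts, top_k = 8, 16, 4, 2
x = torch.randn(num_tokens, hidden)
gate_logits = torch.randn(num_tokens, num_experts)
weights, expert_ids = gate_logits.softmax(-1).topk(top_k, dim=-1)

# Dispatch: route each (token, expert) pair into an expert-major layout,
# so every expert sees a contiguous slice of its assigned tokens.
flat_experts = expert_ids.flatten()                      # (num_tokens * top_k,)
flat_tokens = torch.arange(num_tokens).repeat_interleave(top_k)
order = flat_experts.argsort()                           # group by expert
dispatched = x[flat_tokens[order]]

# Each expert processes its slice (a dummy identity stands in for the FFNs).
processed = dispatched

# Combine: scatter results back to token order, weighted by the gate.
out = torch.zeros_like(x)
out.index_add_(0, flat_tokens[order],
               processed * weights.flatten()[order].unsqueeze(-1))
```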

To align with the group-limited gating algorithm proposed in the DeepSeek-V3 paper, DeepEP offers a set of kernels optimized for asymmetric-domain bandwidth forwarding, such as forwarding data from the NVLink domain to the RDMA domain. These kernels deliver high throughput, making them suitable for both training and inference prefilling tasks. They also support controlling the number of SMs (Streaming Multiprocessors) used.
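
To make "asymmetric-domain forwarding" concrete, here is a toy routing sketch: a token bound for an expert on another node first hops over NVLink to the local GPU that shares the destination's local index, then crosses nodes over RDMA. The topology numbers and the routing rule here are illustrative assumptions, not DeepEP internals:

```python
GPUS_PER_NODE = 8  # assumed topology; real deployments vary

def route(src_rank: int, dst_rank: int) -> list[str]:
    """Return the NVLink/RDMA hops from src GPU rank to dst GPU rank."""
    src_node, dst_node = src_rank // GPUS_PER_NODE, dst_rank // GPUS_PER_NODE
    if src_node == dst_node:
        return [f"NVLink: rank {src_rank} -> rank {dst_rank}"]
    # Cross-node: forward over NVLink to the local proxy GPU that shares the
    # destination's local index, then take one RDMA hop between nodes.
    proxy = src_node * GPUS_PER_NODE + dst_rank % GPUS_PER_NODE
    hops = []
    if proxy != src_rank:
        hops.append(f"NVLink: rank {src_rank} -> rank {proxy}")
    hops.append(f"RDMA:   rank {proxy} -> rank {dst_rank}")
    return hops

for hop in route(src_rank=1, dst_rank=14):
    print(hop)
```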

For latency-sensitive inference decoding, DeepEP includes a set of low-latency kernels with pure RDMA to minimize delays. The library also introduces a hook-based communication-computation overlapping method that does not occupy any SM resources.
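
The hook pattern can be illustrated with a self-contained toy: start the receive in the background, run unrelated compute, then call the returned hook to wait for the data. In DeepEP the background transfer is pure RDMA and consumes no SMs; here a thread merely stands in for it, and all names are hypothetical:

```python
import threading
import time

def async_recv(simulated_latency_s: float):
    """Start a background 'transfer'; return a hook that blocks until done."""
    done = threading.Event()
    box = {}

    def transfer():
        time.sleep(simulated_latency_s)   # stand-in for the RDMA transfer
        box["tokens"] = list(range(4))    # stand-in for received tokens
        done.set()

    threading.Thread(target=transfer, daemon=True).start()

    def hook():
        done.wait()                       # data is guaranteed ready after this
        return box["tokens"]

    return hook

hook = async_recv(0.01)
# ... overlap: run attention / shared-expert compute here ...
tokens = hook()                           # receive completes; run expert FFNs
print(tokens)
```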

Note: the implementation in this library may differ slightly from the DeepSeek-V3 paper.