Scans code files for functions and assembly instructions that can be replaced with optimized equivalents from the Kunpeng acceleration libraries, and generates visual reports.
Automatically matches code against the Kunpeng acceleration library function dictionary during coding, and intelligently suggests, highlights, and auto-completes the replaceable libraries and functions in the dictionary.
Provides scenario-specific Kunpeng application project templates and SDKs to help quickly build project environments, check configurations, download dependencies, and generate build files.
Based on Kunpeng acceleration strategies, performs in-depth acceleration of software libraries commonly used in customer scenarios while keeping the optimized library interfaces unchanged. Target libraries mainly cover system, compression, encryption and decryption, media, and math libraries.
1. System libraries (Glibc-patch, HyperScan, AVX2Neon)
Library | Description |
Glibc-patch | Memory, string, and lock interfaces are optimized and accelerated for the microarchitecture of the Huawei Kunpeng 920 processor. |
HyperScan | Uses Kunpeng instructions to accelerate regular expression compilation and scanning, taking advantage of the Kunpeng microarchitecture. |
AVX2Neon | AVX2Neon is an interface collection library. It uses Kunpeng acceleration instructions to implement the GCC intrinsic built-in functions defined for traditional (x86) platforms, so that applications that use those intrinsics can be smoothly migrated to the Kunpeng platform. |
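The AVX2Neon entry describes re-implementing x86 intrinsics on top of Kunpeng (Arm NEON) instructions. What follows is a minimal sketch of that idea. It assumes that a 256-bit AVX2 integer vector is emulated as a pair of 128-bit NEON registers. The type and function names (`m256i_sketch`, `mm256_loadu_sketch`, `mm256_storeu_sketch`, `mm256_add_epi32_sketch`) are hypothetical and are not the definitions used in AVX2Neon itself. A plain C fallback is included so the sketch also builds on non-Arm hosts.

```c
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
/* Hypothetical stand-in for __m256i: two 128-bit NEON registers. */
typedef struct { int32x4_t lo, hi; } m256i_sketch;
#else
/* Portable fallback so the sketch also builds on non-Arm hosts. */
typedef struct { int32_t v[8]; } m256i_sketch;
#endif

/* Load eight int32 values (the role of _mm256_loadu_si256). */
static m256i_sketch mm256_loadu_sketch(const int32_t *p)
{
    m256i_sketch r;
#if defined(__aarch64__)
    r.lo = vld1q_s32(p);
    r.hi = vld1q_s32(p + 4);
#else
    memcpy(r.v, p, sizeof r.v);
#endif
    return r;
}

/* Store eight int32 values (the role of _mm256_storeu_si256). */
static void mm256_storeu_sketch(int32_t *p, m256i_sketch a)
{
#if defined(__aarch64__)
    vst1q_s32(p, a.lo);
    vst1q_s32(p + 4, a.hi);
#else
    memcpy(p, a.v, sizeof a.v);
#endif
}

/* Lane-wise 32-bit addition with wraparound (the role of _mm256_add_epi32). */
static m256i_sketch mm256_add_epi32_sketch(m256i_sketch a, m256i_sketch b)
{
    m256i_sketch r;
#if defined(__aarch64__)
    r.lo = vaddq_s32(a.lo, b.lo);   /* lanes 0-3 */
    r.hi = vaddq_s32(a.hi, b.hi);   /* lanes 4-7 */
#else
    for (int i = 0; i < 8; i++)
        r.v[i] = (int32_t)((uint32_t)a.v[i] + (uint32_t)b.v[i]);
#endif
    return r;
}
```

Under this scheme, a header supplies the familiar intrinsic names backed by NEON pairs. The application source that calls them would then not need to change.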
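Optimized memory and string routines, like those the Glibc-patch entry refers to, usually examine several bytes per step instead of one. The hypothetical `memchr_sketch` below is a generic illustration only; the Glibc-patch code itself is not shown here. It scans eight bytes at a time using the well-known "has zero byte" bit trick, then finds the exact position byte by byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero if and only if some byte of v is zero. */
static inline uint64_t has_zero_byte(uint64_t v)
{
    return (v - ONES) & ~v & HIGHS;
}

/* Return a pointer to the first byte equal to c in s[0..n), or NULL. */
const void *memchr_sketch(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char target = (unsigned char)c;
    const uint64_t pattern = ONES * target; /* c broadcast to all 8 bytes */
    size_t i = 0;

    /* Check 8 bytes per iteration; memcpy keeps the load alignment-safe. */
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, 8);
        if (has_zero_byte(w ^ pattern))
            break; /* a match lies in this word */
    }
    /* Locate the match inside the word, or scan the remaining tail. */
    for (; i < n; i++)
        if (p[i] == target)
            return p + i;
    return NULL;
}
```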
2. Compression libraries (Gzip, ZSTD, Snappy, and KAEzip)
Library | Description |
Gzip | Based on Gzip 1.10, uses data prefetch, loop unrolling, and CRC instruction replacement to improve compression and decompression speed on the Kunpeng platform, with especially notable gains for text files. |
ZSTD | Based on zstd 1.4.4, improves ZSTD compression and decompression performance on the Kunpeng platform by using NEON instructions, inline assembly, and memory prefetch, restructuring code, and optimizing instruction pipeline scheduling. |
Snappy | Based on Snappy 1.1.7, improves Snappy compression and decompression speed on the Kunpeng platform by using inline assembly, wide-bit instructions, CPU pipeline optimization, and memory prefetch. |
KAEzip | KAEzip is the compression module of the Kunpeng Accelerator Engine (KAE). It uses the Kunpeng hardware acceleration module to implement the deflate algorithm and, together with the lossless user-mode driver framework, provides high-performance compression interfaces in gzip and zlib formats. |
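The "CRC instruction replacement" mentioned for Gzip means computing the gzip CRC-32 with dedicated hardware instructions instead of a software loop or lookup table.

The sketch below assumes an AArch64 compiler that defines `__ARM_FEATURE_CRC32` (for example, GCC with `-march=armv8-a+crc`) and a little-endian target. On that path it uses the ACLE intrinsics `__crc32d` and `__crc32b` from `<arm_acle.h>`. Otherwise it falls back to a bitwise loop over the same reflected polynomial, 0xEDB88320. `crc32_sketch` is a hypothetical name, not the optimized Gzip code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

/* CRC-32 as used by gzip/zlib (reflected polynomial 0xEDB88320). */
uint32_t crc32_sketch(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu; /* standard initial value */
#if defined(__ARM_FEATURE_CRC32)
    /* Hardware path: one CRC32X instruction per 8 bytes (little-endian load). */
    while (len >= 8) {
        uint64_t w;
        memcpy(&w, buf, 8);
        crc = __crc32d(crc, w);
        buf += 8;
        len -= 8;
    }
    while (len--)
        crc = __crc32b(crc, *buf++);
#else
    /* Software fallback: process one bit at a time. */
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
#endif
    return crc ^ 0xFFFFFFFFu; /* final inversion */
}
```

Both paths produce the standard CRC-32 check value 0xCBF43926 for the ASCII string "123456789".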
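Wide loads help Snappy-style compressors when measuring how far a back-reference match extends: eight bytes can be compared per step instead of one. The function below is a generic sketch of that idea, and the name `match_length_sketch` is hypothetical. It assumes a little-endian target, which AArch64 Linux normally is. It uses the GCC/Clang builtin `__builtin_ctzll` to locate the first differing byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the common prefix of a[0..n) and b[0..n). Assumes little-endian. */
size_t match_length_sketch(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t i = 0;
    /* Compare 8 bytes per step using one 64-bit XOR. */
    while (i + 8 <= n) {
        uint64_t x, y;
        memcpy(&x, a + i, 8);
        memcpy(&y, b + i, 8);
        uint64_t diff = x ^ y;
        if (diff != 0)
            /* On little-endian, the lowest set bit marks the first differing byte. */
            return i + (size_t)(__builtin_ctzll(diff) >> 3);
        i += 8;
    }
    /* Byte-by-byte tail. */
    while (i < n && a[i] == b[i])
        i++;
    return i;
}
```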
3. Encryption and decryption libraries (KAECrypto)
Library | Description |
KAECrypto | Uses the Kunpeng hardware acceleration module to implement the RSA, SM3, SM4, DH, MD5, and AES algorithms and, together with the lossless user-mode driver framework, provides high-performance symmetric and asymmetric encryption and decryption. It is compatible with OpenSSL 1.1.1a and later versions and supports both synchronous and asynchronous modes. |
4. Media libraries (HMPP, X265, and X264)
Library | Description |
HMPP | Kunpeng Hyper Media Performance Primitives (HMPP) provides functions for allocating and freeing vector buffers, vector initialization, vector mathematical and statistical operations, vector sampling and conversion, filtering, and transforms (such as the fast Fourier transform). It complies with the IEEE 754 floating-point arithmetic standard and runs on the Kunpeng platform. |
X265 | For FFmpeg video transcoding scenarios, the underlying x265 transcoding operators are accelerated with Kunpeng vector instructions to improve overall performance. The patches have been contributed to the x265 community and officially released in x265 3.4. |
X264 | X264 is free video encoding software licensed under the GPL. It is mainly used for H.264/MPEG-4 AVC video encoding. |
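The sum of absolute differences (SAD) used in motion estimation is a typical low-level encoder operator of the kind the X265 entry describes accelerating with vector instructions.

The sketch below computes SAD for a 16x16 block of 8-bit pixels. On AArch64 it uses the NEON intrinsics `vabal_u8` (widening absolute-difference accumulate) and `vaddlvq_u16` (horizontal sum). Elsewhere it uses a scalar loop. It illustrates the technique and is not x265's actual code; `sad16x16_sketch` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* SAD of two 16x16 blocks of 8-bit pixels with the given row strides. */
uint32_t sad16x16_sketch(const uint8_t *a, ptrdiff_t stride_a,
                         const uint8_t *b, ptrdiff_t stride_b)
{
#if defined(__aarch64__)
    uint16x8_t acc = vdupq_n_u16(0);
    for (int y = 0; y < 16; y++) {
        uint8x16_t va = vld1q_u8(a + y * stride_a);
        uint8x16_t vb = vld1q_u8(b + y * stride_b);
        /* Accumulate |a - b| widened to 16 bits; 16 rows * 2 * 255 fits in u16. */
        acc = vabal_u8(acc, vget_low_u8(va), vget_low_u8(vb));
        acc = vabal_u8(acc, vget_high_u8(va), vget_high_u8(vb));
    }
    return vaddlvq_u16(acc); /* sum the 8 lanes */
#else
    uint32_t sum = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum += (uint32_t)abs(a[y * stride_a + x] - b[y * stride_b + x]);
    return sum;
#endif
}
```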
5. Math libraries (KML_FFT, KML_BLAS, KML_SPBLAS, KML_MATH, KML_VML, KML_LAPACK, KML_SVML, KML_SOLVER)
Library | Description |
KML_FFT | Based on the Kunpeng architecture, KML_FFT deeply optimizes the fast Fourier transform (FFT) through vectorization and algorithm improvements, significantly improving the performance of the FFT interface functions. |
KML_BLAS | Based on the Kunpeng architecture, KML_BLAS maximizes the computing efficiency of Basic Linear Algebra Subprograms (BLAS) through vectorization, data prefetch, compiler optimization, and data reordering, bringing the performance of BLAS interface functions close to the theoretical peak. |
KML_SPBLAS | Making full use of the Kunpeng instruction set and architecture features, KML_SPBLAS is a high-performance sparse matrix operation library that improves the performance of HPC and big data solutions. |
KML_MATH | KML_MATH uses periodic function reduction and algorithm improvements to provide function implementations with significantly higher performance on Kunpeng processors. |
KML_VML | KML_VML vectorizes the processing of input data through NEON instruction optimization and inline assembly, making full use of the register features of the Kunpeng architecture to improve performance on Kunpeng processors. |
KML_LAPACK | KML_LAPACK optimizes the computing efficiency of LAPACK for the Kunpeng architecture through blocking, combinations of solver algorithms, multithreading, and BLAS interface optimization, improving performance on Kunpeng processors. |
KML_SVML | KML_SVML processes input vectors in batches through NEON instruction optimization and inline assembly, making full use of the register features of the Kunpeng architecture to improve performance on Kunpeng servers. |
KML_SOLVER | KML_SOLVER is a library of iterative sparse solvers that includes the preconditioned conjugate gradient (PCG) and generalized conjugate residual (GCR) methods. The current version of KML_SOLVER runs multithreaded on a single node. |
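The "periodic function reduction" mentioned for KML_MATH refers to argument reduction: a periodic function is evaluated on a small interval after subtracting a multiple of its period.

The sketch below applies this to sin(x). It writes x = k·(π/2) + r with |r| ≤ π/4, evaluates a short Taylor polynomial for sin(r) or cos(r), and selects the sign and function from the quadrant k mod 4. It is an illustration only, and `sin_sketch` is a hypothetical name. Production libraries typically use multi-part reduction constants and minimax polynomials. This one-constant reduction assumes a moderate |x|, well within the range of `long`.

```c
/* Taylor coefficients (-1)^j / (2j+1)!, j = 0..6, for sin(r) / r in terms of r^2. */
static const double SIN_COEF[7] = {
    1.0, -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0,
    1.0 / 362880.0, -1.0 / 39916800.0, 1.0 / 6227020800.0
};
/* Taylor coefficients (-1)^j / (2j)!, j = 0..7, for cos(r) in terms of r^2. */
static const double COS_COEF[8] = {
    1.0, -1.0 / 2.0, 1.0 / 24.0, -1.0 / 720.0, 1.0 / 40320.0,
    -1.0 / 3628800.0, 1.0 / 479001600.0, -1.0 / 87178291200.0
};

static const double HALF_PI = 1.57079632679489661923;

/* Evaluate c[0] + c[1]*t + ... + c[n-1]*t^(n-1) by Horner's rule. */
static double horner(const double *c, int n, double t)
{
    double p = c[n - 1];
    for (int i = n - 2; i >= 0; i--)
        p = p * t + c[i];
    return p;
}

double sin_sketch(double x)
{
    /* k = nearest integer to x / (pi/2), rounded without libm. */
    double q = x / HALF_PI;
    long k = (long)(q >= 0 ? q + 0.5 : q - 0.5);
    /* Reduced argument, |r| <= pi/4 up to rounding. */
    double r = x - (double)k * HALF_PI;
    double r2 = r * r;
    /* sin(k*pi/2 + r) depends only on k mod 4. */
    switch (((k % 4) + 4) % 4) {
    case 0:  return  r * horner(SIN_COEF, 7, r2);  /*  sin(r) */
    case 1:  return  horner(COS_COEF, 8, r2);      /*  cos(r) */
    case 2:  return -r * horner(SIN_COEF, 7, r2);  /* -sin(r) */
    default: return -horner(COS_COEF, 8, r2);      /* -cos(r) */
    }
}
```

Because |r| never exceeds π/4, a fixed, short polynomial suffices for every input. Without reduction, the polynomial degree would have to grow with |x|.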
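The KML_SPBLAS and KML_SOLVER entries can be made concrete with a small example. The sketch below stores a sparse matrix in compressed sparse row (CSR) format and implements the sparse matrix-vector product. It then uses that product in a PCG solver with a Jacobi (diagonal) preconditioner.

Both the CSR layout and the Jacobi preconditioner are assumptions chosen for illustration. The names `csr_matrix`, `csr_spmv`, and `pcg_jacobi` are hypothetical and are not KML's API. PCG requires a symmetric positive definite matrix.

```c
#include <stdlib.h>

/* CSR storage: row i's nonzeros are val[row_ptr[i] .. row_ptr[i+1]-1]. */
typedef struct {
    int n;               /* number of rows (square matrix) */
    const int *row_ptr;  /* length n + 1 */
    const int *col_idx;  /* column index of each nonzero */
    const double *val;   /* value of each nonzero */
} csr_matrix;

/* y = A * x */
void csr_spmv(const csr_matrix *A, const double *x, double *y)
{
    for (int i = 0; i < A->n; i++) {
        double sum = 0.0;
        for (int j = A->row_ptr[i]; j < A->row_ptr[i + 1]; j++)
            sum += A->val[j] * x[A->col_idx[j]];
        y[i] = sum;
    }
}

static double dot(const double *a, const double *b, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Solve A x = b with Jacobi-preconditioned CG; x holds the initial guess.
   Returns the iteration count, or -1 on allocation failure or no convergence. */
int pcg_jacobi(const csr_matrix *A, const double *b, double *x,
               double tol, int max_iter)
{
    int n = A->n;
    double *r = malloc(5 * (size_t)n * sizeof *r);
    if (!r)
        return -1;
    double *z = r + n, *p = z + n, *Ap = p + n, *dinv = Ap + n;

    /* Jacobi preconditioner M = diag(A); store 1 / a_ii. */
    for (int i = 0; i < n; i++) {
        dinv[i] = 1.0; /* fallback if the diagonal entry is missing */
        for (int j = A->row_ptr[i]; j < A->row_ptr[i + 1]; j++)
            if (A->col_idx[j] == i && A->val[j] != 0.0)
                dinv[i] = 1.0 / A->val[j];
    }

    csr_spmv(A, x, Ap);
    for (int i = 0; i < n; i++) {
        r[i] = b[i] - Ap[i];     /* r = b - A x */
        z[i] = dinv[i] * r[i];   /* z = M^-1 r */
        p[i] = z[i];
    }
    double rz = dot(r, z, n);
    double tol2 = tol * tol * dot(b, b, n); /* stop when ||r|| <= tol * ||b|| */

    int it;
    for (it = 0; it < max_iter; it++) {
        if (dot(r, r, n) <= tol2)
            break;
        csr_spmv(A, p, Ap);
        double alpha = rz / dot(p, Ap, n);
        for (int i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            z[i] = dinv[i] * r[i];
        }
        double rz_new = dot(r, z, n);
        double beta = rz_new / rz;
        rz = rz_new;
        for (int i = 0; i < n; i++)
            p[i] = z[i] + beta * p[i];
    }
    int converged = dot(r, r, n) <= tol2;
    free(r);
    return converged ? it : -1;
}
```

For example, the matrix [[4, 1], [1, 3]] has `row_ptr = {0, 2, 4}`, `col_idx = {0, 1, 0, 1}`, and `val = {4, 1, 1, 3}`. With b = (1, 2), the solver converges to x = (1/11, 7/11).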
6. Storage (Smart Prefetch, SPDK, ISA-L)
Library | Description |
Smart Prefetch | Smart Prefetch combines high-speed cache drives with an efficient prefetch algorithm to improve system storage I/O performance and, in turn, the overall performance of I/O-intensive scenarios. |
SPDK | The Storage Performance Development Kit (SPDK) aims to improve efficiency and performance through networking, processing, and storage technologies. By running software designed for the hardware, SPDK has been shown to readily reach millions of I/O reads per second using many processor cores and many NVMe drives, without additional offload hardware. |
ISA-L | The Intelligent Storage Acceleration Library (ISA-L) is a collection of highly optimized functions for RAID, erasure coding (EC), cyclic redundancy checks (CRC), cryptographic hashing, and compression. |
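The simplest erasure code is single XOR parity, as used in RAID 5. One parity block is the byte-wise XOR of k data blocks, and any single lost block can be rebuilt by XOR-ing the survivors with the parity. ISA-L's erasure-code functions handle more general codes, such as Reed-Solomon codes that tolerate multiple losses; those are not shown here. The sketch below covers only the single-parity case, and `xor_parity` and `xor_rebuild` are hypothetical names rather than ISA-L's API.

```c
#include <stddef.h>
#include <string.h>

/* parity[i] = data[0][i] ^ data[1][i] ^ ... ^ data[k-1][i] */
void xor_parity(const unsigned char *const *data, size_t k, size_t len,
                unsigned char *parity)
{
    memset(parity, 0, len);
    for (size_t d = 0; d < k; d++)
        for (size_t i = 0; i < len; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild data block `lost` (0 <= lost < k) from the surviving blocks and parity. */
void xor_rebuild(const unsigned char *const *data, size_t k, size_t lost,
                 const unsigned char *parity, size_t len, unsigned char *out)
{
    memcpy(out, parity, len);
    for (size_t d = 0; d < k; d++)
        if (d != lost) /* skip the missing block */
            for (size_t i = 0; i < len; i++)
                out[i] ^= data[d][i];
}
```

This works because XOR is its own inverse. XOR-ing the parity with every surviving block cancels those blocks and leaves exactly the missing one.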
7. Network (XPF, DPDK)
Library | Description |
XPF | The Extensible Packet Framework (XPF) is a Kunpeng acceleration library developed by Huawei. Its function module implements an intelligent offload engine inside the Open vSwitch (OVS) software. |
DPDK | The Data Plane Development Kit (DPDK) is a data-plane development tool set, including library functions and drivers, for efficient data packet processing in the user space. |
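8. HPC (HMPI)
Library | Description |
HMPI | Hyper MPI (HMPI) is a key component of the HPC solution. It implements network communication for parallel computing and supports application scenarios such as manufacturing, meteorology, and supercomputing centers; the communication library can also be extended to general fields such as AI and big data. |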