pip install intel-extension-for-pytorch==2.7.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
Intel® Extension for PyTorch* v2.7.10+xpu is a new release based on PyTorch* 2.7.0. It supports Intel® GPU platforms: the Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors, and Intel® Data Center GPU Max Series.
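After running the install command above, a quick check can confirm that the package is present and that an XPU device is visible. The following is a minimal sketch, not an official sanity test: the helper names `describe_install` and `list_xpu_devices` are hypothetical, and the script assumes that importing `intel_extension_for_pytorch` is what makes the XPU backend usable in your environment.

```python
from importlib import metadata, util


def describe_install(package="intel-extension-for-pytorch",
                     module="intel_extension_for_pytorch"):
    """Report whether the extension is installed, without importing it."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {"installed": util.find_spec(module) is not None, "version": version}


def list_xpu_devices():
    """Return the names of visible XPU devices, or [] if none can be queried."""
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401  (assumed to set up XPU support)
    except (ImportError, OSError):
        return []
    if not hasattr(torch, "xpu") or not torch.xpu.is_available():
        return []
    return [torch.xpu.get_device_name(i) for i in range(torch.xpu.device_count())]


if __name__ == "__main__":
    print(describe_install())
    print(list_xpu_devices())
```

An empty device list does not necessarily mean the install failed. It can also indicate a missing GPU driver or an unsupported platform.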
Highlights
- Intel® oneDNN v3.7.1 integration
- Large Language Model (LLM) optimization
  - Intel® Extension for PyTorch* optimizes typical LLMs such as Llama 2, Llama 3, Phi-3-mini, Qwen2, and GLM-4 on the Intel® Arc™ Graphics family. New in this release, LLM inference for Llama 3.3, Phi-3.5-mini, Qwen2.5, and Mistral-7B is also optimized on Intel® Data Center GPU Max Series platforms. A full list of optimized models can be found in the LLM Optimizations Overview. The supported Transformers version is updated to 4.48.3.
- Serving framework support
  - Intel® Extension for PyTorch* supports serving ecosystems including vLLM and TGI, with the goal of enhancing performance and flexibility for LLM workloads on Intel® GPU platforms (intensively verified on Intel® Data Center GPU Max Series and Intel® Arc™ B-Series graphics on Linux). vLLM/TGI features such as chunked prefill and MoE (Mixture of Experts) are supported by the backend kernels provided in Intel® Extension for PyTorch*. In this release, Intel® Extension for PyTorch* adds sliding window support in ipex.llm.modules.PagedAttention.flash_attn_varlen_func to meet the needs of models like Phi3 and Mistral, which enable sliding window attention by default.
- [Prototype] QLoRA/LoRA finetuning using BitsAndBytes
  - Intel® Extension for PyTorch* supports QLoRA/LoRA finetuning with BitsAndBytes on Intel® GPU platforms. This release includes several enhancements for better performance and functionality:
    - The performance of the NF4 dequantize kernel has been improved by approximately 4.4× to 5.6× across different shapes compared to the previous release.
    - _int_mm support in INT8 has been added to enable INT8 LoRA finetuning in PEFT (with float optimizers like adamw_torch).
- Codegen support removal
  - Codegen support is removed from Intel® Extension for PyTorch*, which now reuses the codegen capability from Torch XPU Operators. This keeps codegen changes interoperable with their usage in Intel® Extension for PyTorch*.
- [Prototype] Python 3.13t support
  - Adds prototype support for Python 3.13t and provides prebuilt binaries on the download server.
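
To illustrate the sliding window attention mentioned under serving framework support, here is a conceptual sketch of the attention mask such a window implies. It is plain Python and does not call or reproduce `ipex.llm.modules.PagedAttention.flash_attn_varlen_func`, whose parameters are not shown in these notes. The function name `sliding_window_causal_mask` is hypothetical, and the convention that the window counts the current token is an assumption, since models and libraries differ on this.

```python
def sliding_window_causal_mask(seq_len, window):
    """Build a boolean mask where mask[i][j] is True if query i may attend to key j.

    Assumed convention: causal attention restricted to the most recent
    `window` positions, counting the current position itself.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    for row in sliding_window_causal_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With a window of 3, each token attends to itself and at most the two tokens before it, so the attention cost per token stays bounded as the sequence grows.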
