Configuring DeepSpeed with Accelerate from scratch
Follow it step by step and you will have it working.
Problem I was facing
I was installing DeepSpeed in my conda environment. But when I ran my code with Accelerate and DeepSpeed, I got the following error:
nvcc fatal : unknown option '--threads=8'
The environment I was using:
- python 3.10
- torch 2.0.1+cu118
- deepspeed 0.14.1
- transformers 4.34.0
- accelerate 0.29.3
- huggingface-hub 0.19.4
- nvcc 11.1
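Collecting these versions by hand is easy to get wrong, so here is a minimal Python sketch that gathers them in one place. It reads package versions through `importlib.metadata`, imports torch only to read `torch.version.cuda`, and captures the raw `nvcc -V` output if nvcc is on PATH. The package list and the helper names are my own assumptions; adjust them to your stack.

```python
import importlib.metadata
import platform
import shutil
import subprocess

# PyPI distribution names to report; this list is an assumption, edit it freely.
PACKAGES = ["torch", "deepspeed", "transformers", "accelerate", "huggingface-hub"]


def package_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def torch_cuda_version():
    """CUDA version torch was built with (e.g. '11.8'); None if torch is missing or has no CUDA build."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    return torch.version.cuda


def nvcc_version_text():
    """Raw `nvcc -V` output, or None when nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    return subprocess.run([nvcc, "-V"], capture_output=True, text=True).stdout


def environment_report():
    report = {"python": platform.python_version()}
    report.update(package_versions())
    report["torch CUDA build"] = torch_cuda_version()
    report["nvcc -V"] = nvcc_version_text()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```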
How I found the solution
Obviously, the version of nvcc on my server (11.1) didn’t match the CUDA version that torch was compiled with (11.8). And I couldn’t use sudo, since I’m not a root user on the server.
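To make the mismatch concrete, this sketch parses the release number out of `nvcc -V` and compares it with `torch.version.cuda`. It assumes the usual `Cuda compilation tools, release X.Y, ...` line (the sample below is abbreviated) and deliberately treats any major.minor difference as a mismatch.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc -V` text such as 'release 11.1, V11.1.105'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def torch_cuda_release(cuda_string):
    """Turn torch.version.cuda, e.g. '11.8', into (major, minor)."""
    major, minor = cuda_string.split(".")[:2]
    return int(major), int(minor)


def toolkit_matches_torch(nvcc_output, torch_cuda):
    """True only if nvcc and torch agree on the CUDA major.minor version."""
    return nvcc_release(nvcc_output) == torch_cuda_release(torch_cuda)


sample = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Cuda compilation tools, release 11.1, V11.1.105\n"
)
print(nvcc_release(sample))                   # (11, 1)
print(toolkit_matches_torch(sample, "11.8"))  # False, the mismatch on my server
```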
So, after googling, I found this. Unfortunately, I misunderstood the answer, and my first attempt was to install torch==1.12.1+cu113.
After a few hours of hanging out, I came back to my lovely PC and kept googling. I realized that there is a PyTorch build for CUDA 11.1, torch==1.10.1+cu111. But it doesn’t support Python 3.10!
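The wheel naming makes this lookup mechanical: PyTorch's CUDA builds carry a `+cu<major><minor>` local-version suffix (cu111 for 11.1, cu118 for 11.8). The sketch below builds the pip requirement from a CUDA version and checks the interpreter against a hand-written table. `MAX_PYTHON` is a hypothetical, intentionally incomplete helper holding only the entry this post needs (torch 1.10.1 ships wheels up to Python 3.9); check PyTorch's release notes before extending it. The index or find-links URL for old builds is listed on PyTorch's "Previous versions" page and is not reproduced here.

```python
import sys

# Hypothetical, intentionally incomplete table: newest Python (major, minor)
# that each torch release ships wheels for. Only the entry used in this post.
MAX_PYTHON = {"1.10.1": (3, 9)}


def cuda_tag(cuda_release):
    """'11.1' -> 'cu111', the local-version suffix of PyTorch CUDA wheels."""
    major, minor = cuda_release.split(".")[:2]
    return f"cu{major}{minor}"


def torch_requirement(torch_version, cuda_release):
    """Pip requirement string such as 'torch==1.10.1+cu111'."""
    return f"torch=={torch_version}+{cuda_tag(cuda_release)}"


def python_supported(torch_version, python_version=sys.version_info[:2]):
    """True/False if the table knows this torch release, None if it does not."""
    limit = MAX_PYTHON.get(torch_version)
    if limit is None:
        return None
    return tuple(python_version) <= limit


print(torch_requirement("1.10.1", "11.1"))  # torch==1.10.1+cu111
print(python_supported("1.10.1", (3, 10)))  # False, hence a new Python 3.8 env
print(python_supported("1.10.1", (3, 8)))   # True
```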
So I created a new environment with Python 3.8 and installed torch==1.10.1+cu111 and the newest version of DeepSpeed. I still got the same error. Why?
I started reading DeepSpeed’s source code and found that the --threads=8 flag is added to the nvcc arguments unconditionally, with no if around it!
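For contrast, this is roughly what a guarded version of that flag could look like. It is a hypothetical sketch, not DeepSpeed's builder code, and it assumes nvcc's `--threads` option first appeared in CUDA 11.2; that is consistent with nvcc 11.1 rejecting it, but verify the cutoff against NVIDIA's nvcc documentation.

```python
def nvcc_thread_args(cuda_release, threads=8):
    """Extra nvcc flags for parallel compilation, only on toolkits assumed to accept them.

    Hypothetical guard; the (11, 2) cutoff is an assumption, not taken from DeepSpeed.
    """
    if tuple(cuda_release) >= (11, 2):
        return [f"--threads={threads}"]
    return []  # older nvcc (e.g. 11.1) fails with "unknown option"


print(nvcc_thread_args((11, 1)))  # []
print(nvcc_thread_args((11, 8)))  # ['--threads=8']
```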
Then I went to DeepSpeed’s GitHub repo, and it says:
"The quickest way to get started with DeepSpeed is via pip, this will install the latest release of DeepSpeed which is not tied to specific PyTorch or CUDA versions."
That was really confusing to me. Then I found a command that checks the DeepSpeed environment: ds_report. It printed an important piece of information: deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3.
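That line is easy to check programmatically. The sketch below pulls the torch and CUDA versions out of it and compares them with what is actually installed. The regex assumes the line keeps exactly the shape quoted above, and `compiled_with` / `stale_build` are hypothetical helper names.

```python
import re

# Assumes the ds_report line looks like the one quoted above.
COMPILED_RE = re.compile(
    r"deepspeed wheel compiled w\.\s*\.+\s*torch ([\d.]+), cuda ([\d.]+)"
)


def compiled_with(ds_report_output):
    """(torch, cuda) versions DeepSpeed says it was built against, or None."""
    match = COMPILED_RE.search(ds_report_output)
    return match.groups() if match else None


def stale_build(ds_report_output, installed_torch, toolkit_cuda):
    """True if DeepSpeed was built for another torch or CUDA; None if the line is missing."""
    info = compiled_with(ds_report_output)
    if info is None:
        return None
    ds_torch, ds_cuda = info
    # '1.10.1+cu111' -> '1.10', since ds_report only prints major.minor
    torch_short = ".".join(installed_torch.split("+")[0].split(".")[:2])
    return ds_torch != torch_short or ds_cuda != toolkit_cuda


line = "deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3"
print(compiled_with(line))                        # ('1.10', '11.3')
print(stale_build(line, "1.10.1+cu111", "11.1"))  # True: built for CUDA 11.3, not 11.1
```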
Now everything was clear! When I first installed DeepSpeed with pip, it was compiled against the torch version installed at that time. So I had to clean all the caches and reinstall DeepSpeed against the correct torch version.
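A minimal sketch of that clean-up, assuming the stale build was coming from pip's cache: uninstall DeepSpeed, purge pip's cache, and reinstall without the cache so it is built against the torch installed now. It goes through `sys.executable -m pip` so the commands target the active environment, and it only prints the commands unless you pass `dry_run=False`. The helper names are hypothetical.

```python
import subprocess
import sys


def reinstall_commands(package="deepspeed"):
    """pip commands that drop a stale build and reinstall it from scratch."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", package],
        pip + ["cache", "purge"],  # needs a pip recent enough to have `pip cache`
        pip + ["install", "--no-cache-dir", package],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run(reinstall_commands())  # prints the commands; pass dry_run=False to execute
```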
Final Solution
- Run nvcc -V to check your CUDA version.
- Find the torch version built for that CUDA version.
- Find a python version supported by that torch.
- Create your environment and install the right torch.
- Run pip install deepspeed (clean pip’s cache first if DeepSpeed was installed before), then ds_report to check whether DeepSpeed is compiled with the right version of torch.
- Done!
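The checks in the list boil down to one rule: nvcc, torch and the DeepSpeed build should all report the same CUDA major.minor version. Here is a small sketch of that final comparison; it takes the three version strings (from nvcc -V, torch.version.cuda and ds_report) as inputs, and `check_consistency` is a hypothetical name.

```python
def short_version(version):
    """'11.1.105' or '11.1' -> '11.1'; None stays None."""
    return ".".join(version.split(".")[:2]) if version else None


def check_consistency(nvcc_cuda, torch_cuda, ds_cuda):
    """List of problems; an empty list means nvcc, torch and DeepSpeed agree."""
    nvcc_cuda, torch_cuda, ds_cuda = map(short_version, (nvcc_cuda, torch_cuda, ds_cuda))
    problems = []
    if torch_cuda is None:
        problems.append("torch has no CUDA build (torch.version.cuda is None)")
    elif torch_cuda != nvcc_cuda:
        problems.append(f"torch built with CUDA {torch_cuda}, nvcc is {nvcc_cuda}")
    if ds_cuda != nvcc_cuda:
        problems.append(f"DeepSpeed built with CUDA {ds_cuda}, nvcc is {nvcc_cuda}")
    return problems


print(check_consistency("11.1", "11.1", "11.1"))  # []
for problem in check_consistency("11.1", "11.8", "11.3"):
    print(problem)
```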