Configure DeepSpeed with Accelerate from Scratch
Follow it step by step and you will get it working.
Problem I was facing
I was installing DeepSpeed in my conda environment, but when I ran my code with Accelerate and DeepSpeed, I got the following error:
nvcc fatal : unknown option '--threads=8'
The environment I was using:
python 3.10
torch 2.0.1+cu118
deepspeed 0.14.1
transformers 4.34.0
accelerate 0.29.3
huggingface-hub 0.19.4
nvcc 11.1
Finding the solution
Clearly, the nvcc version on my server (11.1) didn’t match the CUDA version torch was compiled with (11.8). And I couldn’t use sudo, since I’m not a root user on the server.
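A quick way to confirm that kind of mismatch is to compare the release that nvcc reports with torch.version.cuda, the CUDA version torch was built with. This is a minimal sketch, assuming the usual "release X.Y" line in nvcc -V output; the helper names are my own, not from any library.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find 'release X.Y' in nvcc output")
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(torch_cuda):
    """Turn torch.version.cuda (e.g. '11.8', or None for CPU builds) into (major, minor)."""
    if torch_cuda is None:
        return None
    major, minor = torch_cuda.split(".")[:2]
    return int(major), int(minor)


def cuda_versions_match(nvcc_output, torch_cuda):
    """True when nvcc and torch agree on the CUDA major.minor version."""
    return parse_nvcc_release(nvcc_output) == parse_torch_cuda(torch_cuda)


def main():
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
        return
    try:
        import torch  # imported here so the parsers above work without torch
    except ImportError:
        print("torch is not installed")
        return
    out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=True).stdout
    print("nvcc CUDA :", parse_nvcc_release(out))
    print("torch CUDA:", parse_torch_cuda(torch.version.cuda))
    print("match     :", cuda_versions_match(out, torch.version.cuda))


if __name__ == "__main__":
    main()
```

With the environment listed above (nvcc 11.1, torch 2.0.1+cu118), cuda_versions_match returns False.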
After some googling, I found this answer. Unfortunately, I misunderstood it, and my first try was to install torch==1.12.1+cu113.
After a few hours away, I came back to my lovely PC and kept googling. Then I realized there is a PyTorch build for CUDA 11.1: torch==1.10.1+cu111. But it doesn’t support Python 3.10!
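The CUDA suffix in a PyTorch wheel version follows a simple pattern: CUDA 11.1 becomes cu111, which gives torch==1.10.1+cu111. Below is a minimal sketch of that mapping, plus a hypothetical Python-version check where you pass in the newest supported 3.x minor version yourself (for torch 1.10.x that is 3.9, which is why Python 3.10 was a dead end). All helper names are made up for illustration.

```python
import sys


def cuda_wheel_tag(cuda_version):
    """'11.1' -> 'cu111': the local-version suffix PyTorch uses for CUDA wheels."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def torch_requirement(torch_version, cuda_version):
    """Build a pip requirement string such as 'torch==1.10.1+cu111'."""
    return f"torch=={torch_version}+{cuda_wheel_tag(cuda_version)}"


def python_supported(max_minor, version_info=sys.version_info):
    """Hypothetical check: True if the given Python 3.x is at most 3.<max_minor>.

    The upper bound depends on the torch release you pick, so it is passed
    in rather than hard-coded.
    """
    return version_info[0] == 3 and version_info[1] <= max_minor


if __name__ == "__main__":
    print(torch_requirement("1.10.1", "11.1"))
    print("this Python OK for torch 1.10.x:", python_supported(9))
```

For example, torch_requirement("1.10.1", "11.1") gives "torch==1.10.1+cu111", and python_supported(9, (3, 10)) is False while python_supported(9, (3, 8)) is True.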
So I created a new environment with Python 3.8, installed torch==1.10.1+cu111 and the newest version of DeepSpeed, and still got the same error. Why?
I started reading DeepSpeed’s source code and found that the --threads=8 flag is passed to nvcc unconditionally, without any if check!
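For comparison, this is roughly what a guarded version of that flag could look like. It is a hypothetical sketch, not DeepSpeed’s actual code, and it assumes nvcc gained the --threads option in CUDA 11.2, which would explain why nvcc 11.1 rejects it with "unknown option".

```python
def nvcc_thread_flags(cuda_major, cuda_minor, threads=8):
    """Hypothetical: only emit --threads for nvcc versions assumed to support it (11.2+)."""
    if (cuda_major, cuda_minor) >= (11, 2):
        return [f"--threads={threads}"]
    # Older nvcc (e.g. 11.1) fails with: unknown option '--threads=8'
    return []


def build_nvcc_args(cuda_major, cuda_minor, base_args=("-O3",)):
    """Hypothetical: assemble nvcc arguments with the version guard applied."""
    return list(base_args) + nvcc_thread_flags(cuda_major, cuda_minor)


if __name__ == "__main__":
    print(build_nvcc_args(11, 1))  # no --threads for nvcc 11.1
    print(build_nvcc_args(11, 8))  # --threads=8 included
```

Tuple comparison keeps the version check readable: (11, 1) >= (11, 2) is False, while (12, 0) >= (11, 2) is True.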
Then I went to DeepSpeed’s GitHub repo, which says:
- The quickest way to get started with DeepSpeed is via pip, this will install the latest release of DeepSpeed which is not tied to specific PyTorch or CUDA versions.
That was really confusing to me. Then I found a command to inspect the DeepSpeed environment: ds_report. It printed one important line: deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3.
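Rather than eyeballing that line, you can compare it against the torch that is actually installed. This is a small sketch that assumes the line keeps the format quoted above. Pass it the ds_report output together with torch.__version__ and torch.version.cuda. The helper names are my own.

```python
import re

# Assumed line format, taken from the ds_report output quoted above:
#   deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3
WHEEL_LINE = re.compile(
    r"deepspeed wheel compiled w\.\s*\.*\s*torch (\d+\.\d+), cuda (\d+\.\d+)"
)


def parse_wheel_line(ds_report_output):
    """Return (torch_version, cuda_version) that the DeepSpeed wheel was built with."""
    match = WHEEL_LINE.search(ds_report_output)
    if match is None:
        raise ValueError("no 'deepspeed wheel compiled w.' line found")
    return match.group(1), match.group(2)


def major_minor(version):
    """'1.10.1+cu111' -> '1.10'; '11.3' -> '11.3'."""
    return ".".join(version.split("+")[0].split(".")[:2])


def wheel_matches(ds_report_output, torch_version, torch_cuda):
    """True when the wheel's torch/CUDA versions equal the installed ones."""
    if torch_cuda is None:  # CPU-only torch build: nothing to match
        return False
    built_torch, built_cuda = parse_wheel_line(ds_report_output)
    return built_torch == major_minor(torch_version) and built_cuda == major_minor(torch_cuda)
```

For the situation in this post, the wheel line says cuda 11.3 while the installed torch was 1.10.1+cu111, so wheel_matches returns False even though the torch major.minor versions agree.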
Now everything was clear! When I first installed DeepSpeed with pip, it was compiled against the torch version installed at that time. So I had to clear all caches and reinstall DeepSpeed against the correct torch version.
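The clean reinstall boils down to three pip commands. Here is a minimal sketch that runs them through the current interpreter, so they apply to the active conda environment. It defaults to a dry run that only prints the commands. The step list and function name are my own, and pip cache requires a reasonably recent pip.

```python
import subprocess
import sys

# Run pip via the current interpreter so the commands target the active env.
PIP = [sys.executable, "-m", "pip"]

# (command, must_succeed) pairs, executed in order.
REINSTALL_STEPS = [
    (PIP + ["uninstall", "-y", "deepspeed"], False),  # remove the old build; non-fatal if absent
    (PIP + ["cache", "purge"], False),  # drop cached wheels so pip cannot reuse the old build
    (PIP + ["install", "--no-cache-dir", "deepspeed"], True),  # fresh install against current torch
]


def reinstall_deepspeed(dry_run=True):
    """Print (dry_run=True) or execute the clean-reinstall commands in order."""
    for cmd, must_succeed in REINSTALL_STEPS:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=must_succeed)


if __name__ == "__main__":
    reinstall_deepspeed(dry_run=True)
```

If DeepSpeed ops were JIT-compiled at runtime, PyTorch’s extension loader may also have cached those builds (by default under ~/.cache/torch_extensions, or wherever TORCH_EXTENSIONS_DIR points). Deleting that directory forces them to be rebuilt.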
Final Solution
- Run nvcc -V to check your CUDA version.
- Find the torch version built for that CUDA version.
- Pick a Python version that this torch release supports.
- Create your environment and install that torch.
- Run pip install deepspeed (clear pip’s cache first if DeepSpeed was installed before), then run ds_report to check whether DeepSpeed was compiled with the right version of torch.
- Done!
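The checks in these steps can be run together. Here is a minimal sketch that runs nvcc -V and ds_report (if they are on PATH) and prints the lines worth comparing, next to the installed Python and torch versions. The keywords it filters on are based on the outputs quoted in this post, so adjust them if your versions print something different. The names CHECKS, pick_lines and run_checklist are my own.

```python
import shutil
import subprocess
import sys

# (command, keyword) pairs: print only the output lines containing the keyword.
CHECKS = [
    (["nvcc", "-V"], "release"),
    (["ds_report"], "deepspeed wheel compiled w."),
]


def pick_lines(output, keyword):
    """Return the stripped lines of a command's output that contain keyword."""
    return [line.strip() for line in output.splitlines() if keyword in line]


def run_checklist():
    print("python:", sys.version.split()[0])
    try:
        import torch  # optional: the checklist still runs without it
        print("torch :", torch.__version__, "| built with CUDA", torch.version.cuda)
    except ImportError:
        print("torch : not installed")
    for cmd, keyword in CHECKS:
        if shutil.which(cmd[0]) is None:
            print(f"{cmd[0]}: not found on PATH")
            continue
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        for line in pick_lines(out, keyword):
            print(f"{cmd[0]}: {line}")


if __name__ == "__main__":
    run_checklist()
```

All the printed versions should agree: the nvcc release, the CUDA version torch was built with, and the cuda value on the ds_report wheel line.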
