GPU Programming with Python

Chưa phân loại

In this article, we’ll dive into GPU programming with Python. Using the ease of Python, you can unlock the incredible computing power of your video card’s GPU (graphics processing unit). In this example, we’ll work with NVIDIA’s CUDA library.


For this exercise, you’ll need either a physical machine with Linux and an NVIDIA-based GPU, or launch a GPU-based instance on Amazon Web Services. Either should work fine, but if you choose to use a physical machine, you’ll need to make sure you have the NVIDIA proprietary drivers installed, see instructions:

You’ll also need the CUDA Toolkit installed. This example uses Ubuntu 16.04 LTS specifically, but there are downloads available for most major Linux distributions at the following URL:

I prefer the .deb based download, and these examples will assume you chose that route. The file you download is a .deb package but doesn’t have a .deb extension, so renaming it to have a .deb at the end his helpful. Then you install it with:

sudo dpkg -i package-name.deb

If you are prompted about installing a GPG key, please follow the instructions given to do so.

Now you’ll need to install the cuda package itself. To do so, run:

  sudo apt-get update  sudo apt-get install cuda -y  

This part can take a while, so you might want to grab a cup of coffee. Once it’s done, I recommend rebooting to ensure all modules are properly reloaded.

Next, you’ll need the Anaconda Python distribution. You can download that here:

Grab the 64-bit version and install it like this:

sh Anaconda*.sh

(the star in the above command will ensure that the command is ran regardless of the minor version)

The default install location should be fine, and in this tutorial, we’ll use it. By default, it installs to ~/anaconda3

At the end of the install, you’ll be prompted to decide if you wish to add Anaconda to your path. Answer yes here to make running the necessary commands easier. To ensure this change takes place, after the installer finishes completely, log out then log back in to your account.

More info on Installing Anaconda:

Finally we’ll need to install Numba. Numba uses the LLVM compiler to compile Python to machine code. This not only enhances performance of regular Python code but also provides the glue necessary to send instructions to the GPU in binary form. To do this, run:

conda install numba

Limitations and Benefits of GPU Programming

It’s tempting to think that we can convert any Python program into a GPU-based program, dramatically accelerating its performance. However, the GPU on a video card works considerably differently than a standard CPU in a computer.

CPUs handle a lot of different inputs and outputs and have a wide assortment of instructions for dealing with these situations. They also are responsible for accessing memory, dealing with the system bus, handling protection rings, segmenting, and input/output functionality. They are extreme multitaskers with no specific focus.

GPUs on the other hand are built to process simple functions with blindingly fast speed. To accomplish this, they expect a more uniform state of input and output. By specializing in scalar functions. A scalar function takes one or more inputs but returns only a single output. These values must be types pre-defined by numpy.

Example Code

In this example, we’ll create a simple function that takes a list of values, adds them together, and returns the sum. To demonstrate the power of the GPU, we’ll run one of these functions on the CPU and one on the GPU and display the times. The documented code is below:

  import numpy as np  from timeit import default_timer as timer  from numba import vectorize    # This should be a substantially high value. On my test machine, this took  # 33 seconds to run via the CPU and just over 3 seconds on the GPU.  NUM_ELEMENTS = 100000000    # This is the CPU version.  def vector_add_cpu(a, b):    c = np.zeros(NUM_ELEMENTS, dtype=np.float32)    for i in range(NUM_ELEMENTS):      c[i] = a[i] + b[i]    return c    # This is the GPU version. Note the @vectorize decorator. This tells  # numba to turn this into a GPU vectorized function.  @vectorize(["float32(float32, float32)"], target='cuda')  def vector_add_gpu(a, b):    return a + b;    def main():    a_source = np.ones(NUM_ELEMENTS, dtype=np.float32)    b_source = np.ones(NUM_ELEMENTS, dtype=np.float32)      # Time the CPU function    start = timer()    vector_add_cpu(a_source, b_source)    vector_add_cpu_time = timer() - start      # Time the GPU function    start = timer()    vector_add_gpu(a_source, b_source)    vector_add_gpu_time = timer() - start      # Report times    print("CPU function took %f seconds." % vector_add_cpu_time)    print("GPU function took %f seconds." % vector_add_gpu_time)      return 0    if __name__ == "__main__":    main()  

To run the example, type:


NOTE: If you run into issues when running your program, try using “conda install accelerate”.

As you can see, the CPU version runs considerably slower.

If not, then your iterations are too small. Adjust the NUM_ELEMENTS to a larger value (on mine, the breakeven mark seemed to be around 100 million). This is because the setup of the GPU takes a small but noticeable amount of time, so to make the operation worth it, a higher workload is needed. Once you raise it above the threshold for your machine, you’ll notice substantial performance improvements of the GPU version over the CPU version.


I hope you’ve enjoyed our basic introduction into GPU Programming with Python. Though the example above is trivial, it provides the framework you need to take your ideas further utilizing the power of your GPU.

Sandclock IDC thành lập vào năm 2012, là công ty chuyên nghiệp tại Việt Nam trong lĩnh vực cung cấp dịch vụ Hosting, VPS, máy chủ vật lý, dịch vụ Firewall Anti DDoS, SSL… Với 10 năm xây dựng và phát triển, ứng dụng nhiều công nghệ hiện đại, Sandclock IDC đã giúp hàng ngàn khách hàng tin tưởng lựa chọn, mang lại sự ổn định tuyệt đối cho website của khách hàng để thúc đẩy việc kinh doanh đạt được hiệu quả và thành công.
Bài viết liên quan

Using Google Search API With Python

It is no news that Google is the largest search engine in the world. Lots of people will go the extra mile to have their...

VLC Media Player for Linux

VLC is a free and open source cross platform media player developed by the VideoLAN non profit organization. VLC media player...

5 Cheap Raspberry Pi Alternatives in 2020

The Raspberry Pi is the king of single-board computers because it offers decent performance packed into a convenient form-factor...
Bài Viết

Bài Viết Mới Cập Nhật

Hướng dẫn chuyển đổi windows server windows evaluation to standard và active windows server 2008 + 2012 + 2016 + 2019

How to Update Ubuntu Linux

Squid Proxy Manager cài đặt và quản lý Proxy Squid tự động trên ubuntu

Hướng dẫn cài đặt Apache CloudStack

Hướng dẫn ký file PDF bằng chữ ký số (chữ ký điện tử) và sửa lỗi mới nhất 2021 foxit reader