Steven You

A Forged Geek.

Concurrent Tasks Execution in Python

Some tasks are best done with multiple threads. For example, I needed to request thousands of URLs in order to train a collaborative filtering service. This is easy to do in Python.

First way: Manage the threads yourself

I have a repo on GitHub, Tumblr Image Downloader, which batch-downloads images from a Tumblr blog using the Tumblr API.

Basically, there is a task queue:

[Code listing unavailable]
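As a rough idea of what such a task queue can look like, here is a minimal sketch using the standard library's queue.Queue on Python 3. It is not the repo's actual code: the build_task_queue helper, the (url, save_path) tuple layout, and the example URLs are all assumptions made for illustration.

```python
import queue


def build_task_queue(tasks):
    """Put every (url, save_path) pair on a thread-safe FIFO queue."""
    q = queue.Queue()
    for task in tasks:
        q.put(task)
    return q


# Hypothetical tasks, for illustration only.
task_queue = build_task_queue([
    ("http://example.com/a.jpg", "images/a.jpg"),
    ("http://example.com/b.jpg", "images/b.jpg"),
])
```

queue.Queue does its own locking, so several threads can call get() on it without any extra synchronization.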

and a worker:

[Code listing unavailable]
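A worker in this style is usually a thread that keeps pulling tasks off the shared queue until nothing is left. The sketch below shows one way to write it, assuming Python 3; the Worker class, its handler and results parameters, and the squaring demo are hypothetical and only stand in for the real download logic.

```python
import queue
import threading


class Worker(threading.Thread):
    """Thread that pulls tasks off a shared queue until it is empty."""

    def __init__(self, task_queue, handler, results):
        super().__init__(daemon=True)
        self.task_queue = task_queue
        self.handler = handler    # callable applied to each task's arguments
        self.results = results    # shared list that collects return values

    def run(self):
        while True:
            try:
                task = self.task_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, let the thread finish
            try:
                self.results.append(self.handler(*task))
            finally:
                # Mark the task as processed even if the handler raised.
                self.task_queue.task_done()


# Demo with a trivial handler instead of a real download.
q = queue.Queue()
for n in range(5):
    q.put((n,))

squares = []
workers = [Worker(q, lambda n: n * n, squares) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The results arrive in whatever order the threads finish, so sort them if the order matters.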

The download_img function fetches the image at the given URL and saves it to save_path.
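A function with that behaviour could be sketched as follows. This is an assumption about its shape, not the repo's implementation: it uses urllib.request from the Python 3 standard library, and the 30-second timeout and the directory creation are my own additions.

```python
import os
import shutil
import urllib.request


def download_img(url, save_path):
    """Fetch url and write the response body to save_path."""
    folder = os.path.dirname(save_path)
    if folder:
        # Create the target directory if it does not exist yet.
        os.makedirs(folder, exist_ok=True)
    with urllib.request.urlopen(url, timeout=30) as response, \
            open(save_path, "wb") as out:
        # Stream the body to disk instead of loading it all into memory.
        shutil.copyfileobj(response, out)
    return save_path
```

Streaming with shutil.copyfileobj keeps memory use flat even for large images.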

The program then calls the download_imgs function:

[Code listing unavailable]
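To show how the pieces fit together, here is a minimal sketch of a driver in the same spirit: fill a queue, start a fixed number of threads, and wait for them to finish. The signature is hypothetical; in particular, passing the download callable in as a parameter is my assumption, made so the sketch runs without network access.

```python
import queue
import threading


def download_imgs(tasks, download, num_threads=4):
    """Run download(url, save_path) for every task on num_threads threads."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    def work():
        while True:
            try:
                url, save_path = task_queue.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                download(url, save_path)
            except Exception as exc:
                # One failed download should not kill the whole thread.
                print("failed:", url, exc)
            finally:
                task_queue.task_done()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every thread has drained the queue
```

Even this small version needs a queue, a loop, error handling, and explicit start/join calls, which is the boilerplate the next approach removes.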

Better and simpler way: Using the concurrent.futures module

PEP 3148 gives the motivation for this module:

Python currently has powerful primitives to construct multi-threaded and multi-process applications but parallelizing simple operations requires a lot of work i.e. explicitly launching processes/threads, constructing a work/results queue, and waiting for completion or some other termination condition (e.g. failure, timeout). It is also difficult to design an application with a global process/thread limit when each component invents its own parallel execution strategy.

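This module makes life easier. It has been part of the standard library since Python 3.2, and a backport for Python 2 is available on PyPI under the name futures. There are two types of executor: ThreadPoolExecutor and ProcessPoolExecutor.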

I will use ThreadPoolExecutor as the example:

[Code listing unavailable]
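A typical use of ThreadPoolExecutor looks roughly like the sketch below, which follows the submit / as_completed pattern from the module's documentation. The fetch_all helper and its fetch parameter are hypothetical names of mine; fetch stands for whatever does the real work, such as requesting one URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) concurrently; return ({url: result}, {url: exception})."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # submit() returns a Future right away; remember which URL it belongs to.
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        # as_completed() yields each future as soon as it finishes.
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # result() re-raises whatever fetch raised in the worker thread.
                errors[url] = exc
    return results, errors
```

The executor owns the queue and the threads, and leaving the with block waits for all pending work, so the manual bookkeeping from the first approach disappears.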

-EOF-
