Some tasks need to be done with multiple threads. For example, I need to request thousands of URLs in order to train a collaborative filtering service. This is easy to do in Python.
First way: Manage the threads yourself
I have a repo on GitHub, Tumblr Image Downloader, which batch-downloads images from a Tumblr blog using the Tumblr API.
Basically, there is a task queue and a worker that takes jobs from it.
The download_img function fetches the image URL and saves the image to save_path.
The program then calls the download_imgs function.
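A minimal sketch of how a task queue, a worker, download_img, and download_imgs could fit together in Python 3. This is not the code from the repo: the signatures, the None sentinel used to stop the workers, and the num_threads and download parameters are assumptions made for illustration.

```python
import queue
import threading
import urllib.request


def download_img(img_url, save_path):
    """Fetch img_url and write the bytes to save_path."""
    with urllib.request.urlopen(img_url, timeout=30) as resp:
        data = resp.read()
    with open(save_path, "wb") as f:
        f.write(data)


def worker(task_queue, download=download_img):
    """Pull (img_url, save_path) jobs until a None sentinel arrives."""
    while True:
        job = task_queue.get()
        try:
            if job is None:
                return
            try:
                download(*job)
            except Exception as exc:
                # One bad URL should not kill the whole thread.
                print("failed:", job[0], exc)
        finally:
            task_queue.task_done()


def download_imgs(jobs, num_threads=4, download=download_img):
    """Start the workers, queue every job, and wait until all are done."""
    task_queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(task_queue, download))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for job in jobs:
        task_queue.put(job)
    for _ in threads:
        task_queue.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
```

Calling download_imgs with a list of (img_url, save_path) pairs would then download them with four threads by default. Even in this small form, the queue, the sentinels, and the joins all have to be written by hand.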
Better and simpler way: Use the concurrent.futures module
PEP 3148 gives the motivation for this module:
Python currently has powerful primitives to construct multi-threaded and multi-process applications but parallelizing simple operations requires a lot of work i.e. explicitly launching processes/threads, constructing a work/results queue, and waiting for completion or some other termination condition (e.g. failure, timeout). It is also difficult to design an application with a global process/thread limit when each component invents its own parallel execution strategy.
This module makes life easier. It has been part of the standard library since Python 3.2. There are two types of executor: ThreadPoolExecutor and ProcessPoolExecutor.
I will take ThreadPoolExecutor as an example.
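A minimal sketch of the ThreadPoolExecutor approach for requesting many URLs, assuming Python 3. The fetch and fetch_all helpers and the max_workers value are hypothetical names chosen for illustration; only ThreadPoolExecutor, submit, and as_completed come from concurrent.futures itself.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request


def fetch(url, timeout=30):
    """Return the response body for url as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, max_workers=8, fetch=fetch):
    """Fetch every URL in a thread pool.

    Returns (results, errors), two dicts keyed by URL.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # submit() returns a Future immediately; map each one back to its URL.
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        # as_completed() yields futures as they finish, in any order.
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors
```

The executor owns the threads and the work queue, so there are no sentinels or joins to manage: leaving the with block waits for all pending work. Swapping in ProcessPoolExecutor would need only a change to the class name, provided the submitted function is picklable.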
-EOF-