Threading, Multiprocessing and Asyncio in Python

Contents at a Glance

  1. Introduction to Asynchrony
    1. What is "asynchronous code"?
    2. Comparison of three ways to get asynchronous code
  2. Threading
    1. Threading. Creation and handling
    2. Threading. Synchronization primitives: Lock
    3. Threading. Synchronization primitives: Lock (continuation), Event, Condition, Semaphore, Queue 🔒
    4. Web Scraping by Threading 🔒
  3. Multiprocessing 🔒
    1. Multiprocessing. Creation and handling 🔒
    2. Multiprocessing: Synchronization primitives 🔒
  4. Asyncio Package
    1. Generator as an asynchronous function
    2. Coroutines, Tasks and Event Loop
    3. From Generators to Coroutines and Tasks 🔒
    4. Web scraping by aiohttp Package 🔒
    5. Handling files by aiofiles Package 🔒
    6. Asyncio Synchronization Primitives 🔒
  5. Additional Packages and Methods for Creating Asynchronous Code 🔒
    1. Subprocess Package 🔒
    2. Concurrent.futures Package 🔒
    3. Sockets - timeout() Method and select Package 🔒
    4. curio and trio Packages 🔒

1. Introduction to Asynchrony
1.1 What is "asynchronous code"?

First of all, let's try to clarify the terminology and understand what lies behind the terms synchronous and asynchronous code. Let's also try to figure out what is so bad about synchronous code that everyone is so persistently trying to turn it into asynchronous code.

Synchronous code is code in which all instructions are executed strictly sequentially, line by line, and the next line can be executed only after the previous one has fully completed.

The main problem of synchronous code is the requirement to not execute the next instruction until the previous one is completed. This poses a significant challenge for programs that interact with the outside world or other programs during execution since the execution of an instruction may unexpectedly require much more time than usual.

To prevent this from happening, the software code should be mindful of what is happening around it. If the next instruction has the potential to slow down the execution of the main program, it should be parallelized with the execution of other faster instructions or postponed altogether until a more opportune time.

In other words, the task of asynchronous programming is to replace the "mindless" sequential execution of instructions (synchronous) with a "meaningful" change in the order of execution based on the completion time of different instructions (asynchronous).

It is important to emphasize here that asynchronous code does not need to incorporate complex algorithms for estimating the execution time of subsequent instructions. In the vast majority of cases, it is sufficient to parallelize the execution of problematic sections by moving them into the background, so that they do not hinder the fast execution of the main program.

Now, (hopefully!) having gained some understanding of asynchrony, it's time to provide a strictly scientific definition to this phenomenon. Fortunately, there are numerous definitions available on the internet, each more obscure than the next 😉. Personally, I found the definition of asynchrony on the Mozilla.org developer website quite appealing:

“Asynchronous programming is a technique that enables your program to start a potentially long-running task and still be able to be responsive to other events while that task runs, rather than having to wait until that task has finished.”

https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Asynchronous/Introducing

Thus, asynchrony is what prevents your program from getting "stuck" even when it reaches blocking (or "very long") sections of code, as these code sections will be executed concurrently (or almost concurrently) with the main program.

This is precisely why packages capable of transforming synchronous code into asynchronous code have gained incredible popularity.

Well, at this point, it's probably a good time to go through an example that allows for an even better understanding and consolidation of everything mentioned above.

Let's assume we have two functions (func1() and func2()) that need to be executed sequentially:

import time
N = 5
DELAY = 0.5


def func1(n):
    for i in range(n):
        time.sleep(DELAY)
        print(f'--- line #{i} from {n} is completed')


def func2(n):
    for i in range(n):
        time.sleep(DELAY)
        print(f'=== line #{i} from {n} is completed')


def main():
    func1(N)
    func2(N)


if __name__ == '__main__':
    start = time.time()
    main()
    print(f'Total time: {time.time() - start}')

The main() function will be the main control function here and onwards, while the functions func1() and func2() are called sequentially within it. Additionally, the total execution time of the main() function, which obviously equals the execution time of the entire script, will be calculated.

In this classic example of synchronous code, it is evident that control will be passed to the second function (func2()) only after the first function (func1()) has completed. It is also fortunate that in our example the repetition count (N = 5) and the delay time (DELAY = 0.5 seconds) are relatively small, so the program finishes in a modest 5 seconds. But what if these parameters had a few more zeros at the end? Then we might never wait for func2() to finish, let alone for the final message that all functions have completed.
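
The 5-second figure is easy to verify: func1() and func2() each perform N sleeps of DELAY seconds, and they run one after the other, so the synchronous total is 2 × N × DELAY:

```python
N = 5
DELAY = 0.5

# func1() and func2() each perform N sleeps of DELAY seconds, one after the other
expected_runtime = 2 * N * DELAY
print(expected_runtime)  # 5.0
```

Add a zero or two to either constant and the same formula shows how quickly the wait becomes unacceptable.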

It seems that without an asynchronous solution to this problem, someday, at a not-so-pleasant moment, we might find ourselves in a very difficult situation. Therefore, let's try applying the first of the three techniques mentioned in the course title: threads.

A more detailed explanation of how this and the following examples work will be provided a little later. For now, let's simply enjoy the beauty and ease of transforming synchronous code into asynchronous.

But first, let's add a useful detail to our code, namely a decorator that calculates the runtime of our program. Since all our future tasks will be evaluated in terms of code execution time in one way or another, it makes sense to optimize this calculation from the very beginning. Moreover, it doesn't require knowledge of asynchronous programming methodologies. Our usual "synchronous" knowledge will be sufficient for this purpose.

Many of you probably remember from the basics of the Python language that it is more logical to extract repetitive code within a function and place it separately as a decorator. For those who are not familiar with this concept or may have forgotten, I recommend watching these two videos that cover all four variations of creating decorators:

  • Simplest function decorator for a function (Russian voice): https://youtu.be/394ZfiPJQ38
  • More advanced decorator variations (Russian voice): https://youtu.be/2szgmbn3cYM

So, after the entry point, control will pass to a decorator placed in a new module in the working directory named my_deco.py:

import functools
import time


def time_counter(func):
    @functools.wraps(func)
    def wrap(*args, **kwargs):
        start = time.time()
        print("======== Script started ========")
        result = func(*args, **kwargs)
        print(f"Time end: {time.strftime('%X')}")
        print(f'======== Script execution time: {time.time() - start:0.2f} ========')
        return result
    return wrap
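
Before wiring the decorator into the main script, it can be sanity-checked on a trivial function. The function name (slow_step) and its delay below are purely illustrative; the decorator itself is repeated inline so the snippet runs on its own:

```python
import functools
import time


def time_counter(func):
    # Same decorator as in my_deco.py, repeated here so the snippet is self-contained
    @functools.wraps(func)
    def wrap(*args, **kwargs):
        start = time.time()
        print("======== Script started ========")
        result = func(*args, **kwargs)
        print(f"Time end: {time.strftime('%X')}")
        print(f'======== Script execution time: {time.time() - start:0.2f} ========')
        return result
    return wrap


@time_counter
def slow_step():
    # A deliberately slow function; the 0.3-second delay is illustrative only
    time.sleep(0.3)
    return 'done'


print(slow_step())          # the measured time printed above should be about 0.30
print(slow_step.__name__)   # functools.wraps preserves the original function name
```

Note that wrap() accepts *args and **kwargs and returns the wrapped function's result, so the decorator also works for functions that take arguments or return values.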

And the previous script is supplemented by importing the new decorator and adding it to the main() function:

import time
from my_deco import time_counter

N = 5
DELAY = 0.5


def func1(n):
    for i in range(n):
        time.sleep(DELAY)
        print(f'--- line #{i} from {n} is completed')


def func2(n):
    for i in range(n):
        time.sleep(DELAY)
        print(f'=== line #{i} from {n} is completed')


@time_counter
def main():
    func1(N)
    func2(N)
    print('All functions completed')


if __name__ == '__main__':
    main()

Well, now we can add threads. To do that, we need to import them first:

from threading import Thread

And slightly modify the main() function:

@time_counter
def main():
    thread1 = Thread(target=func1, args=(N,))
    thread2 = Thread(target=func2, args=(N,))
    thread1.start()
    thread2.start()
    print('All functions completed')

Indeed, the code transformation is minimal, but the result is remarkable!

======== Script started ========
All functions completed
======== Script execution time: 0.01 ========
--- line #0 from 5 is completed
=== line #0 from 5 is completed
--- line #1 from 5 is completed
=== line #1 from 5 is completed
. . . . . . . . . . . . . . . .
Process finished with exit code 0

Please note that the line indicating the end of the measured run (======== Script execution time: 0.01 ========), as well as the message about the completion of all functions (All functions completed), appears before the output produced by the functions themselves. This confirms that func1() and func2(), which had the potential to block the code, are no longer blocking: threads allow us to easily "jump over" them and pass control to the code that follows. Consequently, our synchronous code has been transformed into asynchronous code, and the time measured for main() has dropped from 5 seconds (or even infinity!) to 0.01 seconds. Strictly speaking, the two threads keep running in the background for about 2.5 seconds more, which is why the process exits only after their final messages have been printed.

In conclusion, let's summarize a few observations that will be useful as we further explore threads:

  • The objects thread1 and thread2 represent two threads in which our functions are executed, passed as the target parameter. The corresponding arguments are also passed to these functions using the args parameter. Note that the arguments themselves are passed in tuple format.
  • Creating a thread is similar to defining a function with the def statement. For example, the statement def func means that the function func is only declared but not executed. To execute it, a separate line is required where the function name is invoked with parentheses: func().
  • Starting a thread likewise resembles invoking a function: instead of appending parentheses, we call the start() method on the thread object.

These observations will be helpful as we delve deeper into the study of threads.
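
One more detail worth knowing already now: start() only launches a thread, it does not wait for it. If main() should finish only after both functions have completed, the standard join() method of each thread can be called; it blocks until that thread is done. A minimal sketch, reusing the functions and constants from the example above:

```python
import time
from threading import Thread

N = 5
DELAY = 0.5


def func1(n):
    for i in range(n):
        time.sleep(DELAY)
        print(f'--- line #{i} from {n} is completed')


def func2(n):
    for i in range(n):
        time.sleep(DELAY)
        print(f'=== line #{i} from {n} is completed')


def main():
    thread1 = Thread(target=func1, args=(N,))
    thread2 = Thread(target=func2, args=(N,))
    thread1.start()
    thread2.start()
    # join() blocks until the corresponding thread has finished
    thread1.join()
    thread2.join()
    print('All functions completed')


if __name__ == '__main__':
    start = time.time()
    main()
    # Both threads sleep concurrently, so the total is about N * DELAY (2.5 s),
    # not the 2 * N * DELAY (5 s) of the synchronous version
    print(f'Total time: {time.time() - start:0.2f}')
```

With join() in place, the measured time is about 2.5 seconds rather than 5: the two functions now sleep concurrently instead of one after the other, and main() genuinely waits for both of them.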

Indeed, threads provide one method to address the issue of code synchronicity. It is evident that such a powerful tool requires further and deeper exploration, which we will undertake in the subsequent lessons.

However, as suggested by the course title, there are at least two more mechanisms or methods to achieve asynchronicity. Does it make sense to divert our attention to learning something else when threads have already allowed us to achieve impressive results?

To find an answer to this question, let's explore the next topic (article).

You can learn more about all the details of this topic from this video (Russian Voice):



To the next topic