
Threading, Multiprocessing and Asyncio in Python (Part 2)

Contents at a Glance

  1. Introduction to Asynchrony
    1. What is "asynchronous code"?
    2. Comparison of three ways to get asynchronous code
  2. Threading
    1. Threading. Creation and handling
    2. Threading. Synchronization primitives: Lock
    3. Threading. Synchronization primitives: Lock (continuation), Event, Condition, Semaphore, Queue 🔒
    4. Web Scraping by Threading 🔒
  3. Multiprocessing 🔒
    1. Multiprocessing. Creation and handling 🔒
    2. Multiprocessing: Synchronization primitives 🔒
  4. Asyncio Package
    1. Generator as an asynchronous function
    2. Coroutines, Tasks and Event Loop
    3. From Generators to Coroutines and Tasks 🔒
    4. Web scraping by aiohttp Package 🔒
    5. Handling files by aiofiles Package 🔒
    6. Asyncio Synchronization Primitives 🔒
  5. Additional Packages and Methods for Creating Asynchronous Code 🔒
    1. Subprocess Package 🔒
    2. Concurrent.futures Package 🔒
    3. Sockets - timeout() Method and select Package 🔒
    4. curio and trio Packages 🔒

1. Introduction to Asynchrony
1.2 Comparison of three ways to get asynchronous code

The easiest way to understand the working principle of all three methods of achieving asynchronous code is through the following diagram:

All three methods listed (threads, multiprocessing, and the asyncio package) essentially accomplish the same thing: they let parts of a program run in parallel with the main program (represented by the green and blue lines). In other words, certain (often "problematic") sections of code start executing as if simultaneously and independently of each other. If one or even several of these parallel branches take a long time or stop altogether, the main program is not affected - it continues working in its regular mode.

Please note that the phrase "as if" is used here intentionally - parallel computations are not always truly parallel. Sometimes, the switch between different branches of the process happens so quickly that they appear parallel to an external observer. It's similar to how 24 static frames displayed within one second create the illusion of continuous motion on a movie screen.

Indeed, this is the fundamental difference between the multiprocessing package, where computations truly occur in parallel, and the other two packages (threading and asyncio), where the effect of parallelism is achieved through fast switching between several "independent" parts of the program within a single process.

The technology of real parallel computation is called parallelism, while the technology of simulating parallel computations through fast switching is referred to as concurrency.

Parallel computations, in terms of resource usage, are not cheap because they involve multiple (or even all!) processor cores, additional RAM, and so on. Therefore, such a solution is justified only in the case of complex computational tasks where continuous and uninterrupted CPU processing is necessary (CPU-bound tasks).
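As a rough illustration of a CPU-bound task spread across several processes, here is a minimal sketch. The function count_primes and the chosen limits are made up for this example and are not part of the course code; the only library call used is multiprocessing.Pool.map from the standard library.

```python
import math
import multiprocessing


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it evenly
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


if __name__ == '__main__':
    limits = [20_000, 30_000, 40_000, 50_000]  # arbitrary example inputs
    # Each input is handed to a worker process, so the calculations
    # can run on different CPU cores at the same time.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)
    print(dict(zip(limits, results)))
```

The if __name__ == '__main__' guard matters here: on platforms that start worker processes by re-importing the script, it keeps the workers from creating pools of their own.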

In practice, however, we often deal with relatively slow processes such as database queries, network interactions, and so on. These are known as IO-bound tasks, in which the processor, while sending a request to a relatively slow external resource, is forced to idle and wait for a response.

In this case, it makes perfect sense to utilize the idle time of the processor for something more useful. For this purpose, two other technologies are used (threading and asyncio), where the waiting time for a "sleeping task" is used to perform other tasks.

It is important to note that machine resources are used more efficiently in this case - we no longer create new processes but instead use resources within the context of a single process.

Here, it is worth highlighting the fundamental technological difference between the threading and asyncio packages.

In the case of threads (using the threading package), the Python interpreter relies on the operating system (OS) for additional assistance. When creating a new thread, it essentially tells the OS, "Now I need the main thread's task to be executed simultaneously with the task of the new thread until this new thread finishes." In this case, the OS switches back and forth between the two tasks at short intervals chosen by its scheduler. Each switch takes only a tiny fraction of a second, so from an external observer's perspective, both tasks appear to be executed in parallel and simultaneously.

The advantage of this method is evident: the Python code itself for each thread remains completely unchanged (referring to the function passed as the target parameter to the thread). The thread itself is simply an instance of the Thread class, and its control is managed using the start() and join() methods (from the language's syntax perspective, there is nothing fundamentally new!).
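For comparison, a thread-based version of such a script might look roughly like the sketch below. It is not the course's original listing: the names func1, func2, N and DELAY are borrowed from the asyncio example in this lesson, and the timing code inside main() is an assumption added for illustration.

```python
import threading
import time

N = 3
DELAY = 0.1


def func1(n):
    # An ordinary function: nothing in its body knows about threads.
    for i in range(n):
        time.sleep(DELAY)
        print(f'--- line #{i} from {n} is completed')


def func2(n):
    for i in range(n):
        time.sleep(DELAY)
        print(f'=== line #{i} from {n} is completed')


def main():
    start = time.perf_counter()
    thread1 = threading.Thread(target=func1, args=(N,))
    thread2 = threading.Thread(target=func2, args=(N,))
    thread1.start()  # both threads begin running alongside main()
    thread2.start()
    thread1.join()   # wait until each thread has finished
    thread2.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    print(f'Elapsed: {main():0.2f} s')
```

Because the two threads sleep at the same time, the elapsed time should be close to N * DELAY rather than 2 * N * DELAY.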

There are indeed some drawbacks to using threads:

  • Thread-related data needs to be stored, which requires additional memory resources.
  • The context switching between threads during data reading/writing also takes time. The more threads are involved, the more noticeable this becomes.
  • Thread management is handled by the operating system, not the Python interpreter. Therefore, thread switching occurs based on the OS's scheduling algorithm, which may not always be optimal in terms of prioritizing the execution of specific threads.

These factors can impact the overall performance and efficiency of threaded code.

The asyncio package has none of these drawbacks: it uses only one thread within a single process, and switching between tasks is controlled by the program itself rather than by the OS. Everything would be fine if it weren't for one significant drawback: this method requires its own, fundamentally new code, which differs significantly from the familiar syntax of the Python language.

However, judge for yourself - here, for example, is what the solution to the previous task would look like using the asyncio package:

import asyncio
from my_deco import async_time_counter  # timing decorator (see point 5 below)


N = 5
DELAY = 0.5


async def func1(n):
    for i in range(n):
        # Pause this coroutine and let the event loop run the others
        await asyncio.sleep(DELAY)
        print(f'--- line #{i} from {n} is completed')


async def func2(n):
    for i in range(n):
        await asyncio.sleep(DELAY)
        print(f'=== line #{i} from {n} is completed')


@async_time_counter
async def main():
    print('All functions completed')


async def run():
    # Schedule all three coroutines as tasks, then wait for each of them
    task0 = asyncio.create_task(main())
    task1 = asyncio.create_task(func1(N))
    task2 = asyncio.create_task(func2(N))
    await task0
    await task1
    await task2


if __name__ == '__main__':
    asyncio.run(run())

The result of executing this script will be exactly the same as in the previous example with threads: control is immediately passed back to the main program main(). And since the code in main() only contains one print statement, this function completes almost instantly, and the result of the two other functions' actions becomes visible after the main() program has finished:

======== Script started ========
All functions completed
======== Script execution time: 0.00 ========
--- line #0 from 5 is completed
=== line #0 from 5 is completed
--- line #1 from 5 is completed
=== line #1 from 5 is completed
. . . . . . . . . . . . . . . .
Process finished with exit code 0

It is reasonable for beginners who have just learned the basics of Python to ask, "In which language is this code written?" In fact, there is nothing surprising about this question because:

  1. Function definitions use the new async keyword.
  2. Inside these functions, an unfamiliar keyword, await, is used.
  3. Strictly speaking, these two keywords turn the functions into a completely different entity in the Python language - they are no longer regular functions but coroutines.
  4. Instead of the familiar time delay using time.sleep(DELAY), its "asynchronous" counterpart asyncio.sleep(DELAY) is used, which not only introduces a delay but also hands control from the current function (or should we say, current coroutine) to another.
  5. The previous decorator @time_counter also cannot work here, because calling the coroutine main() with parentheses does not run it the way calling a regular function does; it only creates a coroutine object that still has to be awaited. This peculiarity needs to be taken into account when defining a new decorator, @async_time_counter.
  6. Finally, running this code using the regular approach is no longer possible - it requires a special construct like asyncio.run().
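The my_deco module itself is not shown in this lesson, so the following is only a guess at what a decorator such as @async_time_counter could look like. The message strings are copied from the output above; everything else (the wrapper, the use of time.perf_counter, the 'done' return value) is an assumption made for illustration.

```python
import asyncio
import functools
import inspect
import time


def async_time_counter(coro_func):
    """Hypothetical stand-in for my_deco.async_time_counter."""
    @functools.wraps(coro_func)
    async def wrapper(*args, **kwargs):
        print('======== Script started ========')
        start = time.perf_counter()
        # The wrapped coroutine function must be awaited; calling it with
        # parentheses alone would only create a coroutine object.
        result = await coro_func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f'======== Script execution time: {elapsed:0.2f} ========')
        return result
    return wrapper


@async_time_counter
async def main():
    print('All functions completed')
    return 'done'


if __name__ == '__main__':
    coro = main()                     # nothing has been executed yet
    print(inspect.iscoroutine(coro))  # True
    print(asyncio.run(coro))          # runs the coroutine, then prints 'done'
```

Note that the wrapper is itself declared with async def, so the decorated main() is still a coroutine function and can be passed to asyncio.create_task() or asyncio.run() as before.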

As a result, it turns out that this approach also has its own downsides, and quite significant ones at that.

Thus, a brief overview of the three methods (technologies) for creating asynchronous code has shown that none of the discussed options has universal advantages over the others. Each has its own merits and drawbacks. Therefore, all three methods have prospects for further development, improvement, and practical usage.

Hence, the answer to the question "Which option to choose?" is surprisingly straightforward: the one that best suits the specific requirements of your current task. Of course, it is crucial to have an equally good command of each of the listed technologies to be able to make the optimal choice at the right moment, rather than simply relying on familiarity.

It's natural to wonder, "Are there any other ways to make code asynchronous besides the three mentioned in the course title?"

The answer is, undoubtedly, yes.

The subprocess package allows for the creation of additional processes in which various programs can be executed, including Python code.
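A minimal sketch of this idea, assuming we simply want to run a one-line Python snippet in a separate interpreter: the helper name start_python_snippet is invented for this example, while subprocess.Popen, sys.executable and communicate() are standard library features.

```python
import subprocess
import sys


def start_python_snippet(code):
    """Start `code` in a separate Python process without waiting for it."""
    return subprocess.Popen(
        [sys.executable, '-c', code],
        stdout=subprocess.PIPE,
        text=True,
    )


if __name__ == '__main__':
    proc = start_python_snippet('print(sum(range(10)))')
    # The main program is free to do other work here
    # while the child process is running.
    output, _ = proc.communicate()  # now wait for the child and read its output
    print(output.strip())           # 45
```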

The concurrent.futures package provides a convenient interface for asynchronous tasks and parallel computing. It hides the details of creating and managing threads or processes, which makes it preferable in simple scenarios where ease of use is important and direct control over threads or processes is not required. For more complex scenarios or lower-level control, the threading and multiprocessing modules provide greater flexibility.
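As a small illustration of that interface, the sketch below runs a slow function over several inputs with a thread pool. The functions slow_square and square_all are hypothetical names used only here; ThreadPoolExecutor and its map() method come from the standard library.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x, delay=0.05):
    """Pretend to wait for a slow resource, then return x squared."""
    time.sleep(delay)
    return x * x


def square_all(values, max_workers=4):
    # The executor creates, reuses and shuts down the threads for us.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() returns the results in the same order as the inputs
        return list(executor.map(slow_square, values))


if __name__ == '__main__':
    print(square_all([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

ProcessPoolExecutor offers the same interface for processes, so switching between the two is often a one-line change.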

In addition to the packages included in the Python standard library, there are other well-known packages that are not part of it. For example, the packages curio (https://curio.readthedocs.io/) and trio (https://trio.readthedocs.io/) are used for working with coroutines.

The examples mentioned above can be classified as universal packages capable of making almost any synchronous code asynchronous. In addition to these, there are also specialized packages that enable achieving asynchronicity for specific programs and applications. For instance, the select package is used to facilitate asynchronous socket operations (via the socket package).

Furthermore, the regular "synchronous" socket package itself contains separate methods that support "asynchronous" use, such as settimeout() and setblocking().
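To give a feel for both approaches, here is a small sketch that uses a pair of connected local sockets. The helper names wait_readable and demo are made up for this example; select.select(), socket.socketpair() and settimeout() are standard library calls.

```python
import select
import socket


def wait_readable(sock, timeout):
    """Return True if `sock` has data to read within `timeout` seconds."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


def demo():
    left, right = socket.socketpair()  # two connected sockets, no network needed
    with left, right:
        events = []
        events.append(wait_readable(right, 0.05))  # nothing has been sent yet
        left.sendall(b'ping')
        events.append(wait_readable(right, 1.0))   # data is waiting now
        data = right.recv(1024)
        # settimeout() limits how long a blocking call may wait
        right.settimeout(0.05)
        try:
            right.recv(1024)
        except socket.timeout:
            events.append('timeout')
        return events, data


if __name__ == '__main__':
    print(demo())
```

With select(), the program only calls recv() once it knows data is available; with settimeout(), a blocking call gives up after the given time instead of waiting forever.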

Certainly, our course focuses on the three foundational "pillars" of asynchronicity mentioned in the title. They form the basis and foundation of this programming approach. However, the topic of asynchronicity in Python would not be fully explored without at least a brief overview of some additional packages and methods mentioned earlier. This will be the subject of the final, fifth lesson of this course.

So, let's proceed to a detailed study of the three main packages listed in the course title. We will begin this exploration with threads (the threading package).
