Blog

How to Link Local and Remote Repositories

Prerequisites

This article is primarily addressed to those who have already

  • created SSH keys on their local machine,
  • added them to their GitHub account,
  • and urgently need to save changes in their projects to a remote repository on a regular basis.

If you cannot yet confidently cross off every point listed above, it is recommended that you first read this article:

Everyone else is offered 2 ways to link a local repository with a remote one:

  1. Create a remote repository on GitHub and link it to the previously created local repository
  2. Create a remote repository, clone it to your local machine and transfer your project there

In both cases, you will need to create a new repository.

GitHub: Set up default branch and create new repository

First of all, it is strongly recommended that you make your life easier and change the default name of the main branch back to master. Because master is the default branch name in newly created local repositories, this avoids name confusion and unnecessary errors in further work.

To do this, in GitHub, click on the icon of your avatar in the upper right corner, then select "Settings".

On the page that opens, select "Repositories" in the "Code, planning, and automation" section, and under "Repository default branch" replace the value main with master. (And, of course, don't forget to click the Update button afterwards!)

Now you can create new repositories. To do this, go to the repository list and press the New button. You do not need to select any options (especially in the first case!). This procedure is discussed in more detail in the video (link at the end of the article).

1. Create a remote repository on GitHub and link it to the previously created local repository

If you chose this method, your local machine already has a working folder in which you have already run the following command in the terminal:

git init

And perhaps it even has its own commit history.

In this case, copy the SSH address of the created repository and enter the command in the local repository terminal:

git remote add origin <SSH address of your repository>

If there were any changes to the project, you need to write them to the local repository:

git add --all && git commit -m "your commit"

And after that, push all of this to the remote repository:

git push -fu origin master

Important! The -f (force) flag is used if you have already written something to the remote repository but no longer need it: pushing with this flag completely overwrites everything in the remote repository with the contents of the local one. The -u flag sets origin/master as the upstream branch, so later pushes need only git push.

That's all, actually - the remote repository is now linked to the local one and ready to go!

2. Create a remote repository, clone it to your local machine and transfer your project there

This option is somewhat simpler to implement and is ideal for those who are not yet on familiar terms with git.

To implement it, you will need to select in the terminal window the location where the local repository will be located, and then enter only one command:

git clone <SSH address of your repository>

The local repository is now initialized and linked to the remote one, and you can start working. One caveat: before creating commits, you first need to move one level down, into the project folder that git clone created (see video).

You can find out more about all the information presented in the article in this video (RU voice):


Threading, Multiprocessing and Asyncio in Python (Part 6)

Contents at a Glance

  1. Introduction to Asynchrony
    1. What is "asynchronous code"?
    2. Comparison of three ways to get asynchronous code
  2. Threading
    1. Threading. Creation and handling
    2. Threading. Synchronization primitives: Lock
    3. Threading. Synchronization primitives: Lock (continuation), Event, Condition, Semaphore, Queue 🔒
    4. Web Scraping by Threading 🔒
  3. Multiprocessing 🔒
    1. Multiprocessing. Creation and handling 🔒
    2. Multiprocessing: Synchronization primitives 🔒
  4. Asyncio Package
    1. Generator as an asynchronous function
    2. Coroutines, Tasks and Event Loop
    3. From Generators to Coroutines and Tasks 🔒
    4. Web scraping by aiohttp Package 🔒
    5. Handling files by aiofiles Package 🔒
    6. Asyncio Synchronization Primitives 🔒
  5. Additional Packages and Methods for Creating Asynchronous Code 🔒
    1. Subprocess Package 🔒
    2. Concurrent.futures Package 🔒
    3. Sockets - timeout() Method and select Package 🔒
    4. curio and trio Packages 🔒

4. Asyncio Package
4.2 Coroutines, Tasks and Event Loop

An equivalent of generators in asyncio is a coroutine - a function whose execution can be suspended and resumed. As we can see, generators perfectly fit the definition of coroutines, with one exception: a coroutine has the awaitable property - the ability to be used with the await operator.

To run coroutines, we need an Event Loop - a crucial component in asyncio that executes asynchronous tasks. The asyncio package provides a comprehensive set of low-level methods and functions for event loops. However, it's important to note that for most practical purposes, using the high-level function asyncio.run() is more than sufficient.
https://docs.python.org/3/library/asyncio-eventloop.html#event-loop

Let's consider a simple example:

import asyncio


async def cor1():
    print('cor1')


asyncio.run(cor1())

Any attempt to call the coroutine cor1() like a regular function produces a warning: RuntimeWarning: coroutine 'cor1' was never awaited. Running a coroutine means telling the event loop, via the await operator, that the coroutine is ready to be executed. In other words, with the await cor1() construction we are telling the event loop something along the lines of: "We are ready to execute the coroutine cor1() at this point in the program. Please do it at the earliest opportunity".
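A small sketch can make this concrete. Calling a coroutine function does not run its body; it only returns a coroutine object, which does nothing until an event loop drives it. The return value 'done' is added here purely for illustration and is not part of the article's example:

```python
import asyncio
import inspect


async def cor1():
    print('cor1')
    return 'done'  # return value added only for this illustration


coro = cor1()                      # the body has NOT run yet, nothing is printed
print(inspect.iscoroutine(coro))   # True: we only hold a coroutine object

result = asyncio.run(coro)         # the event loop actually executes the body
print(result)                      # done
```

If the coroutine object were simply discarded instead of being passed to asyncio.run() or awaited, Python would emit the "was never awaited" warning quoted above.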

In our previous example, asyncio.run() takes care of this for us: it creates an event loop and runs the coroutine passed to it until completion, so the code runs without a warning. However, we can also run a coroutine explicitly from another coroutine:

import asyncio


async def cor1():
    print('cor1')


async def main():
    await cor1()
    print('The cor1 coroutine has finished.')


asyncio.run(main())

Let's make the code more complex and try to run two coroutines in parallel. To do this, we will set different delays in each coroutine and observe how well asyncio handles this task.

By the way, note that the delay in our example is not taken from the time package but from the asyncio package. The explanation is as follows: in principle, we can use any function inside a coroutine. However, if we want to avoid blocking within the coroutine, this function must have the awaitable property.
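The scripts that follow import async_time_counter from a local my_deco module that is not shown in this article. A minimal sketch of what such a decorator might look like is given here; the implementation is an assumption, and only the banner texts are taken from the output printed further down. The demo() coroutine is a hypothetical helper used just to exercise the decorator:

```python
import asyncio
import functools
import time


def async_time_counter(func):
    """Hypothetical stand-in for my_deco.async_time_counter (an assumption)."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        print('======== Script started ========')
        start = time.perf_counter()
        result = await func(*args, **kwargs)   # run the wrapped coroutine
        elapsed = time.perf_counter() - start
        print(f'======== Script execution time: {elapsed:0.2f} ========')
        return result
    return wrapper


@async_time_counter
async def demo():
    # Hypothetical coroutine used only to try the decorator out.
    await asyncio.sleep(0.1)
    return 'ok'


demo_result = asyncio.run(demo())
```

The wrapper has to be an async def itself: a plain function could not await the decorated coroutine, so it would measure only the time needed to create the coroutine object.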

import time
import asyncio
from my_deco import async_time_counter


async def cor1(start):
    print(f'cor1 before with delay: {time.time() - start:0.0f}')
    await asyncio.sleep(1)
    print(f'cor1 after with delay: {time.time() - start:0.0f}')


async def cor2(start):
    print(f'cor2 before with delay: {time.time() - start:0.0f}')
    await asyncio.sleep(2)
    print(f'cor2 after with delay: {time.time() - start:0.0f}')


@async_time_counter
async def main():
    start = time.time()
    await cor1(start)
    await cor2(start)
    print(f'and all with delay: {time.time() - start:0.0f}')


asyncio.run(main())

As we can see, the trick didn't work out - the coroutines executed sequentially instead of in parallel:

======== Script started ========
cor1 before with delay: 0
cor1 after with delay: 1
cor2 before with delay: 1
cor2 after with delay: 3
and all with delay: 3
======== Script execution time: 3.00 ========

This happened because, for parallel execution of coroutines, it is not enough for all the functions inside the coroutines to be awaitable objects. The coroutines themselves need to be passed to the event loop not as coroutines but as tasks. There are several ways to do this.

Firstly, you can do it explicitly by converting the coroutine into a task using the asyncio.create_task() function:

import time
import asyncio
from my_deco import async_time_counter


async def cor1(start):
    print(f'cor1 before with delay: {time.time() - start:0.0f}')
    await asyncio.sleep(1)
    print(f'cor1 after with delay: {time.time() - start:0.0f}')


async def cor2(start):
    print(f'cor2 before with delay: {time.time() - start:0.0f}')
    await asyncio.sleep(2)
    print(f'cor2 after with delay: {time.time() - start:0.0f}')


@async_time_counter
async def main():
    start = time.time()
    task1 = asyncio.create_task(cor1(start))
    task2 = asyncio.create_task(cor2(start))
    await task1
    await task2
    print(f'and all with delay: {time.time() - start:0.0f}')


asyncio.run(main())

As you can see, congratulations are in order - now the overall script execution time has been reduced to the time taken by the "longest" coroutine:

======== Script started ========
cor1 before with delay: 0
cor2 before with delay: 0
cor1 after with delay: 1
cor2 after with delay: 2
and all with delay: 2
======== Script execution time: 2.00 ========

Secondly, you can achieve the same result implicitly using the fantastic function asyncio.gather(), which automatically converts all the coroutines passed as arguments into tasks:

import time
import asyncio
from my_deco import async_time_counter


async def cor1(start):
    print(f'cor1 before with delay: {time.time() - start:0.0f}')
    await asyncio.sleep(5)
    print(f'cor1 after with delay: {time.time() - start:0.0f}')


async def cor2(start):
    print(f'cor2 before with delay: {time.time() - start:0.0f}')
    await asyncio.sleep(2)
    print(f'cor2 after with delay: {time.time() - start:0.0f}')


@async_time_counter
async def main():
    start = time.time()
    # From the asyncio documentation:
    #     awaitable asyncio.gather(*aws, return_exceptions=False)
    # "If any awaitable in aws is a coroutine, it is automatically scheduled as a Task."
    # https://docs.python.org/3/library/asyncio-task.html#asyncio.gather
    await asyncio.gather(cor1(start), cor2(start))
    print(f'and all with delay: {time.time() - start:0.0f}')


asyncio.run(main())
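One more property of asyncio.gather() is worth a short illustration: it returns the results of the awaitables as a list, in the order the arguments were passed, not in the order they finished. The sketch below uses a hypothetical work() coroutine and shorter delays than the article's examples:

```python
import asyncio
import time


async def work(name, delay):
    # Hypothetical coroutine: waits without blocking, then returns its name.
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # 'fast' finishes first, but the result list follows the argument order.
    results = await asyncio.gather(work('slow', 0.2), work('fast', 0.1))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results)   # ['slow', 'fast']
print(f'elapsed: {elapsed:0.1f} s')
```

The total time is governed by the longest delay rather than by the sum of the two, which is the same effect the create_task() version demonstrated.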

And thirdly, starting with Python 3.11, there is a more elegant way to convert coroutines into tasks: asyncio.TaskGroup():

import time
import asyncio
from my_deco import async_time_counter


async def cor1(start):
    print(f'cor1 before with delay: {time.time() - start:0.0f}')
    await asyncio.sleep(5)
    print(f'cor1 after with delay: {time.time() - start:0.0f}')


async def cor2(start):
    print(f'cor2 before with delay: {time.time() - start:0.0f}')
    await asyncio.sleep(2)
    print(f'cor2 after with delay: {time.time() - start:0.0f}')


@async_time_counter
async def main():
    start = time.time()
    async with asyncio.TaskGroup() as tg:
        task1 = tg.create_task(cor1(start))
        task2 = tg.create_task(cor2(start))
    print(f'and all with delay: {time.time() - start:0.0f}')


asyncio.run(main())

Thus, we have discovered that the third crucial element of the asyncio package is the Task. Tasks enable coroutines to be executed concurrently, or more precisely, quasi-parallel.
https://docs.python.org/3/library/asyncio-task.html#:~:text=a%20coroutine%20function.-

In conclusion, summarizing everything discussed in this topic, we note that the minimum set for creating asynchronous code using the asyncio package includes:
  • Coroutines - awaitable functions whose execution can be suspended and resumed.
  • Tasks - coroutines that are given the ability to be executed quasi-parallel (concurrently).
  • Event Loop, which organizes and dispatches the quasi-parallel execution of tasks and coroutines.

You can learn more about all the details of this topic from this video (Russian Voice):




Threading, Multiprocessing and Asyncio in Python (Part 5)


4. Asyncio Package
4.1 Generator as an asynchronous function

And finally, we come to the third way of creating asynchronous code, where all the program code is contained within not only the same process but also the same thread (see the diagram in the introduction).

In the previous two cases (the threading and multiprocessing packages), there were no specific requirements for the source code. To make this code asynchronous, we simply took a blocking (or "slow") function and placed it in a separate thread or process managed by the operating system, without any changes to the original function.

However, when we attempt to achieve asynchronicity within the same process and thread, we can no longer rely on the assistance of the operating system. We are left to rely on ourselves, which means we cannot avoid making significant changes to the original source code.

Armed with this idea, let's once again recall those two "slow" functions from our very first example at the beginning of this course:

import time
from my_deco import time_counter
N = 5
DELAY = 0.5


def func1(n):
    for i in range(n):
        time.sleep(DELAY)
        print(f'--- line #{i} from {n} is completed')


def func2(n):
    for i in range(n):
        time.sleep(DELAY)
        print(f'=== line #{i} from {n} is completed')


@time_counter
def main():
    func1(N)
    func2(N)
    print(f'All functions completed')


if __name__ == '__main__':
    main()

As we already know well, when we call func1(n), further execution of the main program is suspended until this function completes all its iterations. Only then does control move to the next line of code.

In other words, a regular function blocks the execution of the calling code from the moment it is invoked until it returns.

However, in Python, there is a wonderful object called a generator, which can also be considered as a kind of function. But it's a function without blocking. It's a function that can be executed "partially" or "step-by-step." Each time it is called, it doesn't complete its execution but only advances by "one step," one iteration, and no more. However, it remembers its state, the current step it stopped at, so that it doesn't repeat itself and can continue its work from the next step.

The generator is incredibly popular in Python, so there is no doubt that most readers are very familiar with what it is. Nevertheless, it is still worth saying a few introductory words on this topic.

Generators in Python

Below is an example of a generator function gen():

def gen(seq: iter):
    for item in seq:
        yield item


def main():
    data = [0, 1, 2, 3]

    for i in gen(data):
        print(i)


if __name__ == '__main__':
    main()

In this case, the yield statement serves as the exact stopping point where the generator temporarily suspends its execution and resumes it upon the next call.

Therefore, you cannot simply run the generator like a regular function once and wait for the result. The generator needs to be continuously managed. This is precisely what the main() function does in our case.

In this example, the generator's data is extracted using a loop. This is perhaps the simplest way to work with a generator. However, for our case, this approach is not entirely suitable because the loop strictly retrieves all the elements of the generator in sequential order. As a result, this construction (generator + its management from the main() function) ends up behaving similar to a loop in a regular (blocking) function.

Hence, we will use the next() function (which calls the generator's __next__() method). It lets us request the next value from the generator at any moment we choose:

def gen(seq: iter):
    for item in seq:
        yield item


def main():
    data = [0, 1, 2, 3]

    while True:
        print(next(gen(data)))


if __name__ == '__main__':
    main()

However, in this case we end up with an infinite loop in which the generator returns the same initial value of 0 every time, because gen(data) creates a brand-new generator on every iteration. To fix this, the generator needs to be initialized once, before the loop.

Initialization of the generator is done by calling the function that contains the yield keyword. When the generator function is called in the code, it doesn't execute immediately but returns a generator object. This object can be used to iterate over the sequence of values generated by the generator function:

def gen(seq: iter):
    for item in seq:
        yield item


def main():
    data = [0, 1, 2, 3]

    # initialization
    g = gen(data)

    while True:
        print(next(g))


if __name__ == '__main__':
    main()

Well, we're almost there. However, once all the values have been extracted from the generator, a StopIteration exception is raised, which it makes sense to catch:

def gen(seq: iter):
    for item in seq:
        yield item


def main():
    data = [0, 1, 2, 3]

    # initialization
    g = gen(data)

    while True:
        try:
            print(next(g))
        except StopIteration:
            print('the generator is exhausted')
            break


if __name__ == '__main__':
    main()

Well, there you have it. Everything is in order now - we have complete control over the process of extracting values from the generator. And if needed, we can sequentially extract values from multiple generator functions, which externally appears as parallel execution of these functions.

To conclude this brief overview of generators, let's add two final touches:
  1. The loop in the generator function gen() can be written much more compactly: yield from seq.
  2. The iterable passed to the generator, the list [0, 1, 2, 3], can be written more compactly as a range object: range(4).

Here's the updated code, taking both changes into account:

def gen(seq: iter):
    yield from seq


def main():
    data = range(4)  # a range object, not a list, but it yields the same values 0, 1, 2, 3

    # initialization
    g = gen(data)

    while True:
        try:
            print(next(g))
        except StopIteration:
            print('the generator is exhausted')
            break


if __name__ == '__main__':
    main()

Replacing Blocking Functions with Generators

As we just learned in the previous section, it is not enough to replace functions with generators; we also need to manage these generators.

Thus, we need a separate dispatcher function, main(), that controls the execution of the generator functions. It can also be called an Event Loop, since every event of receiving a new value from a generator is triggered from inside this loop.

If there are two or more generators, the task for the event loop becomes slightly more complex since each generator needs to be called in turn.

def gen(seq: iter):
    yield from seq


def main():
    data1 = range(5)
    data2 = data1

    g1 = gen(data1)
    g2 = gen(data2)

    while True:
        try:
            print(next(g1))
            print(next(g2))
        except StopIteration:
            print('the generators are exhausted')
            break


if __name__ == '__main__':
    main()

This code already bears a strong resemblance to our recent example with threads: the generators g1 and g2 no longer block the execution of the main program until they are completed. Instead, their steps are interleaved, so both generators appear to run in parallel.

However, in this example, the event loop appears to be somewhat simplified, as it does not take into account that the generators can yield sequences of different lengths. Below is an adjusted version that addresses this issue:

def gen(seq: iter):
    yield from seq


def main():
    data1 = range(5)
    data2 = range(15, 18)

    g1 = gen(data1)
    g2 = gen(data2)
    g1_not_exhausted = True
    g2_not_exhausted = True

    while g1_not_exhausted or g2_not_exhausted:
        if g1_not_exhausted:
            try:
                print(next(g1))
            except StopIteration:
                print('the generator 1 is exhausted')
                g1_not_exhausted = False

        if g2_not_exhausted:
            try:
                print(next(g2))
            except StopIteration:
                print('the generator 2 is exhausted')
                g2_not_exhausted = False


if __name__ == '__main__':
    main()
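The per-generator flags work for two generators but do not scale well. As a sketch of one possible generalization (not the author's code), the same round-robin dispatching can be written for any number of generators with a queue; round_robin() is a hypothetical helper name introduced here:

```python
from collections import deque


def gen(seq):
    yield from seq


def round_robin(*generators):
    """Advance each generator one step in turn until all are exhausted."""
    queue = deque(generators)
    results = []
    while queue:
        g = queue.popleft()
        try:
            results.append(next(g))
        except StopIteration:
            continue          # exhausted: do not put it back in the queue
        queue.append(g)       # still alive: schedule its next step
    return results


order = round_robin(gen(range(5)), gen(range(15, 18)))
print(order)  # [0, 15, 1, 16, 2, 17, 3, 4]
```

An exhausted generator is simply not returned to the queue, so the loop ends on its own once the queue is empty and no boolean flags are needed.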

Now we can refactor our initial example where regular functions func1() and func2() will be transformed into generators gen1() and gen2():

import time
from my_deco import time_counter

N = 5
DELAY = 0.5


def gen1(n):
    for i in range(n):
        yield
        time.sleep(DELAY)
        print(f'--- line #{i} from {n} is completed')


def gen2(n):
    for i in range(n):
        yield
        time.sleep(DELAY)
        print(f'=== line #{i} from {n} is completed')


@time_counter
def main():
    g1 = gen1(N)
    g2 = gen2(N)
    g1_not_exhausted = True
    g2_not_exhausted = True

    while g1_not_exhausted or g2_not_exhausted:
        if g1_not_exhausted:
            try:
                next(g1)
            except StopIteration:
                print('the generator 1 is exhausted')
                g1_not_exhausted = False

        if g2_not_exhausted:
            try:
                next(g2)
            except StopIteration:
                print('the generator 2 is exhausted')
                g2_not_exhausted = False


if __name__ == '__main__':
    main()

Now this code resembles the previous example with threads even more closely, as the functions func1() and func2() (transformed into the generators gen1() and gen2()) take turns and appear to execute in parallel. However, there is one caveat: every step of each generator still contains a blocking delay of DELAY = 0.5 seconds, so the total running time is no shorter than in the original version. To solve this problem, we can use the asyncio package.

But before we dive into writing our first asynchronous script using this package, we need to familiarize ourselves with its fundamental components: Coroutines, Tasks, and the Event Loop.

You can learn more about all the details of this topic from this video (Russian Voice):




Threading, Multiprocessing and Asyncio in Python (Part 4)


2. Threading
2.2 Threading. Synchronization primitives: Lock

The threads we have used so far did not interact with each other or the main thread. All they did was simply print their own results.

However, in practice, fully autonomous threads are more of an exception than the rule. More often, threads need to exchange data with each other or collectively use (modify) data that is in the main thread. In this case, there is an objective need to synchronize the actions of these threads.

It is particularly important to note the following: the use of synchronization primitives itself does not make asynchronous code synchronous (unless, of course, we are talking about programmer errors 😉). Synchronization primitives only synchronize individual threads (or individual processes in the case of the multiprocessing package, or individual coroutines in the case of the asyncio package) but by no means turn asynchronous code into synchronous!

Let's consider the simplest example of such interaction - simultaneous shared access of multiple threads to a single variable from the main thread.

As seen from the following example, multiple threads increment the value of the shared variable val within a loop:

from threading import Thread
from my_deco import time_counter

val = 0
COUNT = 100
NUM_THREADS = 100

def increase():
   global val
   for _ in range(COUNT):
       val += 1

@time_counter
def main():
   threads = [Thread(target=increase) for _ in range(NUM_THREADS)]

   for thread in threads:
       thread.start()
   for thread in threads:
       thread.join()


   diff = val - COUNT * NUM_THREADS
   print(f'diff = {diff}')

if __name__ == '__main__':
   main()

If it were not for threads, this construction could be considered as two nested loops:

  • the loop inside the thread as the inner loop,
  • and the threads themselves as the outer loop.

Based on this, the final value of the variable val should equal the product of the iteration counts of the two loops, i.e., the number of threads multiplied by the number of iterations of the inner loop (in our case, 100 * 100 = 10,000).

This would indeed be the case if the += operation were thread-safe. However, in reality, this is far from true.

First and foremost, it is important to note that the single line val += 1 actually represents 4 sequential actions:

  1. Retrieving the current value of the variable val.
  2. Retrieving the value of the increment (in our case, it is 1).
  3. Adding the two numbers together (val + 1).
  4. Writing the result as the new value of the variable val.

Therefore, there is a non-zero probability that between steps 1 and 4 another thread will read the same, not yet updated, value of val. When both threads then overwrite the variable in step 4, one of the two increments is lost. This phenomenon is known as a 'race condition.'
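The separate steps can be made visible with the standard dis module, which lists the bytecode instructions CPython executes for val += 1. This is only an illustrative sketch: the exact instruction names differ between CPython versions, so the code checks just the first step (loading the global) and the last one (storing it back). The function name increment() is introduced here for the demonstration:

```python
import dis

val = 0


def increment():
    global val
    val += 1


# Collect the names of the bytecode instructions in execution order.
ops = [instr.opname for instr in dis.get_instructions(increment)]
print(ops)

load_at = ops.index('LOAD_GLOBAL')    # step 1: read the current value of val
store_at = ops.index('STORE_GLOBAL')  # step 4: write the new value back
print(load_at < store_at)             # True
```

A thread switch can happen between any two of these instructions, which is exactly the window in which an increment can be lost.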

In our example, this effect is quite difficult to observe, since the number of iterations COUNT, the number of threads NUM_THREADS, and the thread-switching interval are not sufficient for it to manifest consistently.

By the way, the default thread-switching interval can be obtained using the getswitchinterval() function from the well-known sys module.

import sys


interval = sys.getswitchinterval()
print(f'switchinterval = {interval}')


# switchinterval = 0.005

We can modify the value of the thread-switching interval using the sys.setswitchinterval(new_interval) method, but unfortunately, we cannot decrease it to the level where the race condition effect will manifest. However, we can programmatically modify our code to slow down the increment of the val value. To achieve this, we will separate

  • the calculation of the new value of the variable val
  • and the replacement of the old value with the new one.

To make it more convincing, we will add a delay of 0.001 seconds between these two steps:

import time
from threading import Thread
from my_deco import time_counter

val = 0
COUNT = 100
NUM_THREADS = 100

def increase():
   global val
   for _ in range(COUNT):
       new_val = val + 1
       time.sleep(0.001)
       val = new_val


@time_counter
def main():
   threads = [Thread(target=increase) for _ in range(NUM_THREADS)]

   for thread in threads:
       thread.start()
   for thread in threads:
       thread.join()

   diff = val - COUNT * NUM_THREADS
   print(f'diff = {diff}')


if __name__ == '__main__':
   main()

In this case, the difference diff will be significantly different from zero.

Thus, the "thread-UNsafe" access to shared variables is proven. How can we fix this situation?

To address such issues, Python provides so-called synchronization primitives.

Perhaps the simplest, basic, and most commonly used synchronization primitive is the Lock() object, which operates according to the following algorithm:

  1. Before allowing a thread to start modifying data, it checks if another thread has already started this modification.
  2. If modifications have already been initiated by another thread, the current thread is put into a queue.
  3. When the queue reaches the waiting thread, it gains access to modify the data, while simultaneously preventing any other thread from making changes - using the acquire() method.
  4. After completing the modifications, the current thread releases the lock, and the right to make changes is passed to the next thread in the queue - using the release() method.

import time
from threading import Thread, Lock
from my_deco import time_counter

COUNT = 100
NUM_THREADS = 100
val = 0
val_lock = Lock()


def increase():
   global val
   for _ in range(COUNT):
       val_lock.acquire()
       new_val = val + 1
       time.sleep(0.001)
       val = new_val
       val_lock.release()


@time_counter
def main():
   threads = [Thread(target=increase) for _ in range(NUM_THREADS)]


   for thread in threads:
       thread.start()
   for thread in threads:
       thread.join()

   diff = val - COUNT * NUM_THREADS
   print(f'diff = {diff}')


if __name__ == '__main__':
   main()

As we can see, applying the Lock object has produced the expected result: diff = 0.

(By the way, sometimes this type of lock is referred to as a Mutex - a synchronization primitive used to protect shared resources from simultaneous access by multiple threads. It represents a lock that a thread can hold or release. Only one thread can hold the mutex at any given time, and all other threads attempting to acquire the mutex will be blocked until it is released.)

There is also a more convenient and compact way of using the Lock object, which avoids the explicit use of the acquire() and release() methods mentioned earlier. You will be introduced to this method, along with other synchronization primitives and their usage examples, in the advanced version of this course.
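The article does not say which form it has in mind; one likely candidate (an assumption on our part) is the with statement, since Lock objects support the context-manager protocol. A minimal sketch, using smaller illustrative values and no artificial delay so that it finishes quickly:

```python
from threading import Lock, Thread

COUNT = 1000        # illustrative values, not the article's
NUM_THREADS = 10
val = 0
val_lock = Lock()


def increase():
    global val
    for _ in range(COUNT):
        with val_lock:      # acquire() on entry, release() on exit
            val += 1


threads = [Thread(target=increase) for _ in range(NUM_THREADS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

diff = val - COUNT * NUM_THREADS
print(f'diff = {diff}')  # diff = 0
```

Besides being shorter, the with form releases the lock even if the protected code raises an exception, which a bare acquire()/release() pair does not guarantee.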

You can learn more about all the details of this topic from this video (Russian Voice):




Threading, Multiprocessing and Asyncio in Python (Part 3)


2. Threading
2.1 Threading. Creation and handling

The threading package is part of the Python standard library starting from version 1.5 (November 1997), so it does not require any additional installation. The minimum requirements for starting a thread are as follows:

  1. Import the threading package.
  2. Write the code that should be executed in the thread, typically in the form of a function.
  3. Create (instantiate) a thread, specifying the target function's name as a parameter. If the function requires arguments, they can be passed using the args parameter in tuple format.
  4. Start the thread using the start() method.

import time
from threading import Thread

def clock(delay):
   time.sleep(delay)
   print(f"Current time: {time.strftime('%X')}, delay {delay} sec.")


thread1 = Thread(target=clock, args=(2,))
thread2 = Thread(target=clock, args=(3,))


if __name__ == '__main__':
   start = time.time()
   print(f"Time start: {time.strftime('%X')}")
   thread1.start()
   thread2.start()
   print(f"Time end: {time.strftime('%X')}")
   print(f'======== Total time: {time.time() - start:0.2f} ========')

Result:

Time start: 07:39:58
Time end: 07:39:58
======== Total time: 0.00 ========
Current time: 07:40:00, delay 2 sec.
Current time: 07:40:01, delay 3 sec.


Process finished with exit code 0

As seen in the example, the main thread finished instantly (0 sec), while the additional threads finished after 2 and 3 seconds from the script's start time, respectively.

In our case, the additional threads kept working after the main thread had executed its last line, and the process did not exit until they were done. This behavior is logical: the processor is much faster than external devices, so before the script terminates it has to wait for those devices to report that their operations completed properly. Otherwise, data still sitting in their buffers could be lost. That is why, by default, the process waits for all of its (non-daemon) threads to complete before terminating.

In cases where it is necessary to forcefully terminate a thread simultaneously with the termination of the entire process, a special parameter called daemon is set to True. Please note that there are two ways to do this:

import time
from threading import Thread


def clock(delay):
   time.sleep(delay)
   print(f"Current time: {time.strftime('%X')}, delay {delay} sec.")


thread1 = Thread(target=clock, args=(2,))
thread2 = Thread(target=clock, args=(3,), daemon=True)


if __name__ == '__main__':
   start = time.time()
   print(f"Time start: {time.strftime('%X')}")
   thread1.start()
   thread2.daemon = True
   thread2.start()
   print(f"Time end: {time.strftime('%X')}")
   print(f'======== Total time: {time.time() - start:0.2f} ========')

As we can see, the daemon flag of the second thread was set to True twice. Of course, setting it once would have been enough; the second assignment only demonstrates an alternative way of setting the flag. As a result, the second thread was still one second short of finishing when the process ended, so it was terminated prematurely and printed nothing.

Time start: 07:54:41
Time end: 07:54:41
======== Total time: 0.00 ========
Current time: 07:54:43, delay 2 sec.


Process finished with exit code 0

(By the way, you can check whether a thread is a daemon by reading its daemon attribute, which is True for a daemon thread. The older isDaemon() method returns the same value but has been deprecated since Python 3.10.)
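As a small illustration of that check, the sketch below reads and changes the flag on two threads that are never started. The noop() function and the thread names are hypothetical placeholders:

```python
from threading import Thread


def noop():
    pass


worker = Thread(target=noop, daemon=True)
plain = Thread(target=noop, daemon=False)

# The flag can be read at any time and changed before start() is called
print(worker.daemon)   # True
print(plain.daemon)    # False
plain.daemon = True
print(plain.daemon)    # True
```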

To find out when exactly the second thread stopped its execution, let's modify the clock function slightly. We will turn its body into a loop with a 1-second delay and print the result each time. In this case, the overall delay will be determined by the number of iterations in the loop.

def clock(delay: int):
   for d in range(1, delay + 1):
       time.sleep(1)
       print(f"Current time: {time.strftime('%X')}, {d}; delay {delay} sec.")

The result shows that the last line printed by the second thread belongs to its 2nd iteration. It never printed the result of the third second because it was terminated together with the process at the end of the 2nd second, when the first (non-daemon) thread finished.

Time start: 17:20:42
Time end: 17:20:42
======== Total time: 0.00 ========
Current time: 17:20:43, 1; delay 2 sec.
Current time: 17:20:43, 1; delay 3 sec.
Current time: 17:20:44, 2; delay 2 sec.
Current time: 17:20:44, 2; delay 3 sec.


Process finished with exit code 0

If we also set daemon=True for the first thread, the overall process will terminate instantly, and both threads will leave absolutely no trace of their activity.

Time start: 17:29:27
Time end: 17:29:27
======== Total time: 0.00 ========


Process finished with exit code 0

Thus, we can conclude that the overall duration of the process is determined by the duration of the "longest" non-daemon thread. If all additional threads are daemons, it is determined by the duration of the main thread.

Combining the main and auxiliary threads

Very often, the results of auxiliary threads need to be used in the main thread. In such cases, we can use the join() method, which suspends the execution of the main thread until the joined thread is finished.

if __name__ == '__main__':
   start = time.time()
   print(f"Time start: {time.strftime('%X')}")
   thread1.start()
   thread1.join()
   thread2.start()
   print(f"Time end: {time.strftime('%X')}")
   print(f'======== Total time: {time.time() - start:0.2f} ========')

That's all we changed in the previous script - we simply joined the first thread to the main thread immediately after its start. This led to the following result:

Time start: 17:53:40
Current time: 17:53:41, 1; delay 2 sec.
Current time: 17:53:42, 2; delay 2 sec.
Time end: 17:53:42
======== Total time: 2.00 ========


Process finished with exit code 0

What is noteworthy here is that the duration of the main thread increased to 2 seconds, exactly the time it took for the first thread to complete. Only after waiting for the first thread to finish did the main thread start the second thread, and then it immediately terminated, halting the entire process. As a result, the second (daemon) thread was also terminated without being able to do any work.

But if we first start both threads and then join the first thread to the main thread,

thread1.start()
thread2.start()
thread1.join()

then the picture will change:

Time start: 18:05:02
Current time: 18:05:03, 1; delay 2 sec.
Current time: 18:05:03, 1; delay 3 sec.
Current time: 18:05:04, 2; delay 2 sec.
Current time: 18:05:04, 2; delay 3 sec.
Time end: 18:05:04
======== Total time: 2.00 ========


Process finished with exit code 0

Key takeaway: first starting all the threads and then joining one of them to the main thread gives every started thread a chance to run. Threads whose execution time is less than or equal to that of the joined thread will complete. Any other daemon thread (thread2 in our case) will only work while the joined thread is active.

Therefore, if we join the longest-running thread to the main thread, it will enable all the threads to complete their execution. The overall process time in our case will be determined by the execution time of this thread (in our case - 3 seconds).

Time start: 18:14:17
Current time: 18:14:18, 1; delay 2 sec.
Current time: 18:14:18, 1; delay 3 sec.
Current time: 18:14:19, 2; delay 2 sec.
Current time: 18:14:19, 2; delay 3 sec.
Current time: 18:14:20, 3; delay 3 sec.
Time end: 18:14:20
======== Total time: 3.00 ========


Process finished with exit code 0

Now, a small task - try to answer the question yourself: what will the overall execution time be in this case:

thread1.start()
thread2.start()
thread1.join()
thread2.join()

, in this case:

thread1.start()
thread2.start()
thread2.join()
thread1.join()

, and in this case?

thread1.start()
thread1.join()
thread2.start()
thread2.join()

And most importantly, try to explain why the overall execution time changed in this particular way and not otherwise.
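To check your answers, a small measuring harness along these lines may help. It is only a sketch: SCALE, measure() and VARIANTS are hypothetical names introduced here, the delays are shortened so the run takes about a second, and the threads are ordinary (non-daemon) ones:

```python
import time
from threading import Thread

SCALE = 0.1   # hypothetical speed-up: one "second" from the text lasts 0.1 s here


def clock(delay):
    time.sleep(delay * SCALE)


def measure(steps):
    """Apply (action, thread number) steps in order; return elapsed scaled seconds."""
    threads = {1: Thread(target=clock, args=(2,)),
               2: Thread(target=clock, args=(3,))}
    start = time.perf_counter()
    for action, number in steps:
        getattr(threads[number], action)()   # calls .start() or .join()
    return (time.perf_counter() - start) / SCALE


# The three orderings from the task, in the same order as in the text
VARIANTS = {
    'A': [('start', 1), ('start', 2), ('join', 1), ('join', 2)],
    'B': [('start', 1), ('start', 2), ('join', 2), ('join', 1)],
    'C': [('start', 1), ('join', 1), ('start', 2), ('join', 2)],
}

if __name__ == '__main__':
    for name, steps in VARIANTS.items():
        print(f'variant {name}: {measure(steps):0.1f} "seconds"')
```

Try to predict the three numbers before running it, then compare.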

Refactoring the example with two functions

The code example with two functions that we examined at the very beginning aimed to achieve maximum concurrency. We sent potentially blocking functions to their own threads, where they could no longer interfere with the main thread. This is certainly a huge advantage. However, there is a small drawback - the timing measurement decorator remained in the main thread (in the main() function), and now we don't know how much faster (or slower) the code became after adding threads. Let's join the additional threads to the main thread and see how the execution time has changed.

import time
from threading import Thread
from my_deco import time_counter

N = 5
DELAY = 0.5


def func1(n):
   for i in range(n):
       time.sleep(DELAY)
       print(f'--- line #{i} from {n} is completed')


def func2(n):
   for i in range(n):
       time.sleep(DELAY)
       print(f'=== line #{i} from {n} is completed')


@time_counter
def main():
   thread1 = Thread(target=func1, args=(N,))
   thread2 = Thread(target=func2, args=(N,))
   thread1.start()
   thread2.start()
   thread1.join()
   thread2.join()
   print(f'All functions completed')


if __name__ == '__main__':
   main()

Result:

======== Script started ========
--- line #0 from 5 is completed
=== line #0 from 5 is completed
--- line #1 from 5 is completed
=== line #1 from 5 is completed
. . . . . . . . .


All functions completed
======== Script execution time: 2.51 ========

In summary, 2.5 seconds is twice as fast as regular synchronous code. This can be easily explained: now both functions are executed simultaneously, so the overall script execution time is equal to the time taken by each function.

Is there a way to further reduce this time? Most likely, yes. But first, it would be good to understand the reason for the delay.

The reason is that due to the loop inside the function, each subsequent iteration cannot start before the previous one finishes. In other words, each time, we have to wait for the DELAY seconds of delay from the previous iteration to start the next one.

The solution is obvious: we need to keep only one iteration in the function and move the loop to the main function. Then we can create a number of threads equal to the number of iterations of the two functions. And from previous examples, we already know that if we start all these threads simultaneously, the overall execution time will be equal to the time taken by the longest thread.

Well, as they say, done and done:

import time
from threading import Thread
from my_deco import time_counter

N = 5
DELAY = 0.5


def func1(i, n):
   time.sleep(DELAY)
   print(f'--- line #{i} from {n} is completed')


def func2(i, n):
   time.sleep(DELAY)
   print(f'=== line #{i} from {n} is completed')


@time_counter
def main():
   threads = []
   threads1 = [Thread(target=func1, args=(i, N)) for i in range(N)]
   threads2 = [Thread(target=func2, args=(i, N)) for i in range(N)]
   threads.extend(threads1)
   threads.extend(threads2)

   for thread in threads:
       thread.start()
   for thread in threads:
       thread.join()

   print(f'All functions completed')


if __name__ == '__main__':
   main()

And the script running time has now been reduced to the running time of the longest thread:

======== Script started ========
--- line #0 from 5 is completed
=== line #0 from 5 is completed
--- line #1 from 5 is completed
=== line #1 from 5 is completed
. . . . . . . . .


All functions completed
======== Script execution time: 0.51 ========

Creating a Thread Using a Class

In the previous examples, functions were used to create and manage threads. However, the same can be achieved using classes.

In this case:

  1. The class that describes the thread should inherit from the Thread class.
  2. The behavior of the thread is described (or rather, overridden) using the run() method.
  3. All additional parameters, including the daemon flag, are set as instance attributes in the __init__() dunder method.

In the following example, two daemon threads are created. Thanks to the fact that the longest-running thread is joined to the main thread, both threads are able to finish:

import time
from threading import Thread


class ClockThread(Thread):
   def __init__(self, delay):
       super().__init__()
       self.delay = delay
       self.daemon = True


   def run(self):
       time.sleep(self.delay)
       print(f"Current time: {time.strftime('%X')}, delay {self.delay} sec.")


thread_1 = ClockThread(2)
thread_2 = ClockThread(3)


if __name__ == '__main__':
   start = time.time()
   print(f"Time start: {time.strftime('%X')}")
   thread_1.start()
   thread_2.start()
   thread_2.join()
   print(f"Time end: {time.strftime('%X')}")
   print(f'======== Total time: {time.time() - start:0.2f} ========')

The script, as expected, runs for 3 seconds:

Current time: 00:33:01, delay 3 sec.
Time end: 00:33:01
======== Total time: 3.00 ========

Process finished with exit code 0

As we can see, describing threads using classes didn't bring any significant changes. The management of thread objects is still done through the start() and join() methods, and the daemon parameter (attribute) can be specified directly in the __init__() method or later defined (overridden) in an instance of the user-defined class.
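One practical convenience of the class-based form is that a thread can keep its result on the instance, where the main thread can read it after join(). The following is a minimal sketch under that assumption; SquareThread, its result attribute and the squares() helper are hypothetical names, not part of the threading package:

```python
import time
from threading import Thread


class SquareThread(Thread):
    """Hypothetical worker: computes a value and keeps it on the instance."""

    def __init__(self, number, delay=0.1):
        super().__init__()
        self.number = number
        self.delay = delay
        self.result = None       # filled in by run()

    def run(self):
        time.sleep(self.delay)   # stands in for a slow operation
        self.result = self.number ** 2


def squares(numbers):
    threads = [SquareThread(n) for n in numbers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()            # after join() the result attribute is safe to read
    return [thread.result for thread in threads]


if __name__ == '__main__':
    print(squares([1, 2, 3]))    # [1, 4, 9]
```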

You can learn more about all the details of this topic from this video (Russian Voice):




Threading, Multiprocessing and Asyncio in Python (Part 2)


1. Introduction to Asynchrony
1.2 Comparison of three ways to get asynchronous code

The easiest way to understand the working principle of all three methods of achieving asynchronous code is through the following diagram:

All three methods listed (threads, multiprocessing, and the asyncio package) essentially accomplish the same thing: they allow the execution of the main program to run in parallel mode (represented by the green and blue lines). In other words, certain (often "problematic") sections of code start executing as if simultaneously and independently from each other. If one or even several branches of this parallel process take a long time or stop altogether, it won't affect the main program - it will continue working in its regular mode.

Please note that the term "as if" is used here intentionally - parallel computations are not always truly parallel. Sometimes, the switch between different branches of the process happens so quickly that they appear parallel to an external observer. It's similar to how 24 static frames displayed within one second create the illusion of continuous motion on a movie screen.

Indeed, this is the fundamental difference between the multiprocessing package, where computations truly occur in parallel, and the other two packages (threading and asyncio), where the effect of parallelism is achieved through fast switching between several "independent" parts of the program within a single process.

The technology of real parallel computation is called parallelism, while the technology of simulating parallel computations through fast switching is referred to as concurrency.

Parallel computations, in terms of resource usage, are not cheap because they involve multiple (or even all!) processor cores, additional RAM, and so on. Therefore, such a solution is justified only in the case of complex computational tasks where continuous and uninterrupted CPU processing is necessary (CPU-bound tasks).

In practice, however, we often deal with relatively slow processes such as database queries, network interactions, and so on. These are known as IO-bound tasks, in which the processor, while sending a request to a relatively slow external resource, is forced to idle and wait for a response.

In this case, it makes perfect sense to utilize the idle time of the processor for something more useful. For this purpose, two other technologies are used (threading and asyncio), where the waiting time for a "sleeping task" is used to perform other tasks.

It is important to note that machine resources are used more efficiently in this case - we no longer create new processes but instead use resources within the context of a single process.

Here, it is worth highlighting the fundamental technological difference between the threading and asyncio packages.

In the case of threads (using the threading package), the Python interpreter relies on the operating system (OS) for additional assistance. When creating a new thread, it essentially tells the OS, "Now I need the main thread's task to be executed simultaneously with the task of the new thread until this new thread finishes." In this case, the OS switches back and forth between the two tasks at short intervals chosen by its scheduler. The switching takes only fractions of a second, so from an external observer's perspective, both tasks appear to be executed in parallel and simultaneously.
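A small aside, specific to the CPython interpreter: in addition to the OS scheduler, the interpreter itself asks the running thread to give way after a configurable "switch interval". The sketch below only reads, changes and restores that setting through sys.getswitchinterval() and sys.setswitchinterval(); the value 0.001 is an arbitrary illustration, not a recommendation:

```python
import sys

# How long a thread may run before CPython asks it to give way (in seconds)
default_interval = sys.getswitchinterval()
print(f'default switch interval: {default_interval} s')

# The interval can be changed for experiments...
sys.setswitchinterval(0.001)
print(f'temporary switch interval: {sys.getswitchinterval()} s')

# ...and should be restored afterwards
sys.setswitchinterval(default_interval)
```

Which thread actually runs next is still decided by the operating system.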

The advantage of this method is evident: the Python code itself for each thread remains completely unchanged (referring to the function passed as the target parameter to the thread). The thread itself is simply an instance of the Thread class, and its control is managed using the start() and join() methods (from the language's syntax perspective, there is nothing fundamentally new!).

There are indeed some drawbacks to using threads:

  • Thread-related data needs to be stored, which requires additional memory resources.
  • The context switching between threads during data reading/writing also takes time. The more threads are involved, the more noticeable this becomes.
  • Thread management is handled by the operating system, not the Python interpreter. Therefore, thread switching occurs based on the OS's scheduling algorithm, which may not always be optimal in terms of prioritizing the execution of specific threads.

These factors can impact the overall performance and efficiency of threaded code.

None of these drawbacks applies to the asyncio package, which uses only one thread within a single process. Everything would be fine if it weren't for one significant drawback: this method requires separate, fundamentally new code that differs significantly from familiar Python syntax.

However, judge for yourself - here, for example, is what the solution to the previous task would look like using the asyncio package:

import time
import asyncio
from my_deco import async_time_counter

N = 5
DELAY = 0.5


async def func1(n):
   for i in range(n):
       await asyncio.sleep(DELAY)
       print(f'--- line #{i} from {n} is completed')


async def func2(n):
   for i in range(n):
       await asyncio.sleep(DELAY)
       print(f'=== line #{i} from {n} is completed')


@async_time_counter
async def main():
   print(f'All functions completed')


async def run():
   task0 = asyncio.create_task(main())
   task1 = asyncio.create_task(func1(N))
   task2 = asyncio.create_task(func2(N))
   await task0
   await task1
   await task2


if __name__ == '__main__':
   asyncio.run(run())

The result of executing this script will be exactly the same as in the previous example with threads: control is immediately passed back to the main program main(). And since the code in main() only contains one print statement, this function completes almost instantly, and the result of the two other functions' actions becomes visible after the main() program has finished:

======== Script started ========
All functions completed
======== Script execution time: 0.00 ========
--- line #0 from 5 is completed
=== line #0 from 5 is completed
--- line #1 from 5 is completed
=== line #1 from 5 is completed
. . . . . . . . . . . . . . . .
Process finished with exit code 0

It is reasonable for beginners who have just learned the basics of Python to ask, "In which language is this code written?" In fact, there is nothing surprising about this question because:

  1. Function definitions use the new async keyword.
  2. Inside these functions, the unfamiliar await keyword is used.
  3. Strictly speaking, the use of these two keywords transforms the functions into a completely different entity in the Python language - they are no longer regular functions but coroutines.
  4. Instead of the familiar time delay using time.sleep(DELAY), its "asynchronous" counterpart asyncio.sleep(DELAY) is used, which not only introduces a delay but also includes a control element that switches execution from the current function (or should we say, current coroutine) to another.
  5. The previous decorator @time_counter also cannot work here because the coroutine main() cannot be simply invoked with parentheses like the function main(). This peculiarity needs to be taken into account when defining a new decorator, @async_time_counter.
  6. Finally, running this code using the regular approach is no longer possible - it requires a special construct like asyncio.run().
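The my_deco module's own @async_time_counter is not shown here, so the following is only a guess at what such a decorator might look like, modeled on the synchronous @time_counter. The demo() coroutine is a hypothetical stand-in for main():

```python
import asyncio
import functools
import time


def async_time_counter(func):
    @functools.wraps(func)
    async def wrap(*args, **kwargs):
        start = time.time()
        print("======== Script started ========")
        # Calling a coroutine function only creates a coroutine object;
        # it has to be awaited for its body to actually run
        result = await func(*args, **kwargs)
        print(f'======== Script execution time: {time.time() - start:0.2f} ========')
        return result

    return wrap


@async_time_counter
async def demo():
    await asyncio.sleep(0.1)
    return 'done'


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

The wrapper itself has to be declared with async def, because await is only allowed inside a coroutine.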

As a result, it turns out that this approach also has its own downsides, and quite significant ones at that.

Thus, a brief overview of the three methods (technologies) for creating asynchronous code has shown that none of the discussed options has universal advantages over the others. Each has its own merits and drawbacks. Therefore, all three methods have prospects for further development, improvement, and practical usage.

Hence, the answer to the question "Which option to choose?" is surprisingly straightforward: the one that best suits the specific requirements of your current task. Of course, it is crucial to have an equally good command of each of the listed technologies to be able to make the optimal choice at the right moment, rather than simply relying on familiarity.

It's natural to wonder, "Are there any other ways to make code asynchronous besides the three mentioned in the course title?"

The answer is, undoubtedly, yes.

The subprocess package allows for the creation of additional processes in which various programs can be executed, including Python code.
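As a minimal sketch of that idea, the snippet below runs a one-line Python program in a separate process and reads its output. The child program (print(2 + 2)) is an arbitrary illustration:

```python
import subprocess
import sys

# Run a tiny Python program in a separate process and capture its output
completed = subprocess.run(
    [sys.executable, '-c', 'print(2 + 2)'],
    capture_output=True,   # collect stdout and stderr instead of printing them
    text=True,             # decode the output to str
    check=True,            # raise CalledProcessError on a non-zero exit code
)
print(completed.stdout.strip())   # 4
```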

The concurrent.futures package provides a convenient interface for asynchronous tasks and parallel computing. It abstracts away the details of thread or process creation and management, making it more preferable in simple scenarios where ease of use is important and direct control over threads or processes is not required. However, for more complex scenarios or lower-level control, the threading and multiprocessing modules can provide greater flexibility.
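To give a feeling for that interface, here is a short sketch using ThreadPoolExecutor. The fetch() function, the DELAY value and the fetch_all() helper are illustrative assumptions; the sleep stands in for a slow I/O call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.1   # illustrative delay standing in for slow I/O


def fetch(i):
    time.sleep(DELAY)
    return i * 10


def fetch_all(n):
    # The executor creates, starts and joins the threads for us
    with ThreadPoolExecutor(max_workers=n) as executor:
        return list(executor.map(fetch, range(n)))   # results keep the input order


if __name__ == '__main__':
    start = time.time()
    print(fetch_all(5))   # [0, 10, 20, 30, 40]
    print(f'elapsed: {time.time() - start:0.2f}')
```

Unlike a bare Thread, the executor also hands back the return values of the functions it runs.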

In addition to the packages included in the Python standard library, there are other well-known packages that are not part of it. For example, the packages curio
( https://curio.readthedocs.io/ ) and trio
( https://trio.readthedocs.io/ ) are used for working with coroutines.

The examples mentioned above can be classified as universal packages capable of making almost any synchronous code asynchronous. In addition to these, there are also specialized packages that enable achieving asynchronicity for specific programs and applications. For instance, the select package is used to facilitate asynchronous socket operations (via the socket package).

Furthermore, within the socket package itself, there are separate "asynchronous" methods and functions that are part of the regular "synchronous" package.
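Both ideas can be tried without any network, as in the sketch below. It assumes that socket.socketpair() is available on your platform and uses two standard calls: settimeout() on the socket itself and select.select() to wait for readiness. The timeout values are arbitrary:

```python
import select
import socket

# socketpair() returns two already connected sockets, so no network is needed
left, right = socket.socketpair()

# 1) A timeout on the socket itself: recv() gives up instead of blocking forever
right.settimeout(0.1)
try:
    right.recv(1024)
    timed_out = False
except socket.timeout:
    timed_out = True
print(f'recv timed out: {timed_out}')

# 2) select() reports which sockets became readable within the given timeout
left.sendall(b'ping')
readable, _, _ = select.select([right], [], [], 1.0)
data = readable[0].recv(1024) if readable else b''
print(data)

left.close()
right.close()
```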

Certainly, our course focuses on the three foundational "pillars" of asynchronicity mentioned in the title. They form the basis and foundation of this programming approach. However, the topic of asynchronicity in Python would not be fully explored without at least a brief overview of some additional packages and methods mentioned earlier. This will be the subject of the final, fifth lesson of this course.

So, let's proceed to a detailed study of the three main packages listed in the course title. We will begin this exploration with threads (the threading package).

You can learn more about all the details of this topic from this video (Russian Voice):




Threading, Multiprocessing and Asyncio in Python


1. Introduction to Asynchrony
1.1 What is "asynchronous code"?

First of all, let's clarify the terminology and understand what is behind the terms synchronous and asynchronous code. Let's also figure out what is so bad about synchronous code that everyone is so persistently trying to turn it into asynchronous code.

Synchronous code is code in which all instructions are executed strictly sequentially, line by line, and executing the next line of code is possible only if the previous one is completely executed.

The main problem of synchronous code is the requirement to not execute the next instruction until the previous one is completed. This poses a significant challenge for programs that interact with the outside world or other programs during execution since the execution of an instruction may unexpectedly require much more time than usual.

To prevent this from happening, the software code should be mindful of what is happening around it. If the next instruction has the potential to slow down the execution of the main program, it should be parallelized with the execution of other faster instructions or postponed altogether until a more opportune time.

In other words, the task of asynchronous programming is to replace the "mindless" sequential execution of instructions (synchronous) with a "meaningful" change in the order of execution based on the completion time of different instructions (asynchronous).

It is important to emphasize here that it is not necessary to incorporate complex algorithms for estimating the execution time of subsequent instructions into asynchronous code. In the vast majority of cases, it is sufficient to parallelize the execution of problematic sections, move them into a background mode, so that they do not hinder the fast execution of the main program.

Now, (hopefully!) having gained some understanding of asynchrony, it's time to provide a strictly scientific definition to this phenomenon. Fortunately, there are numerous definitions available on the internet, each more obscure than the next 😉. Personally, I found the definition of asynchrony on the Mozilla.org developer website quite appealing:

“Asynchronous programming is a technique that enables your program to start a potentially long-running task and still be able to be responsive to other events while that task runs, rather than having to wait until that task has finished.”

https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Asynchronous/Introducing

Thus, asynchrony is what prevents your program from getting "stuck" even when it reaches blocking (or "very long") sections of code, as these code sections will be executed concurrently (or almost concurrently) with the main program.

This is precisely why packages capable of transforming synchronous code into asynchronous code have gained incredible popularity.

Well, at this point, it's probably a good time to go through an example that allows for an even better understanding and consolidation of everything mentioned above.

Let's assume we have two functions (func1() and func2()) that need to be executed sequentially:

import time
N = 5
DELAY = 0.5


def func1(n):
   for i in range(n):
       time.sleep(DELAY)
       print(f'--- line #{i} from {n} is completed')


def func2(n):
   for i in range(n):
       time.sleep(DELAY)
       print(f'=== line #{i} from {n} is completed')


def main():
   func1(N)
   func2(N)


if __name__ == '__main__':
   start = time.time()
   main()
   print(f'Total time: {time.time() - start}')

The main() function will be the main control function here and onwards, while the functions func1() and func2() are called sequentially within it. Additionally, the total execution time of the main() function, which obviously equals the execution time of the entire script, will be calculated.

In this classic example of synchronous code, it is evident that the control flow will only be passed to the second function (func2()) after the completion of the first function (func1()). It's also fortunate that in our example, the values of the repetition count (N = 5) and the delay time (DELAY = 0.5 seconds) are relatively small, allowing the program to complete within a short time of 5 seconds. But what if these parameters had several more zeros at the end? In that case, we might never see func2() start at all, let alone the final completion message for all functions.

It seems that without an asynchronous solution to this problem, someday, in a not-so-pleasant moment, we might find ourselves in a very difficult situation. Therefore, let's try applying one of the three techniques mentioned in the course title, such as threads.

A more detailed explanation of how this and the following examples work will be provided a little later. For now, let's simply enjoy the beauty and ease of transforming synchronous code into asynchronous.

But first, let's add a useful detail to our code, namely a decorator that calculates the runtime of our program. Since all our future tasks will be evaluated in terms of code execution time in one way or another, it makes sense to optimize this calculation from the very beginning. Moreover, it doesn't require knowledge of asynchronous programming methodologies. Our usual "synchronous" knowledge will be sufficient for this purpose.

Many of you probably remember from the basics of the Python language that it is more logical to extract repetitive code within a function and place it separately as a decorator. For those who are not familiar with this concept or may have forgotten, I recommend watching these two videos that cover all four variations of creating decorators:

  • Simplest function decorator for a function (Russian voice): https://youtu.be/394ZfiPJQ38
  • More advanced decorator variations (Russian voice): https://youtu.be/2szgmbn3cYM
So, after the entry point, the code will pass into a decorator placed in a new module within the working directory named my_deco.py:

import functools
import time


def time_counter(func):
   @functools.wraps(func)
   def wrap():
       start = time.time()
       print("======== Script started ========")
       func()
       print(f"Time end: {time.strftime('%X')}")
       print(f'======== Script execution time: {time.time() - start:0.2f} ========')

   return wrap
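
The wrapper above is enough for main(), which takes no arguments and returns nothing. A minimal sketch of a more general variant, assuming we also want to time functions that take arguments and return values (the names time_counter_any and slow_sum are hypothetical and are not part of the course code):

```python
import functools
import time


def time_counter_any(func):
    """Hypothetical variant of time_counter that forwards arguments."""
    @functools.wraps(func)
    def wrap(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)   # pass all arguments through unchanged
        elapsed = time.perf_counter() - start
        print(f'======== {func.__name__} took {elapsed:0.2f} s ========')
        return result                    # keep the decorated function's result
    return wrap


@time_counter_any
def slow_sum(a, b, delay=0.1):
    """Illustrative function: sleeps for a while, then adds two numbers."""
    time.sleep(delay)
    return a + b
```

Calling slow_sum(2, 3) prints the timing line and still returns the sum, and functools.wraps keeps the original function name visible.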

And the previous script is supplemented by importing the new decorator and adding it to the main() function:

import time
from my_deco import time_counter

N = 5
DELAY = 0.5


def func1(n):
   for i in range(n):
       time.sleep(DELAY)
       print(f'--- line #{i} from {n} is completed')


def func2(n):
   for i in range(n):
       time.sleep(DELAY)
       print(f'=== line #{i} from {n} is completed')


@time_counter
def main():
   func1(N)
   func2(N)
   print(f'All functions completed')


if __name__ == '__main__':
   main()

Well, now we can add threads. To do that, we need to import them first:

from threading import Thread

And slightly modify the main() function:

@time_counter
def main():
   thread1 = Thread(target=func1, args=(N,))
   thread2 = Thread(target=func2, args=(N,))
   thread1.start()
   thread2.start()
   print(f'All functions completed')

Indeed, the code transformation is minimal, but the result is remarkable!

======== Script started ========
All functions completed
======== Script execution time: 0.01 ========
--- line #0 from 5 is completed
=== line #0 from 5 is completed
--- line #1 from 5 is completed
=== line #1 from 5 is completed
. . . . . . . . . . . . . . . .
Process finished with exit code 0

Please note that the line reporting the execution time (======== Script execution time: 0.01 ========) and the message about the completion of all functions (All functions completed) both appear before the output of the functions themselves. This confirms that func1() and func2(), which used to block the code, no longer do so: threads let us "jump over" them and pass control to the code that follows. Keep in mind that the 0.01 seconds measured by the decorator covers only main() itself, which now merely starts the threads; func1() and func2() keep running in the background, and the process exits only after both have finished. Even so, our synchronous code has become asynchronous, and main() now returns in 0.01 seconds instead of 5 seconds (or even infinity!).
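
If the measurement should also include the work done inside the threads, the main thread has to wait for them. A minimal sketch, assuming we use Thread.join() for that (the helper names worker and run_and_wait and the shortened DELAY are illustrative assumptions, not part of the course code):

```python
import time
from threading import Thread

DELAY = 0.05  # shortened on purpose so the sketch runs quickly


def worker(name, n, log):
    """Stand-in for func1()/func2(): sleep, then record a line."""
    for i in range(n):
        time.sleep(DELAY)
        log.append(f'{name} line #{i}')


def run_and_wait(n):
    """Start two threads, wait for both, return (elapsed seconds, log)."""
    log = []
    start = time.perf_counter()
    threads = [Thread(target=worker, args=(name, n, log))
               for name in ('func1', 'func2')]
    for t in threads:
        t.start()
    for t in threads:
        t.join()   # block here until this thread has finished
    return time.perf_counter() - start, log
```

Because the two threads sleep at the same time, the elapsed time here is roughly n * DELAY rather than 2 * n * DELAY.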

In conclusion, let's summarize a few observations that will be useful as we further explore threads:

  • The objects thread1 and thread2 represent two threads in which our functions are executed, passed as the target parameter. The corresponding arguments are also passed to these functions using the args parameter. Note that the arguments themselves are passed in tuple format.
  • Creating a thread is similar to defining a function with the def statement. For example, the statement def func means that the function func is only declared but not executed. To execute it, a separate line is required where the function name is invoked with parentheses: func().
  • Starting a thread works in a similar way: instead of simple parentheses, we call the start() method on the thread object.
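
The point about the tuple format can be illustrated with a short sketch. It assumes a hypothetical function func with an optional label parameter and a shared results list, neither of which comes from the course code:

```python
from threading import Thread

results = []


def func(n, label='---'):
    """Illustrative target function: records what it was called with."""
    results.append((label, n))


# (3,) is a one-element tuple; (3) would be just the integer 3
t1 = Thread(target=func, args=(3,))
# keyword arguments can be passed separately through kwargs
t2 = Thread(target=func, args=(3,), kwargs={'label': '==='})

# creating the Thread objects ran nothing; start() actually launches them
t1.start()
t2.start()
t1.join()
t2.join()
```

After both join() calls return, results holds one entry per thread.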

These observations will be helpful as we delve deeper into the study of threads.

Indeed, threads provide one method to address the issue of code synchronicity. It is evident that such a powerful tool requires further and deeper exploration, which we will undertake in the subsequent lessons.

However, as suggested by the course title, there are at least two more mechanisms or methods to achieve asynchronicity. Does it make sense to divert our attention to learning something else when threads have already allowed us to achieve impressive results?

To find an answer to this question, let's explore the next topic (article).

You can learn more about all the details of this topic from this video (Russian Voice):




Debugging Code In The Apps Script Editor

This article will be useful to those who need to change or modify their Apps Script code (including the code posted on this site) to their immediate needs.

And the first question:

What To Do If The Modified Script Stops Working?

Of course, in order for the code to work again, it needs to be debugged.

And first of all, we need to find a place after which problems begin and the code behaves in a completely different way from what is expected of it.

There may be several strategies here. The most reliable of them (but not the fastest) is to check the entire code step by step, starting from the very beginning.

To do this, in the editor window, you must:

  • select the function to be debugged (in the figure, this is the tmp() function);
  • set a breakpoint - in the figure it is a purple dot to the left of line number 32;
  • and click the Debug button (in the header above the code block, between the Run button and the name of the debug function).

In this figure, the program stopped just at the breakpoint on the 32nd line.

In the same figure, you can see that in the right block under the heading Debugger there are four buttons:

  1. Resume button (triangle) - pressing this button continues program execution until the next breakpoint (or until the end of the program if there are no more breakpoints).
  2. Step Over button (dot with an arched arrow) - executes the current line and moves to the next line.
  3. Step In button (arrow pointing down to the dot) - if the current line is an ordinary one, it is executed and the debugger moves to the next line (as with the Step Over command). But if the current line contains a function call, the debugger goes inside this function, that is, to the first line of code inside it. (In the example in the figure, this is line 36, which contains the function fillColumn. Pressing Step In on line 36 will “take” the debugger inside this function.)
  4. Step Out button (up arrow from the dot) - the current function runs to completion, and the debugger stops at the first line after the line that called it (in our example, this is line 37).

Below the control buttons of the Debugger block is the Variables block, which shows the current value of the variables at each program breakpoint.

In addition, we can print the desired values to the console using the command:

console.log(variable)

In the figures, this data is displayed at the bottom, in the Execution log block.

How To Debug Function Code With Arguments?

If we want to test the operation of a function that has arguments (for example, the function fillColumn(row, col, mask)), then the best solution is to first create a helper function tmp(), where

  • First, the values of all arguments of the fillColumn function will be set - the variables row, col, and mask;
  • And then the fillColumn(row, col, mask) function itself will be called.

Then running this auxiliary tmp() function in the debugger amounts to running the fillColumn function itself with arguments.

In fact, in the figure above, the tmp() function plays a similar auxiliary role for the fillColumn(row, col, mask) function.

How To Debug The Code Inside The onEdit(e) Function?

The onEdit(e) system function is used to intercept data changes/entry on Google spreadsheet and does not respond to the standard breakpoint processing by the debugger. Therefore, we will not be able to debug it in the usual way.

There are at least two solutions:

  1. Create a helper function to first transfer all the logic of the onEdit(e) function into it and debug it there using predefined input data. And after debugging, transfer the finished and debugged code back.
  2. Debug onEdit(e) what is called “in place”, but use windows called by the Browser class as breakpoints:

Browser.msgBox(variable);

For the convenience of debugging in the second scenario, you can pass the value of any variable you need in place of variable.

Now, to run the second option, you need to make any changes to the Google spreadsheet. This will immediately run the onEdit(e) function. And if there are no errors before the message box launch line, then we will definitely see a window with the variable.

You can get more information on how to debug scripts in the Apps Script editor from this video (RU voice):


Website Availability Test

It is important for every site owner to be sure that their online resource is currently available and working properly. And if a problem suddenly occurs, the site owner should find out about it before everyone else.

There are a huge number of paid and shareware services that are ready to provide round-the-clock monitoring of web resources and, if a resource becomes unavailable, to immediately inform the interested party about it.

However, there is a very simple way to run this check yourself, in a convenient mode and completely free of charge.

The idea is simple: using Google Apps Script, we send a request to the specified url and parse the response code. If the response code is 200, we do nothing. If not, we send an error message to our email.

The script that implements this task is below:

function locator() {
  let sites = ['https://it4each.com/',
               ];

  let myEmail = 'YourEmail';  // replace with your own address
  let subject = "Site not working!!!";
  let errors = [];

  // request sending and processing loop
  for (const site of sites) {
    try {
      let response = UrlFetchApp.fetch(site);
      if (response.getResponseCode() != 200) errors.push(site);
    } catch (e) {
      // fetch() throws when the site cannot be reached; log it and record the site
      let errorMessage = e.name + ': for website ' + site + '\n';
      console.error(errorMessage);
      errors.push(site);
    }
  }

  // send email
  if (errors.length > 0) {
    let message = "";
    for (let error of errors) {
      message += 'Website ' + error + " isn't working!\n";
    }
    message += '\n' + 'Remaining Daily Quota: ' + MailApp.getRemainingDailyQuota();

    MailApp.sendEmail(myEmail, subject, message);
  }
}

The locator() function monitors the operation of sites. First, the following initial data must be set inside this function:

  • the list of sites, sites;
  • the email address where the error message should be sent, myEmail;
  • the email subject, subject.

Next comes the cycle of sending and processing requests. This is done using the standard fetch(url) method of the UrlFetchApp class.

If the resource is available in principle, but its response code is not 200, then the name of the problematic resource is added to the errors error list on the same line.

But if the resource is not available at all, then UrlFetchApp.fetch(site) will throw an error that can stop the program. To prevent this, we handle such an error with try - catch(e). This time, the name of the site is added to the list in the catch block.

The result will be processed below, in the send email block.

If the list of errors is not empty, then message will be generated in the loop, where all non-working sites will be listed. Additionally, information will be added on how many similar email messages can still be created today in order not to exceed the quota: MailApp.getRemainingDailyQuota().

The script is ready. But in order to carry out full-fledged monitoring, you need to run this script regularly and around the clock. Therefore, we need to install a trigger.

You can learn how to create and configure a trigger, as well as get more information about how this script works, from this video (RU voice):


Drop Shipping Online Store on Django (Part 7)

7. Data Visualization Using Forms and Views

Attention! If you're having trouble following the previous step, you can visit the relevant lesson, download the archive of the previous step, install it, and start this lesson exactly where the previous one ended!

In this course, we will go through all the stages of creating a new project together:

Add pages: shop.html, shop-details.html and cart.html

Our project is nearing completion: we are literally a few steps away from it.

The most important functional part, which covers all the basic operations with the database, was completed in the 5th lesson. That is, we already have all the business logic, and we just have to add a user interface for displaying and changing the database data.

To warm up, let's first do what we already know well: copy the 3 pages that are still left in the template (shop.html, shop-details.html and cart.html) to the templates/shop folder. And let's try to open them in the project.

To do this, first of all, we will remove from them what is already in the base template. Next, add links to these pages to the shop/urls.py configurator.

urlpatterns = [
    path('fill-database/', views.fill_database, name='fill_database'),
    path('', TemplateView.as_view(template_name='shop/shop.html'), name='shop'),
    path('cart_view/',  
         TemplateView.as_view(template_name='shop/cart.html'), name='cart_view'),
    path('detail/<int:pk>/',  
         TemplateView.as_view(template_name='shop/shop-details.html'), 
         name='shop_detail')
]

By the way, pay attention: not only the links are shown here, but also the views themselves! This is one of those rare cases when a view does not have to be imported from the views.py module. The point is that template_name (the only required parameter for TemplateView) can be passed to the as_view() method as a keyword argument. The view then “runs” directly from the shop/urls.py configurator!

And the final touch - let's add links to the SHOP and CART pages to the main menu in order to immediately check if we made any mistakes in the layout when embedding the base template. There is nowhere to add a link to the detail page yet. Therefore, to check this page for errors, it will have to be called “manually”.

New Generic View Types: ListView and DetailView

We started our acquaintance with the generic view in the 4th lesson, and already then we were convinced how powerful and concise Django's tool is. And from the example above, we also learned that TemplateView can generally be written in one line directly in the shop/urls.py configurator.

Two other representatives of this category, namely: ListView and DetailView, have the same amazing properties. In the simplest version of the view, you can specify only the names of the model and template. And this will be enough for the html template to get all the necessary information and be able to display all the rows of the Products table (ListView), or all the fields of a single product whose pk will be listed at the end of the url (DetailView).

(To be fair, it's not even necessary to specify a template name - by default, this name is already included in the generic view settings. More information can be found here: Class-based views)

Creating ProductsListView

When going to the product page via the link /shop/, the user expects to see the entire list of products. Therefore, we replace the TemplateView test view, which was created solely to test the template, with a new one, ProductsListView:

from django.views.generic import ListView

from shop.models import Product


class ProductsListView(ListView):
    model = Product
    template_name = 'shop/shop.html'

Do not forget to also make changes to the shop/urls.py configurator:

urlpatterns = [
    path('', views.ProductsListView.as_view(), name='shop'),
    path('cart_view/',
         TemplateView.as_view(template_name='shop/cart.html'), name='cart_view'),
    path('detail/<int:pk>/',
         TemplateView.as_view(template_name='shop/shop-details.html'),
         name='shop_detail'),
    path('fill-database/', views.fill_database, name='fill_database'),
]

By default, all data from the Product model is automatically passed to the template as object_list. Therefore, by looping over object_list in the html template, we get access to every product, which means we can get the values of all the fields we are interested in:

# one block
                {% for product in object_list %}
					<div class="col-12 col-lg-4 col-md-6 item">
                        <div class="card" style="width: 18rem;">
                            <form method="post" action="">
                                <img src="{{product.image_url}}" class="card-img-top" alt="...">
                                <div class="card-body">
                                    <h5 class="card-title"><b>{{ product.name}}</b></h5>
                                    <p class="card-text">
                                        {{ product.description }}
                                    </p>
                                </div>
                                <ul class="list-group list-group-flush">
                                    <li class="list-group-item">Price: {{ product.price }}</li>
                                    <li class="list-group-item">
										{% csrf_token %}
										<label class="form-label" for="id_quantity">Quantity:</label>
										<input type="number" name="quantity" value="1" min="1"
											   required id="id_quantity"/>
                                    </li>
                                </ul>
                                <div class="card-body">
                                    <button class="learn-more-btn" type="submit">buy now</button>
                                    <a class="contactus-bar-btn f_right" href="">detail</a>
                                    <br><br>
                                </div>
                            </form>
                        </div>
                    </div>
                {% endfor %}

The shop/shop.html template now contains 4 identical test blocks. Replacing one of them with the proposed option, we will get a complete list of all blocks with all the values of the product table.

The same concise and elegant solution exists in Django for displaying a single selected object. Only now it inherits not ListView, but DetailView:

class ProductsDetailView(DetailView):
    model = Product
    template_name = 'shop/shop-details.html'

And don't forget to change the name of the view in shop/urls.py:

urlpatterns = [
    path('', views.ProductsListView.as_view(), name='shop'),
    path('cart_view/',
         TemplateView.as_view(template_name='shop/cart.html'), name='cart_view'),
    path('detail/<int:pk>/', views.ProductsDetailView.as_view(),
         name='shop_detail'),
    path('fill-database/', views.fill_database, name='fill_database'),
]

Now that the actual product list is displayed, we can add a link to the detail page to the product loop in the shop/shop.html template:

<div class="card-body">
	<button class="learn-more-btn" type="submit">buy now</button>
	<a class="contactus-bar-btn f_right" href="{% url 'shop_detail' product.pk %}">detail</a>
	<br><br>
</div>

Pay attention to how the new compound link {% url 'shop_detail' product.pk %} is built: the URL name is followed, after a space, by the product's primary key in the database, product.pk. Of course, this result also needs to be verified.

Adding the selected item to the cart

So we have come to one of the most crucial moments - to filling the cart with the selected product. Of course, this logic can also be implemented using generic view. Moreover, this method is considered preferable, because, as you know, the more complex the task, the more understandable and concise the generic view code looks, compared to the code of a regular function.

But, this option may not be very clear for beginners. Therefore, to solve this problem, let's return to functions and forms again.

Let's start with the form. We have to fill in the OrderItem table, which is connected by ForeignKey to the Product and Order tables (models). Therefore, in fact, the only unknown field that we have to enter is the Quantity field, and we can take all other data from other tables. Therefore, our AddQuantityForm form will consist of only one field:

from django import forms

from shop.models import OrderItem


class AddQuantityForm(forms.ModelForm):
    class Meta:
        model = OrderItem
        fields = ['quantity']

Now back to the view, which we will call add_item_to_cart. Here our task is greatly simplified: we do not need to handle a GET request, because the ProductsListView view already does that. Therefore, all that is required from the add_item_to_cart view is to receive and process the POST request:

@login_required(login_url=reverse_lazy('login'))
def add_item_to_cart(request, pk):
    if request.method == 'POST':
        quantity_form = AddQuantityForm(request.POST)
        if quantity_form.is_valid():
            quantity = quantity_form.cleaned_data['quantity']
            if quantity:
                cart = Order.get_cart(request.user)
                # product = Product.objects.get(pk=pk)
                product = get_object_or_404(Product, pk=pk)
                cart.orderitem_set.create(product=product,
                                          quantity=quantity,
                                          price=product.price)
                cart.save()
                return redirect('cart_view')
        else:
            pass
    return redirect('shop')

As you can see, if the form has passed validation, the quantity value is taken from its cleaned data. We also remember that OrderItem objects do not exist on their own, but are always tied to some cart or order. The get_cart method that we created in the previous lessons provides us with the desired cart, the cart object. The product object, whose quantity we have just confirmed, is easily obtained by querying for pk=pk. By the way, product could also be fetched with the get() method, but the get_object_or_404() variant is considered more reliable: if no object with the desired pk exists in the database, it raises Http404, so the user sees a 404 page.

Thus, we have already received all the fields necessary to create a new object. Therefore, now using cart.orderitem_set.create() we create a new model object OrderItem, and using the cart.save() method, we fix the connection of this object with an order basket.

The last thing left for us now is to add a new url to the configurator:

path('add-item-to-cart/<int:pk>', views.add_item_to_cart, name='add_item_to_cart'),

and then add this new url to the action attribute of the form tag on the shop/shop.html page:

<form method="post" action="{% url 'add_item_to_cart' product.pk %}">

Now you're ready to add orders to your shopping cart. True, we can only see the result in the admin panel.

And one very important detail, which almost remained behind the scenes. All users should be able to browse the product catalog, but only registered users may choose a product and add it to the cart. The @login_required decorator protects the add_item_to_cart view from access by users who are not logged in. As you can see, this decorator automatically redirects such a user to the 'login' page.

Cart management: display a list of items

Of course, we would like to see the items added to the cart somewhere other than the admin panel. Especially since we already have everything ready for this. Except for the view. So let's take care of it.

In essence, to display the cart items we need to get data from two models:

  • The Order model (from which we get the cart object using the get_cart(user) method)
  • And the OrderItem model (for order=cart)

And then pass this data to the template using the context dictionary:
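
The view code itself is not reproduced at this point. A minimal, framework-free sketch of the step just described, assuming the items are reached through the orderitem_set reverse relation already used in add_item_to_cart (the helper name build_cart_context is hypothetical; the keys 'cart' and 'items' match the names used in the template):

```python
def build_cart_context(cart):
    """Collect what the cart template needs into one dictionary.

    `cart` is assumed to be the Order object returned by
    Order.get_cart(user); only its orderitem_set relation is used here.
    """
    items = cart.orderitem_set.all()   # OrderItem rows where order=cart
    return {'cart': cart, 'items': items}


# Inside a Django view this would be used roughly as follows (assumption):
#     cart = Order.get_cart(request.user)
#     return render(request, 'shop/cart.html', build_cart_context(cart))
```

The real view would additionally need the same @login_required protection as the other cart views.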

From the cart object in the template, only data related to the cart itself will be retrieved (in our case, only the order amount). We will get the data for each position of the item order as a result of the loop over the items object:

{% for item in items %}
    <div class="row">
        <div class="col-12 col-md-1 item">
            &nbsp;&nbsp;&nbsp;{{ forloop.counter }}
        </div>
        <div class="col-12 col-md-4 item">
            {{ item.product }}
        </div>
        <div class="col-12 col-md-2 item">
            {{ item.quantity }}
        </div>
        <div class="col-12 col-md-2 item">
            {{ item.price }}
        </div>
        <div class="col-12 col-md-2 item">
            {{ item.amount }}
        </div>
        <div class="col-12 col-md-1 item">
        </div>
    </div>
{% endfor %}

Cart Management: Deleting Items

Anyone can make a mistake. Therefore, the user must be able to remove unwanted items from the cart.

As we have already learned, all database changes must go only through a form and the POST method. Deleting an item is no exception.

It is worth noting here that Django has a very convenient generic view for deleting model objects, DeleteView, for which you need neither to create a separate form in the shop/forms.py module nor to describe the POST method explicitly. All of this is already built into DeleteView by default:

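from django.contrib.auth.decorators import login_required
from django.urls import reverse_lazy
from django.utils.decorators import method_decorator
from django.views.generic import DeleteView

from shop.models import OrderItem


@method_decorator(login_required, name='dispatch')
class CartDeleteItem(DeleteView):
    model = OrderItem
    template_name = 'shop/cart.html'
    success_url = reverse_lazy('cart_view')

    # Access check: a user may delete only items from their own orders
    def get_queryset(self):
        qs = super().get_queryset()
        # filter() returns a new queryset, so its result must be returned
        return qs.filter(order__user=self.request.user)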

The only thing we have added here (changed, to be more precise) is the get_queryset method, which filters the OrderItem model data request by user.

We also do not need any additional output using the GET method: as in the case of adding a position to the cart, we will use the ready-made data that cart_view kindly provides us.

All that remains for us is to add a form to EVERY (!!!) position of the cart (fortunately, they are all displayed in a loop anyway) and a new url to call a new view CartDeleteItem.

Changes in the shop/cart.html template:

<div class="col-12 col-md-1 item">
    <form method="post" action="{% url 'cart_delete_item' item.pk %}">
        {% csrf_token %}
        <button type="submit" style="color: blue"><u>delete</u></button>
    </form>
</div>

Addition to shop/urls.py:

path('delete_item/<int:pk>', views.CartDeleteItem.as_view(), name='cart_delete_item'),

Cart management: proceed to create an order

After everything you need has been successfully added to the cart, and everything superfluous has been safely removed from it, all that remains is to finish selecting items, change the cart status from STATUS_CART to STATUS_WAITING_FOR_PAYMENT and thus proceed to pay for the order.

The make_order method itself was created long ago. It only remains to add a button to the cart page that launches this method when pressed.

The task for the view will be very simple - find the desired basket and apply the make_order method to it:

@login_required(login_url=reverse_lazy('login'))
def make_order(request):
    cart = Order.get_cart(request.user)
    cart.make_order()
    return redirect('shop')

After changing the status, there will be a redirect to the shop/shop.html page. After connecting online payment, this redirect can be replaced by a transition to the payment aggregator page. And you will also need to remember to add a new link to shop/urls.py:

path('make-order/', views.make_order, name='make_order'),

And add this link to the button on the cart page:

<a class="contactus-bar-btn f_right" href="{% url 'make_order' %}">
    Process to Payment
</a>

Conclusion

Our project has been completed. Of course, a lot was left behind the scenes: confirmation of registration by email, logging, connecting online payment, deployment, and so on.

However, the main functionality of the online store has been created. And improvement, as you know, never ends.

In any case, if you have any questions, you know who to ask: it4each.com@gmail.com.

Good luck in creating your own online store and see you on new courses!

You can learn more about all the details of this stage from this video (RU voice):




