Introduction
In this tutorial, we will write a simple Python program that downloads random images from a resource called https://picsum.photos/.
This resource allows downloading images at a specified size. In this script, we will fetch images at 370x250 px, so our URL will be https://picsum.photos/370/250.
Using the requests package
Our first script will use Python's most popular package for HTTP requests: requests. It is not in Python's standard library, so first we need to install it. Activate your virtual environment and run pip install requests. This will install the latest version of the package.
Let’s take a look at the full program:
```python
import os
import time
from urllib import parse
from uuid import uuid4

import requests

url = 'https://picsum.photos/370/250'


def save_post_image():
    # Make sure the target directory exists before writing into it
    os.makedirs('images', exist_ok=True)
    start = time.time()
    for _ in range(25):
        response = requests.get(url)
        # Take the extension from the path of the final (redirected) URL
        extension = os.path.splitext(parse.urlsplit(response.url).path)[-1]
        image_name = f'{uuid4()}{extension}'
        path = f'images/{image_name}'
        with open(path, mode='wb') as f:
            f.write(response.content)
    elapsed = time.time() - start
    print(f'{elapsed} s')


if __name__ == '__main__':
    save_post_image()
```
As you can see, the preceding script downloads 25 images of the defined size and saves them on the local file system (./images), keeping each downloaded image's extension. To extract the file extension, we use the os and urllib packages. Each image is also given a unique name using the uuid4() function.
When we send a request to the URL above, it redirects us to another URL like https://i.picsum.photos/id/372/370/250.jpg?hmac=<some_hmac>. To get this final URL, we use the .url attribute of the requests.models.Response object.
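To see how the extension extraction works in isolation, here is a small sketch using a sample redirected URL (the id and hmac values are invented for illustration):

```python
import os
from urllib import parse

# A sample redirected URL of the shape picsum.photos returns
# (the id and hmac values here are made up for illustration).
final_url = 'https://i.picsum.photos/id/372/370/250.jpg?hmac=abc123'

# urlsplit() separates the path ('/id/372/370/250.jpg') from the query string,
# and splitext() takes the extension from that path.
path = parse.urlsplit(final_url).path
extension = os.path.splitext(path)[-1]
print(extension)  # → .jpg
```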
We also track the time to measure how long it takes to download and save 25 images from the resource.
On my laptop, it took approximately 33.983 seconds.
Using asyncio and aiohttp
Next, we will rewrite our program to use the aiohttp library, a fast asynchronous HTTP client/server framework for asyncio and Python. It allows us to send multiple requests to the image resource's server (https://picsum.photos) concurrently, so the program finishes in less time than its synchronous version: while we are waiting for a response from the server, we can already send another request that starts downloading the next image, and so on.
First, we will need to install the required packages:
pip install aiohttp aiofiles
We also need the aiofiles package to save images on the local filesystem. aiofiles is a library for handling local disk files in asyncio applications; it allows reading and writing files in a non-blocking manner.
The asynchronous version of our script will look like this in the simplest version:
```python
import asyncio
import os
import time
from uuid import uuid4

import aiofiles
from aiohttp import ClientSession

url = 'https://picsum.photos/370/250'


async def make_request(session):
    try:
        resp = await session.request(method='GET', url=url)
    except Exception as ex:
        print(ex)
        return
    if resp.status == 200:
        image_name = f'{uuid4()}.jpg'
        path = f'async_images/{image_name}'
        # Write the image without blocking the event loop
        async with aiofiles.open(path, 'wb') as f:
            await f.write(await resp.read())


async def bulk_request():
    """Make requests concurrently."""
    async with ClientSession() as session:
        tasks = []
        for _ in range(25):
            tasks.append(make_request(session))
        await asyncio.gather(*tasks)


def download_images():
    # Make sure the target directory exists before writing into it
    os.makedirs('async_images', exist_ok=True)
    start = time.time()
    asyncio.run(bulk_request())
    print('{} s'.format(time.time() - start))


if __name__ == '__main__':
    download_images()
```
The coroutine bulk_request() serves as the main entry point into the script's chain of coroutines. It uses a single ClientSession, and a task is created for each image to be downloaded. Making all requests through one session lets them reuse the session's internal connection pool.
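The speed-up comes from asyncio.gather() running the coroutines concurrently: while one download is waiting on the network, another can proceed. A minimal stdlib-only sketch (using asyncio.sleep to stand in for network waits, no real requests involved) shows that the total time tracks the slowest task rather than the sum:

```python
import asyncio
import time

async def fake_download(delay):
    # Stands in for awaiting a network response.
    await asyncio.sleep(delay)
    return delay

async def main():
    start = time.time()
    # 25 "downloads" of 0.2 s each, run concurrently
    results = await asyncio.gather(*(fake_download(0.2) for _ in range(25)))
    elapsed = time.time() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(f'{len(results)} tasks in {elapsed:.2f} s')  # roughly 0.2 s, not 25 * 0.2 = 5 s
```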
The coroutine make_request() makes the GET request, awaits the response, and saves the downloaded image if the response status is 200.
The asynchronous version of the script took 3.149 seconds to download 25 random images, roughly 11 times faster than the version based on the requests package.
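One caveat: firing 25 requests at once is fine here, but for larger batches the server may throttle or reject you. A common refinement (not part of the script above) is to cap concurrency with asyncio.Semaphore; in this sketch, asyncio.sleep stands in for the real request:

```python
import asyncio

MAX_CONCURRENT = 5  # illustrative cap; tune for the target server

async def limited_request(semaphore, i):
    async with semaphore:
        # At most MAX_CONCURRENT coroutines run this section at a time;
        # a real version would await session.request() here instead.
        await asyncio.sleep(0.01)
        return i

async def bulk_limited():
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(limited_request(semaphore, i) for i in range(25)))

results = asyncio.run(bulk_limited())
print(len(results))  # 25
```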
Resources
- requests library
- aiohttp library
- asyncio module
- aiofiles library