Summary: in this tutorial, you’ll learn how to use the Python threading module to develop a multithreaded program.
Extending the Thread class
We’ll develop a multithreaded program that scraps the stock prices from the Yahoo Finance website.
To do that, we’ll use two third-party packages:
requests
– to get the contents of a webpage.lxml
– to select a specific element of an HTML document.
First, install the requests
and lxml
modules using the pip command:
pip install request lxml
Code language: Python (python)
Next, define a new class called Stock
that inherits from the Thread
class of the threading
module. We’ll place the Stock class in stock.py
module:
import threading
class Stock(threading.Thread):
pass
Code language: Python (python)
Then, implement the
method that accepts a symbol and initializes the __init__()
url
instance variable based on the symbol:
import threading
import requests
from lxml import html
class Stock(threading.Thread):
def __init__(self, symbol: str) -> None:
super().__init__()
self.symbol = symbol
self.url = f'https://finance.yahoo.com/quote/{symbol}'
self.price = None
Code language: Python (python)
For example, if you pass the symbol GOOG
to the __init__()
method, the URL will be:
https://finance.yahoo.com/quote/GOOG
Code language: Python (python)
After that, override the run()
method of the Thread
class. The run()
method gets the contents from the self.url
and grabs the stock price:
class Stock(threading.Thread):
def __init__(self, symbol: str) -> None:
super().__init__()
self.symbol = symbol
self.url = f'https://finance.yahoo.com/quote/{symbol}'
self.price = None
def run(self):
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}
response = requests.get(self.url, headers=headers)
if response.status_code == 200:
# parse the HTML
tree = html.fromstring(response.text)
# get the price in text
price_text = tree.xpath(
'//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]/text()')
if price_text:
try:
self.price = float(price_text[0].replace(',', ''))
except ValueError:
self.price = None
def __str__(self):
return f'{self.symbol}\t{self.price}'
Code language: Python (python)
How it works.
Make a request to the URL using the requests.get()
method:
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}
response = requests.get(self.url, headers=headers)
Code language: Python (python)
Notice that without valid headers, Yahoo will return 404 instead of 200.
If the request is successful, the HTTP status code is 200. In this case, we get the HTML contents from the response and pass it to the fromstring()
function of the html
module from the lxml
package:
if response.status_code == 200:
tree = html.fromstring(response.text)
Code language: Python (python)
Every element on a webpage can be selected using something called XPath.
To get the XPath of an element using Google Chrome, you inspect the page, right-click the element, select copy, and Copy XPath.
The XPath of the stock price at the time of writing this tutorial is as follows:
//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]
Code language: Python (python)
To get the text of the element, you append the text()
at the end of the XPath:
//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]/text()
Code language: Python (python)
Notice that if Yahoo changes the page structure, you need to change the XPath accordingly. Otherwise, the program won’t work as expected:
price_text = tree.xpath('//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]/text()')
Code language: Python (python)
Once getting the price as text, we remove the comma and convert it to a number:
if price_text:
try:
self.price = float(price_text[0].replace(',', ''))
except ValueError:
self.price = None
Code language: Python (python)
Finally, add the
method that returns the string representation of the Stock object:__str__()
import threading
import requests
from lxml import html
class Stock(threading.Thread):
def __init__(self, symbol: str) -> None:
super().__init__()
self.symbol = symbol
self.url = f'https://finance.yahoo.com/quote/{symbol}'
self.price = None
def run(self):
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}
response = requests.get(self.url, headers=headers)
if response.status_code == 200:
# parse the HTML
tree = html.fromstring(response.text)
# get the price in text
price_text = tree.xpath(
'//*[@id="quote-header-info"]/div[3]/div[1]/div[1]/fin-streamer[1]/text()')
if price_text:
try:
self.price = float(price_text[0].replace(',', ''))
except ValueError:
self.price = None
def __str__(self):
return f'{self.symbol}\t{self.price}'
Code language: Python (python)
Using the Stock class
The following main.py
module uses the Stock
class from the stock.py
module:
from stock import Stock
symbols = ['MSFT', 'GOOGL', 'AAPL', 'META']
threads = []
for symbol in symbols:
t = Stock(symbol)
threads.append(t)
t.start()
for t in threads:
t.join()
print(t)
Code language: Python (python)
Output:
MSFT 253.67
GOOGL 2280.41
AAPL 145.86
META 163.27
Code language: Python (python)
How it works.
First, import the Stock
class from the stock.py
module:
from stock import Stock
Code language: Python (python)
Second, initialize a list of symbols:
symbols = ['MSFT', 'GOOGL', 'AAPL', 'META']
Code language: Python (python)
Third, create a thread for each symbol, start it, and append the thread to the threads list:
threads = []
for symbol in symbols:
t = Stock(symbol)
threads.append(t)
t.start()
Code language: Python (python)
Finally, wait for all the threads in the threads list to complete and print out the stock price:
for t in threads:
t.join()
print(t)
Code language: Python (python)
Summary
- Define a class that inherits from the
threading.Thread
class and override therun()
method.