Filtering Metallica Albums using CoT Prompting

  • #Projects
Read time: 28 minutes

Introduction

In this project, I explore the use of Chain of Thought (CoT) techniques to solve a real-world problem—filtering Metallica albums by release year. Chain of Thought is a powerful prompt engineering method that instructs a Language Model to break down tasks into sequential steps, improving its reasoning capabilities. By integrating CoT with data from Spotify, I aim to guide the model to accurately retrieve and display albums based on a specific timeframe. This project serves as an introduction to how LLMs, with the right instructions, can assist in automating and enhancing complex data retrieval tasks.

Let's start!

In the next block of code, I am importing necessary libraries to handle tasks such as loading documents, splitting text, and working with vector databases. I also set up environment variables by loading API keys from a .env file to enable interaction with OpenAI and RapidAPI. Finally, I ensure that these API keys are present and raise an error if they are missing.

from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.schema import Document
from dotenv import load_dotenv
import requests
import os

CHROMA_PATH = 'chroma_music'

# Load environment variables from .env file
load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
RAPIDAPI_KEY = os.getenv('RAPIDAPI_KEY')

# Ensure environment variables are set
if not OPENAI_API_KEY or not RAPIDAPI_KEY:
    raise ValueError("API keys are missing from environment variables.") 

In the next block of code, I send a request to the Spotify API through RapidAPI to search for Metallica albums. I define the search parameters, such as querying for "Metallica" albums with a limit of 50 results. Using the API key from earlier, I make a GET request to the Spotify API and check if the response is successful. If the request is successful, I extract the list of albums from the JSON response and print how many albums were fetched. If the request fails, an error is raised with the corresponding status code.

url = "https://spotify23.p.rapidapi.com/search/"

querystring = {"q":"metallica", "type": "album", "limit": "50"}

headers = {
	"x-rapidapi-key": RAPIDAPI_KEY,
	"x-rapidapi-host": "spotify23.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring)
if response.status_code != 200:
    raise Exception(f"API request failed with status code {response.status_code}")

# Extract albums from the JSON response
try:
    results = response.json().get('albums', {}).get('items', [])
    print(f"Fetched {len(results)} albums")
except ValueError as e:
    raise Exception("Error parsing JSON response") from e

Here is the output:

Fetched 50 albums
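
For reference, each item in `results` is assumed to look roughly like the sketch below. The exact schema is defined by the RapidAPI Spotify endpoint; the fields shown here are inferred only from how they are accessed later in this post, so treat the shape as a hypothetical sample, not documentation.

```python
# Hypothetical single item from the search response; only 'data.name' and
# 'data.date.year' are actually used downstream.
sample_item = {
    'data': {
        'name': '72 Seasons',
        'date': {'year': 2023},
    }
}

# Extract fields the same way the processing loop below does
year = sample_item['data'].get('date', {}).get('year', '')
title = sample_item['data'].get('name', '').lower()
print(year, title)
```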

In the next block of code, I process the fetched results by extracting each album's release year and title. To avoid duplicates, I track the (title, year) pairs I have already seen in a set and only append new entries, storing each album with its artist name, title, and release year. This way, I build a list of unique Metallica albums from the data.

all_albums = []
seen = set()  # (title, year) pairs already added

for result in results:
    release_year = result['data'].get('date', {}).get('year', '')
    album_title = result['data'].get('name', '').lower()  # Convert title to lowercase
    key = (album_title, release_year)
    if key not in seen:
        seen.add(key)
        all_albums.append({
            'artist': 'Metallica',
            'album': album_title,
            'year': release_year,
        })

Let's create Document objects from the album data, converting each album's information into a document with its content and metadata. I then use a RecursiveCharacterTextSplitter to divide these documents into smaller chunks for easier processing. Finally, I print the number of original documents and the resulting chunks to see how the data has been split.

# Create documents from the album data
documents = [Document(page_content=str(album), metadata=album) for album in all_albums]

# Split the documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=100,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(documents)

print(f'Split {len(all_albums)} documents into {len(chunks)} chunks.')

Output is:

Split 41 documents into 41 chunks.
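
The counts match because each serialized album record is far shorter than the 200-character chunk_size, so the splitter leaves every document as a single chunk. A quick sanity check on a sample record:

```python
# A sample album record as it is serialized into page_content
sample_album = {'artist': 'Metallica', 'album': 'the metallica blacklist', 'year': 2021}
serialized = str(sample_album)

# Well under chunk_size=200, so RecursiveCharacterTextSplitter never splits it
print(len(serialized))
```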

Let's inspect a specific chunk’s details.

document = chunks[15]
print(document.page_content)
print(document.metadata)

Here is the output:

{'artist': 'Metallica', 'album': 'the metallica blacklist', 'year': 2021}
{'artist': 'Metallica', 'album': 'the metallica blacklist', 'year': 2021, 'start_index': 0}

Let's create a new database using the Chroma vector store. I use OpenAIEmbeddings to generate embeddings for the split document chunks and save them to the database, which is persisted in the directory specified by CHROMA_PATH.

# Create a new DB from the documents.
embedding_function = OpenAIEmbeddings()
db = Chroma.from_documents(chunks, embedding_function, persist_directory=CHROMA_PATH)
print(f"Saved {len(chunks)} chunks to {CHROMA_PATH}.")

Here is the output:

Saved 41 chunks to chroma_music.

It is time to prepare the Chroma database for querying by loading it from the specified directory and using the same embedding function. I set a filter year (2020) and define a query to find Metallica albums released after this year, focusing on unique results. I then perform a similarity search on the database with a higher result limit (k=20) to retrieve more relevant albums based on the query.

# Prepare the DB.
db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)
year_filter = 2020
query_text = f'List Metallica albums released after {year_filter}, focusing on unique and different albums.'

# Perform the query 
results = db.similarity_search_with_relevance_scores(query_text, k=20)  # Increased k to fetch more results

Checking if the similarity search returned no results or if the relevance score of the top result is below 0.7:

if len(results) == 0 or results[0][1] < 0.7:
    print('Unable to find matching results.')
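
As a self-contained illustration of the same idea, the 0.7 threshold could also be applied per result rather than only to the top one. The (content, score) pairs below are hypothetical stand-ins mimicking the shape returned by similarity_search_with_relevance_scores, not real query output.

```python
# Hypothetical (page_content, relevance_score) pairs standing in for
# the output of db.similarity_search_with_relevance_scores(...)
mock_results = [
    ("{'artist': 'Metallica', 'album': '72 seasons', 'year': 2023}", 0.82),
    ("{'artist': 'Metallica', 'album': 'st. anger', 'year': 2003}", 0.64),
]

RELEVANCE_THRESHOLD = 0.7

# Keep only the results at or above the threshold
kept = [content for content, score in mock_results if score >= RELEVANCE_THRESHOLD]
print(f"Kept {len(kept)} of {len(mock_results)} results")
```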

First, let's count how many albums were released after 2020 using the raw data. This gives me a concrete baseline to compare with the results generated by the Language Model (LLM), so I can verify the accuracy of the model's output and ensure it aligns with the expected number of albums.

# Function to filter albums by year
def filter_albums_by_year(all_albums, year_filter):
    # Skip records whose year is missing or non-numeric before comparing
    filtered_documents = [
        album for album in all_albums
        if str(album['year']).isdigit() and int(album['year']) > year_filter
    ]

    if not filtered_documents:
        print("No albums found after the specified year.")
    else:
        filtered_documents_total = len(filtered_documents)
        print(f'It was found {filtered_documents_total} albums:')
        for album in filtered_documents:
            print(f" - {album['year']}: {album['album']}.")

year_filter = 2020        
filter_albums_by_year(all_albums, year_filter)

Here is the output:

It was found 7 albums:
 - 2023: 72 seasons.
 - 2021: the metallica blacklist.
 - 2023: metallica.
 - 2022: metallica.
 - 2024: metallica.
 - 2022: lux æterna.
 - 2021: metallica.

Now comes the most interesting part: defining a prompt template and Chain of Thought (CoT) steps for querying and processing Metallica albums. Let's start with a simple chain of thought.

PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---


Answer the question based on the above context: {question}.

"""

# Chain of Thought (CoT) steps
cot_steps = [
        'First, find all "albums" where the year is greater than year_filter. For instance, if year_filter = 1999, list albums where the year is greater than 1999.',
    ]

Let's combine the prompt template with Chain of Thought (CoT) steps to create a detailed prompt for the model. I prepare the context by joining the content from the similarity search results and format the prompt with this context and the query. Then, I use the ChatOpenAI model to process the prompt and print the generated response.

# Combine prompt and CoT steps
prompt_with_cot = PROMPT_TEMPLATE + "\n" + "\n".join(cot_steps)
context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
prompt_template = ChatPromptTemplate.from_template(prompt_with_cot)
prompt = prompt_template.format(context=context_text, question=query_text)

model = ChatOpenAI()
response_text = model.invoke(prompt)
print(f"Response: {response_text}")

Here is the output:

Response: content='The Metallica Blacklist (2021)' response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 584, 'total_tokens': 593}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-d1bb8094-5f03-457c-b8f2-8bdb37659cdd-0' usage_metadata={'input_tokens': 584, 'output_tokens': 9, 'total_tokens': 593}

The output is incomplete and doesn't meet the expectation of listing 7 albums: the response stops after a single album, suggesting the model finished the task prematurely. The instructions may not have been clear enough for the model to interpret the task correctly; for instance, the year_filter logic may have been misread or ignored. Let's modify the CoT steps.

PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---


Answer the question based on the above context: {question}.

"""

# Chain of Thought (CoT) steps
cot_steps = [
    'First, we need to find all "albums" where the integer value of "year" is greater than year_filter. For instance, if year_filter = 1999, we need to choose all albums where the "year" is greater than year_filter. For example, for "album" == "Death Magnetic": "year" = 2008 and 2008 > year_filter. That is why it is a suitable album for us and we need to print its year and title. Next, for "album" == "The Metallica Blacklist": "year" = 2021 and 2021 > year_filter. It is also suitable information. We need to find all albums where "year" > year_filter and print those.',
    'For instance, for year_filter = 1999, we have such result: "It was found 15 albums: 2023: 72 Seasons, 2008: Death Magnetic, 2021: The Metallica Blacklist, 2013: Metallica Through The Never (Music from the Motion Picture), 2016: Hardwired…To Self-Destruct, 2023: metallica, 2023: Metallica, 2003: St. Anger, 2016: Hardwired…To Self-Destruct (Deluxe), 2020: S&M2, 2022: Lux Æterna, 2024: METALLICA, 2000: I Disappear, 2022: Metallica (Remix), 2020: Metallica Remixes."'  
]

# Combine prompt and CoT steps
prompt_with_cot = PROMPT_TEMPLATE + "\n" + "\n".join(cot_steps)
context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
prompt_template = ChatPromptTemplate.from_template(prompt_with_cot)
prompt = prompt_template.format(context=context_text, question=query_text)

model = ChatOpenAI()
response_text = model.invoke(prompt)
print(f"Response: {response_text}")

Here is the output:

Response: content='Based on the context provided, the unique and different Metallica albums released after 2020 are:\n\n1. 2021: The Metallica Blacklist\n2. 2023: 72 Seasons\n3. 2023: metallica\n4. 2023: Metallica\n5. 2023: METALLICA\n6. 2022: Lux Æterna\n7. 2024: METALLICA\n8. 2022: Metallica (Remix)' response_metadata={'token_usage': {'completion_tokens': 104, 'prompt_tokens': 866, 'total_tokens': 970}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-26c0f9e2-d120-4c8c-bd89-1d2b4e8cf482-0' usage_metadata={'input_tokens': 866, 'output_tokens': 104, 'total_tokens': 970}

The model's output, based on the modified Chain of Thought (CoT) instructions, does identify a set of Metallica albums released after 2020. However, there are some mistakes compared to the expected outcome: the output contains albums whose titles differ only in capitalization or formatting (e.g., "metallica," "Metallica," "METALLICA"). These are essentially the same albums and should be treated as duplicates, so the model's results are close to the desired output but still exhibit issues.
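
One way to handle this in post-processing is to normalize titles before comparing them. The sketch below takes the album titles from the model's response above and collapses the casing variants; whether entries like "Metallica (Remix)" should also be merged is a judgment call this simple normalization does not make.

```python
# Album titles copied from the model's response above, in order
model_albums = [
    "The Metallica Blacklist", "72 Seasons", "metallica", "Metallica",
    "METALLICA", "Lux Æterna", "METALLICA", "Metallica (Remix)",
]

seen = set()
unique_albums = []
for title in model_albums:
    key = title.strip().lower()  # normalize casing and whitespace
    if key not in seen:
        seen.add(key)
        unique_albums.append(title)  # keep the first spelling encountered

print(unique_albums)
```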

Conclusion

In this project, I became familiar with Chain of Thought (CoT) prompting and saw firsthand how critical well-written prompts are to the model's success. By refining the steps and improving the clarity of the instructions, I was able to guide the model towards better results. However, I also identified some challenges, such as handling duplicate album titles and ensuring the model interprets the task correctly. This exercise highlighted the importance of prompt design in influencing the accuracy and effectiveness of LLM responses for real-world tasks.

© 2025

Elena Medvedeva. Created by Elena Aseeva. Some assets are created by freepik.com