
Building SQL Queries with Gemini AI
Utilizing Gemini AI & Prompt Design
to Construct SQL Queries
2024
Background and Objectives:
The adaptation of AI in workplace has drastically change our way of working. Here this portfolio aims to demonstrate my proficiency in leveraging Gemini AI from Google to construct effective SQL queries through prompt design. I posses only basic SQL knowledge and is challenged with a data analytics task that are beyond my level. This portfolio highlights my ability to:
-
Craft clear and concise prompts to query desired information and apply into my tasks.
-
Refine prompts iteratively to achieve the desired analytical outcome based on every responses.
-
Understand and verify the results of SQL queries from Gemini AI.
Data Used :
Mock datasets of a fictional online retail store. Data variables include user IDs, event dates, traffice source, traffic medium, transaction value, event timestamp, event name (actions like purchase, view cart etc.)​​
​
​
​
Challenge 1:
Construct a Order Attribution Report with 3 days Lookback Window​​
​
Task Objective:
Obtain a report that details the user journey leading to a purchase event within a specific timeframe (lookback window of three days). This report should include user IDs, event dates, traffic sources, and the total purchase value and event names other than purchase under the lookback windown period.
​
Initial Prompt Design:
"I am building an order attribution report with a 3 day lookback window. Every user_id has their event_name recorded (event name include page view, purchase, download and others) recorded with their own timestamp. I want to query their rows of each of different users purchase timestamp with a lookback window of three days. How can I query on different purchase dates of 3 days window from each user?"
​
First Response from Gemini:
​SELECT
user_pseudo_id,
event_date,
traffic_source_source,
traffic_source_medium,
event_value_in_usd
FROM
sample_events_date
WHERE
event_name = 'purchase'
AND event_timestamp >= MAX(event_timestamp) - INTERVAL 3 DAY;
Result Verification:
While this initial query captures purchases, it doesn't show the entire user journey. Rephrasing and refining the prompts to include all relevant events within the lookback window:
Prompt Tuning and Refinement:
"Instead of just showing purchases, can you show all user events (including page views, add-to-carts, etc.) within the last 3 days for users who made a purchase during that timeframe? I want to see the entire user journey leading to the purchase."
Follow-up Response from Gemini:
# Gemini would likely provide a revised SQL code that incorporates filtering based on the purchase
event within the lookback window and includes other relevant events.
​
SELECT
user_pseudo_id,
event_date,
traffic_source_source,
traffic_source_medium,
event_value_in_usd
FROM
sample_events_date
WHERE
user_pseudo_id IN (
SELECT user_pseudo_id
FROM sample_events_date
WHERE event_name = 'purchase'
AND event_timestamp >= MAX(event_timestamp) - INTERVAL 3 DAY
)
AND event_date >= DATE_SUB(
(SELECT MAX(event_date)
FROM sample_events_date
WHERE user_pseudo_id = sample_events_date.user_pseudo_id),
INTERVAL 7 DAY
)
AND event_name IN ('first_visit', 'session_start', 'page_view', 'view_item', 'add_to_cart', 'begin_checkout', 'purchase')
ORDER BY
user_pseudo_id, event_date;
Query Result in Excel Table Form:

Other Used Prompts for Contexual Understanding and Result Verification -
There has been a lot of unintended result provided by Gemini AI which may due to poor prompt structuring. To verify and gauge whether the query result provided by Gemini is accurate and correct, few other prompts are used for verification and confirmation:
​
-
Does this code include query of the event_name (page views, download) 3 days before the purchase_date?
-
Why the CURDATE() is used on your SQL code. Every event name has their own timestamp and does not seem like it is relevant to current date.
-
Tell me about order attribution report and specify its definition.
-
Tell me more about this SQL code AND event_timestamp >=MAX(event_timestamp) -INTERVAL3DAYFROM your_table AS p2 WHERE p2.user_id = u.user_id
-
Is 1685436253234280 timestamp data? If it is, please convert it into human-readable format in SQL.
-
Please provide me a breakdown of this SQL code and its functions.
Challenge 2:
Closed Funnel Analysis and Conversion Rate (Cont')
​
Task Objective:
Analyze user journeys within the purchase funnel, calculating the conversion rate at each stage (view product, add to cart, checkout, purchase) and the drop-off rate between stages.
Initial Prompt Design:
While the previous sections explored order attribution report, this section delves deeper by calculating drop-off rates. Initial prompt: "Using the same data for order attribution report, compose a SQL query that delivers a closed funnel analysis, broken down by event_date."
First Response from Gemini:
# Gemini might suggest a new SQL code structure utilizing a Common Table Expression (CTE) and
user-defined logic to assign funnel step values and calculate the total number of users and conversions
at each stage.
​
WITH funnel_stages AS (
SELECT
user_pseudo_id,
event_date,
event_name,
CASE
WHEN event_name = 'first_visit' THEN 1
WHEN event_name = 'session_start' THEN 1 -- Assuming session_start also represents funnel entry
WHEN event_name = 'page_view' THEN 1 -- Assuming page_view also represents funnel entry
WHEN event_name = 'view_item' THEN 2
WHEN event_name = 'add_to_cart' THEN 3
WHEN event_name = 'begin_checkout' THEN 4
WHEN event_name = 'purchase' THEN 5
ELSE 0
END AS funnel_step
FROM sample_events_date
)
SELECT
fs.event_date,
fs.funnel_step,
COUNT(DISTINCT fs.user_pseudo_id) AS users_at_stage,
SUM(CASE WHEN fs.funnel_step = 5 THEN 1 ELSE 0 END) AS total_conversions
FROM funnel_stages fs
GROUP BY fs.event_date, fs.funnel_step
ORDER BY fs.event_date, fs.funnel_step;
Result Verification:
While this initial query captures the funnel steps, breaking down by event date and steps. My initial prompt lacks mention on the drop off rate on each funnel step.
Prompt Tuning and Refinement:
"Add one new column that stated the percentage drop rate on each funnel step."
Follow-up Response from Gemini:
​
WITH funnel_stages AS (
SELECT
user_pseudo_id,
event_date,
event_name,
CASE
WHEN event_name = 'first_visit' THEN 1
WHEN event_name = 'session_start' THEN 1
WHEN event_name = 'page_view' THEN 1
WHEN event_name = 'view_item' THEN 2
WHEN event_name = 'add_to_cart' THEN 3
WHEN event_name = 'begin_checkout' THEN 4
WHEN event_name = 'purchase' THEN 5
ELSE 0
END AS funnel_step
FROM sample_events_date
)
SELECT
fs.event_date,
fs.funnel_step,
COUNT(DISTINCT fs.user_pseudo_id) AS users_at_stage,
SUM(CASE WHEN fs.funnel_step = 5 THEN 1 ELSE 0 END) AS total_conversions,
CASE
WHEN fs.funnel_step = 1 THEN 100.0 -- Assuming all users start at step 1 (adjust if needed)
WHEN fs.funnel_step > 1 THEN
100.0 * (
COUNT(DISTINCT fs.user_pseudo_id) /
LAG(COUNT(DISTINCT fs.user_pseudo_id)) OVER (ORDER BY fs.event_date, fs.funnel_step)
)
ELSE 0.0
END AS drop_off_rate
FROM funnel_stages fs
GROUP BY fs.event_date, fs.funnel_step
ORDER BY fs.event_date, fs.funnel_step;
Explanation of Revised SQL Query:
-
funnel_stages CTE (Unchanged):
-
This CTE remains the same, assigning numerical values (1-5) to each stage based on the event name.
-
-
Main Query:
-
Similar to the initial approach, it retrieves data by event date and funnel step.
-
It retains the calculations for users_at_stage and total_conversions.
-
A new CASE statement calculates the drop-off rate:
-
For the first stage (step 1), it assumes a starting point of 100.0% (adjust if needed).
-
For subsequent stages (step > 1), it utilizes a window function LAG to access the user count at the previous stage. The drop-off rate is then calculated as a percentage by dividing the current stage user count by the previous stage user count and multiplying by 100.
-
For any other scenario (non-existent stages), the drop-off rate is set to 0.0.
-
-
Query Result in Excel Table Form:

Other Used Prompts for Contexual Understanding and Result Verification -
-
Show me a example query result of the SQL code with your proposed code.
-
Explain more about the LAG function.
-
Convert SQL Query result into Python code.