
Building SQL Queries with Gemini AI
Background and Objectives:
Hey recruiters and potential managers! In this new AI world we are living, here this portfolio aims to demonstrate my proficiency in leveraging Gemini AI, a large language model from Google, to construct effective SQL queries through prompts. I only posses basic SQL knowledge however had been challenged with data analytics tasks that are beyond my level. This portfolio highlights my ability to:
-
Craft clear and concise prompts to obtain desired information.
-
Refine prompts iteratively to achieve the exact analytical outcome.
-
Understand and interpret the results of SQL queries.
​
1. Construct a Order Attribution Report with 3 days Lookback Window
Desired Outcome:
Obtain a report that details the user journey leading to a purchase event within a specific timeframe (lookback window of three days). This report should include user IDs, event dates, traffic sources, and the total purchase value and event names other than purchase under the lookback windown period.
​
Initial Prompt:
"I am building an order attribution report with a 3 day lookback window. Every user_id has their event_name recorded (event name include page view, purchase, download and others) recorded with their own timestamp. I want to query their rows of each of different users purchase timestamp with a lookback window of three days. How can I query on different purchase dates of 3 days window from each user?"
​
Understanding the Response:
Assuming your data resides in a table named sample_events_date, Gemini might provide a sample SQL code similar to this:
​SELECT
user_pseudo_id,
event_date,
traffic_source_source,
traffic_source_medium,
event_value_in_usd
FROM
sample_events_date
WHERE
event_name = 'purchase'
AND event_timestamp >= MAX(event_timestamp) - INTERVAL 3 DAY;
Iterative Refinement:
While this initial query captures purchases, it doesn't show the entire user journey. I might need to rephrase the prompts to include all relevant events within the lookback window:
Rephrased Prompt:
"Instead of just showing purchases, can you show all user events (including page views, add-to-carts, etc.) within the last 3 days for users who made a purchase during that timeframe? I want to see the entire user journey leading to the purchase."
Understanding the Revised Response:
Gemini would likely provide a revised SQL code that incorporates filtering based on the purchase event within the lookback window and includes other relevant events. It also provides code explanations on each steps.
​
SELECT
user_pseudo_id,
event_date,
traffic_source_source,
traffic_source_medium,
event_value_in_usd
FROM
sample_events_date
WHERE
user_pseudo_id IN (
SELECT user_pseudo_id
FROM sample_events_date
WHERE event_name = 'purchase'
AND event_timestamp >= MAX(event_timestamp) - INTERVAL 3 DAY
)
AND event_date >= DATE_SUB(
(SELECT MAX(event_date)
FROM sample_events_date
WHERE user_pseudo_id = sample_events_date.user_pseudo_id),
INTERVAL 7 DAY
)
AND event_name IN ('first_visit', 'session_start', 'page_view', 'view_item', 'add_to_cart', 'begin_checkout', 'purchase')
ORDER BY
user_pseudo_id, event_date;
Expected SQL Query Result:
​
Utilizing Gemini AI & Prompt Design
for Building SQL Queries
2024

Other Prompts Used for Context Understanding & Result Verification:
There has been a lot of unintended result provided by Gemini AI which may due to poor prompt reply. To verify and gauge whether the query result provided by Gemini is what I am expecting, few other prompts are used for verification and confirmation:
​
-
Does this code include query of the event_name (page views, download) 3 days before the purchase_date?
-
Why the CURDATE() is used on your SQL code. Every event name has their own timestamp and does not seem like it is relevant to current date.
-
Tell me about order attribution report and specify its definition.
-
Tell me more about this SQL code AND event_timestamp >=MAX(event_timestamp) -INTERVAL3DAYFROM your_table AS p2 WHERE p2.user_id = u.user_id
-
Is 1685436253234280 timestamp data? If it is, please convert it into human-readable format in SQL.
-
Please provide me a breakdown of this SQL code and its functions.
2. Closed Funnel Analysis (Cont')
Desired Outcome:
Analyze user journeys within the purchase funnel, calculating the conversion rate at each stage (view product, add to cart, checkout, purchase) and the drop-off rate between stages.
Initial Approach:
While the previous sections explored order attribution report, this section delves deeper by calculating drop-off rates. Initial prompt: "Using the same data for order attribution report, compose a SQL query that delivers a closed funnel analysis, broken down by event_date."
Understanding the Response:
Gemini might suggest a new SQL code structure utilizing a Common Table Expression (CTE) and user-defined logic to assign funnel step values and calculate the total number of users and conversions at each stage.
​
WITH funnel_stages AS (
SELECT
user_pseudo_id,
event_date,
event_name,
CASE
WHEN event_name = 'first_visit' THEN 1
WHEN event_name = 'session_start' THEN 1 -- Assuming session_start also represents funnel entry
WHEN event_name = 'page_view' THEN 1 -- Assuming page_view also represents funnel entry
WHEN event_name = 'view_item' THEN 2
WHEN event_name = 'add_to_cart' THEN 3
WHEN event_name = 'begin_checkout' THEN 4
WHEN event_name = 'purchase' THEN 5
ELSE 0
END AS funnel_step
FROM sample_events_date
)
SELECT
fs.event_date,
fs.funnel_step,
COUNT(DISTINCT fs.user_pseudo_id) AS users_at_stage,
SUM(CASE WHEN fs.funnel_step = 5 THEN 1 ELSE 0 END) AS total_conversions
FROM funnel_stages fs
GROUP BY fs.event_date, fs.funnel_step
ORDER BY fs.event_date, fs.funnel_step;
Iterative Refinement:
While this initial query captures the funnel steps, breaking down by event date and steps. My initial prompt lacks mention on the drop off rate on each funnel step
Rephrased Prompt:
"Add one new column that stated the percentage drop rate on each funnel step."
WITH funnel_stages AS (
SELECT
user_pseudo_id,
event_date,
event_name,
CASE
WHEN event_name = 'first_visit' THEN 1
WHEN event_name = 'session_start' THEN 1
WHEN event_name = 'page_view' THEN 1
WHEN event_name = 'view_item' THEN 2
WHEN event_name = 'add_to_cart' THEN 3
WHEN event_name = 'begin_checkout' THEN 4
WHEN event_name = 'purchase' THEN 5
ELSE 0
END AS funnel_step
FROM sample_events_date
)
SELECT
fs.event_date,
fs.funnel_step,
COUNT(DISTINCT fs.user_pseudo_id) AS users_at_stage,
SUM(CASE WHEN fs.funnel_step = 5 THEN 1 ELSE 0 END) AS total_conversions,
CASE
WHEN fs.funnel_step = 1 THEN 100.0 -- Assuming all users start at step 1 (adjust if needed)
WHEN fs.funnel_step > 1 THEN
100.0 * (
COUNT(DISTINCT fs.user_pseudo_id) /
LAG(COUNT(DISTINCT fs.user_pseudo_id)) OVER (ORDER BY fs.event_date, fs.funnel_step)
)
ELSE 0.0
END AS drop_off_rate
FROM funnel_stages fs
GROUP BY fs.event_date, fs.funnel_step
ORDER BY fs.event_date, fs.funnel_step;
Explanation of Revised Code:
-
funnel_stages CTE (Unchanged):
-
This CTE remains the same, assigning numerical values (1-5) to each stage based on the event name.
-
-
Main Query:
-
Similar to the initial approach, it retrieves data by event date and funnel step.
-
It retains the calculations for users_at_stage and total_conversions.
-
A new CASE statement calculates the drop-off rate:
-
For the first stage (step 1), it assumes a starting point of 100.0% (adjust if needed).
-
For subsequent stages (step > 1), it utilizes a window function LAG to access the user count at the previous stage. The drop-off rate is then calculated as a percentage by dividing the current stage user count by the previous stage user count and multiplying by 100.
-
For any other scenario (non-existent stages), the drop-off rate is set to 0.0.
-
-

Expected SQL Query Result:
Other Prompts Used for Context Understanding & Result Verification:
-
Show me a example query result of the SQL code with your proposed code.
-
Explain more about the LAG function.
-
Convert into Python.