Same data. Same question. Different semantic context.
These four examples use static synthetic CSVs from the second `to-prompt` evaluation. Copy a question and its CSV files into your own model session, then compare the raw answer with an answer grounded by the linked rawctx package.
Result summary
Semantic context mattered most when the dataset had competing metric columns or lifecycle facts.
Examples were selected by impact score, oracle-quality delta, and semantic change size. This page keeps the public result summary, reusable questions, and CSVs, and omits operational run logistics.
Top examples
Four packages with the clearest reusable test cases.
Before
The raw answer grouped first-user campaign performance through firstUserSource and firstUserCampaign, then calculated session-source performance separately.
After
The grounded answer used User.newVsReturning for the first-user segment and kept Session.sessionSource, Session.sessionCampaignName, and Ecommerce.ecommercePurchases in their session and purchase scopes.
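The scoping in the grounded answer can be sketched as two separate aggregations over the same rows: one keyed by the user-scoped segment and one keyed by the session-scoped source. This is a minimal sketch under assumptions: the rows are hypothetical and not taken from the evaluation CSV, the flat column names stand in for User.newVsReturning, Session.sessionSource, and Ecommerce.ecommercePurchases, and the helper `purchases_by` is illustrative.

```python
from collections import defaultdict

# Hypothetical rows for illustration only; not the evaluation CSV.
ROWS = [
    {"newVsReturning": "new", "sessionSource": "google", "ecommercePurchases": 2},
    {"newVsReturning": "returning", "sessionSource": "google", "ecommercePurchases": 3},
    {"newVsReturning": "new", "sessionSource": "newsletter", "ecommercePurchases": 1},
]


def purchases_by(rows, dimension):
    """Sum purchases per value of one dimension (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["ecommercePurchases"]
    return dict(totals)


# User scope: first-user segment.
by_user_segment = purchases_by(ROWS, "newVsReturning")
# Session scope: attribution by session source, kept separate from the user segment.
by_session_source = purchases_by(ROWS, "sessionSource")
```

Keeping the two groupings separate avoids mixing a user-scoped dimension with session-scoped attribution in a single breakdown.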
Strong shift
Shopify Orders
Signal
Impact 3/4: material metric and numeric shift
Quality
Oracle quality 6 -> 9
Before
The raw answer used totalPriceAmount for GMV and AOV, producing larger GMV figures such as web 165 and retail 290.
After
The grounded answer followed the package metric definition and used currentTotalPriceAmount, producing web 150, retail 280, and mobile 120 for GMV.
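A minimal sketch of why the column choice moves GMV and AOV, assuming the current amount is lower than the original amount for orders that were adjusted later. The rows, the `salesChannel` column name, and the helper `gmv_and_aov` are hypothetical; the values are not the evaluation figures.

```python
from collections import defaultdict

# Hypothetical order rows for illustration only; not the evaluation CSV.
ORDERS = [
    {"salesChannel": "web", "totalPriceAmount": 50.0, "currentTotalPriceAmount": 45.0},
    {"salesChannel": "web", "totalPriceAmount": 30.0, "currentTotalPriceAmount": 30.0},
    {"salesChannel": "retail", "totalPriceAmount": 20.0, "currentTotalPriceAmount": 15.0},
]


def gmv_and_aov(orders, amount_column):
    """Return {channel: (gmv, aov)} computed from the chosen amount column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in orders:
        channel = row["salesChannel"]
        totals[channel] += row[amount_column]
        counts[channel] += 1
    return {ch: (totals[ch], totals[ch] / counts[ch]) for ch in totals}


# Raw-style calculation: original order totals.
raw = gmv_and_aov(ORDERS, "totalPriceAmount")
# Grounded-style calculation: current order totals.
grounded = gmv_and_aov(ORDERS, "currentTotalPriceAmount")
```

Both calls use the same orders and the same grouping, so any difference between `raw` and `grounded` comes only from the amount column.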
Directional shift
HubSpot Marketing
Signal
Impact 3/4 in the material-shift case
Quality
Oracle quality 8 -> 10
Before
The raw answer was unstable: it sometimes used the precomputed email_*_rate fields and sometimes recalculated rates from raw counts and emails_delivered.
After
The grounded answer consistently used email_open_rate, email_click_rate, email_bounce_rate, email_unsubscribe_rate, and email_spam_report_rate.
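The instability described above can be sketched with a single row in which the precomputed rate and a rate recalculated from counts disagree. This is a sketch under assumptions: the row values are hypothetical, `emails_opened` is an assumed name for a raw-count column, and the precomputed field is assumed to follow its own definition rather than opens divided by deliveries.

```python
# Hypothetical campaign row for illustration only; not the evaluation CSV.
row = {
    "emails_delivered": 200,
    "emails_opened": 50,      # assumed raw-count column name
    "email_open_rate": 0.24,  # precomputed field, assumed to use its own definition
}


def recomputed_open_rate(r):
    """Raw-style path: recalculate the rate from counts."""
    return r["emails_opened"] / r["emails_delivered"]


def grounded_open_rate(r):
    """Grounded-style path: read the precomputed field as-is."""
    return r["email_open_rate"]


recomputed = recomputed_open_rate(row)
grounded = grounded_open_rate(row)
```

An answer that switches between the two paths from run to run reports different rates for the same row, which is the inconsistency the grounded answer avoids by always reading the precomputed fields.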
Directional shift
Salesforce Revenue Usage
Signal
Impact 3/4 in the material-shift case
Quality
Oracle quality 8 -> 10
Before
The raw answer sometimes added commitment 1,000 and overage 240 together, presenting 1,240 as the billing amount.
After
The grounded answer treated UsageRatableSummary.TotalAmount = 240 as the final overage charge and kept the commitment policy as a separate lifecycle caveat.
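The arithmetic difference between the two answers can be sketched as follows, using the commitment and overage figures quoted above. The function names are hypothetical and only label the two readings.

```python
COMMITMENT = 1_000   # commitment figure quoted in the example
TOTAL_AMOUNT = 240   # UsageRatableSummary.TotalAmount quoted in the example


def raw_billing_amount(commitment, total_amount):
    """Raw-style reading: adds the commitment to the overage charge."""
    return commitment + total_amount


def grounded_billing_amount(total_amount):
    """Grounded-style reading: TotalAmount is already the final overage charge."""
    return total_amount


raw_amount = raw_billing_amount(COMMITMENT, TOTAL_AMOUNT)
grounded_amount = grounded_billing_amount(TOTAL_AMOUNT)
```

In the grounded reading the commitment does not enter the sum; it stays a separate caveat about the commitment policy.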
Strong shift
GA4 Data API
Context changed first-user analysis from decoy source/campaign fields toward User.newVsReturning and session-scoped attribution.
Using the attached synthetic GA4 dataset, compare first-user segments with session source/campaign performance, then calculate purchase conversions and purchase revenue.
Using the attached Shopify orders, fulfillments, and abandoned checkouts dataset, calculate GMV, AOV, refund rate, cart abandonment rate, and fulfillment time by sales channel.
Using the attached Salesforce Revenue Usage dataset, model account A1's usage-based billing flow through entitlement, bucket, usage summary, overage, commitment, and ratable charge, then calculate the amount.