Commit graph

77 commits

Author SHA1 Message Date
John Mizerek
bace74aa23 Remove Uber Eats JSON-LD fast path — let Claude extract modifiers
The JSON-LD fast path only got items/categories/prices but no modifiers.
Removing it lets Uber Eats pages fall through to Claude AI extraction
which handles modifiers like every other platform.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 00:08:57 -07:00
John Mizerek
2cf5039c0f DoorDash modifiers: direct GraphQL API calls instead of clicking
Instead of clicking each menu item (broken by virtual scrolling),
extract item IDs from embedded JSON and make direct fetch() calls
to the itemPage GraphQL endpoint. 5 concurrent requests per batch.
Much faster and 100% reliable - no DOM interaction needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 13:11:03 -07:00
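The batching described in commit 2cf5039c0f (5 concurrent requests per batch) might look roughly like the sketch below. It uses Python's asyncio in place of the browser fetch() calls the commit mentions. The `fetch_one` callback, the function name, and the choice to skip failed requests are all assumptions made for illustration, not details taken from the repository.

```python
import asyncio


async def fetch_in_batches(item_ids, fetch_one, batch_size=5):
    """Call fetch_one(item_id) for every ID, at most batch_size at a time.

    fetch_one is a hypothetical coroutine standing in for one request to the
    item-details endpoint. Failed requests are skipped (an assumption).
    """
    results = {}
    for start in range(0, len(item_ids), batch_size):
        batch = item_ids[start:start + batch_size]
        # Run the whole batch concurrently and wait for all of it to finish
        # before starting the next batch.
        responses = await asyncio.gather(
            *(fetch_one(item_id) for item_id in batch),
            return_exceptions=True,
        )
        for item_id, response in zip(batch, responses):
            if not isinstance(response, Exception):
                results[item_id] = response
    return results
```

Waiting for each batch to finish before starting the next keeps the number of in-flight requests bounded without needing a semaphore.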
John Mizerek
f5974a5fa2 Improve DoorDash modifier extraction: pass item names to Playwright
- Pass extracted item names via temp JSON file so Playwright knows exactly
  what to click instead of guessing from DOM selectors (7 → 171 items)
- Use TreeWalker for exact text matching and aggressive scrolling
- Better price parsing: handle cents (int), dollars (string), displayPrice
- Improved modal dismissal with overlay click fallback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 13:02:27 -07:00
John Mizerek
b14f26ed47 Add DoorDash modifier extraction via stealth Playwright
- New doordash-modifiers.js: stealth Playwright script that clicks each
  menu item on a DoorDash page, captures itemPage GraphQL responses,
  and extracts optionLists (modifier groups with options and prices)
- Wire modifier extraction into DoorDash fast-path in analyzeMenuUrl.cfm:
  after parsing items/categories, runs modifier script and maps results
- Improved business info extraction: address, phone, and hours now use
  position-based parsing of StoreHeaderAddress, StoreHeaderPhoneNumber,
  and StoreOperationHoursRange embedded data (fixes intermittent missing info)
- Add playwright-extra and stealth plugin to package.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 12:58:00 -07:00
John Mizerek
8be3a3d802 Fix DoorDash image extraction: MenuPageItem uses imageUrl not imgUrl
StorePageCarouselItem uses imgUrl, but MenuPageItem uses imageUrl.
This gives ~540 item images instead of 26.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 12:32:01 -07:00
John Mizerek
a830a0820a Fix DoorDash parser: use find() loops instead of listToArray
listToArray treats a multi-character delimiter as a set of individual
delimiter characters, not as one string. The parser now uses
position-based find() traversal to split on the full multi-character delimiter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 12:23:21 -07:00
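The position-based find() traversal from commit a830a0820a can be illustrated with a short sketch. It is written in Python rather than CFML, and the delimiter in the usage example is made up. The function name is hypothetical.

```python
def split_on_delimiter(text, delimiter):
    """Split text on a whole multi-character delimiter using find() positions."""
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    parts = []
    pos = 0
    while True:
        hit = text.find(delimiter, pos)
        if hit == -1:
            # No more delimiters: the remainder is the last part.
            parts.append(text[pos:])
            return parts
        parts.append(text[pos:hit])
        # Resume the search just past the delimiter that was found.
        pos = hit + len(delimiter)
```

For example, split_on_delimiter("a::item::b:c::item::d", "::item::") keeps "b:c" intact, whereas splitting on each delimiter character separately would also break at the lone ":".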
John Mizerek
33040c9cd3 Rewrite DoorDash fast-path: use MenuPageItemList for full menu
- Extract items from MenuPageItemList (171 items) instead of StorePageCarouselItem (54)
- Categories already mapped to items via MenuPageItemList sections
- Cross-reference images from carousel entries by item name
- No need for Claude category assignment - data already structured

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 12:02:04 -07:00
John Mizerek
67e2079550 Add DoorDash/order.online fast-path parser
Extract menu data directly from embedded JSON in DoorDash HTML:
- Categories from MenuBookCategory entries
- Items with names, descriptions, prices, and image URLs from StorePageCarouselItem
- Business info from page title and StoreHeaderAddress
- Uses Claude to assign items to categories
- Upgrades image URLs to 600px for better quality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 11:44:24 -07:00
John Mizerek
f58d567fb4 Fix DoorDash image import: scroll page in Playwright to trigger lazy-loaded images
- Update render.js on dev server to scroll page before capturing images
- Increase Playwright wait from 4s to 5s and timeout from 90s to 120s
- Upsize DoorDash CDN thumbnails from 150px to 600px when downloading

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 11:20:10 -07:00
John Mizerek
8ac6800d47 Use Jackson parser via file for Claude JSON response
Lucee's deserializeJSON fails on certain Claude outputs and the error
bypasses inner cftry/cfcatch. Parse via Jackson from file instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 19:37:42 -07:00
John Mizerek
a5c0d55aa8 Fix JSON error handler and save raw Claude response for debugging
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 19:32:50 -07:00
John Mizerek
1ca958d11f Add debug char dump around JSON parse failure position
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 19:28:26 -07:00
John Mizerek
4240fe76cc Harden JSON parsing for Claude API responses
- Add smart quote/dash replacement for PDF-sourced text
- Add Jackson fallback parser for when Lucee's deserializeJSON fails
- Strengthen prompt to request properly escaped JSON
- Clean control characters more selectively

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:31:03 -07:00
John Mizerek
57d31c0428 Fix unescaped # in Uber Eats HTML entity unescaping
CFML was failing to compile analyzeMenuUrl.cfm because &#39; contains
a # character that Lucee interprets as the start of a variable expression.
Escaped all 4 occurrences to &##39;.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 18:46:33 -07:00
John Mizerek
49d724f9b2 Add Uber Eats menu import and fix header image upload step
- Parse JSON-LD structured menu data from saved Uber Eats pages
  (categories, items, prices, descriptions, business info)
- Show save-and-upload instructions when user pastes Uber Eats URL
- Always show header image upload step (was skipped for URL imports)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 11:59:40 -07:00
John Mizerek
b0fa48ab64 Pass WooCommerce business info (name, address, phone) to wizard
The WooCommerce fast path was returning empty business info. Now the
Playwright script extracts it from the page and the CFML passes it through.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 20:18:17 -08:00
John Mizerek
8fd0ccb8da Include imageUrl on WooCommerce items for wizard image download
The wizard checks item.imageUrl to decide whether to skip the image
upload step and download images from remote URLs instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 16:47:06 -08:00
John Mizerek
fb92748784 Add WooCommerce fast-path with Playwright modifier extraction
Detects WooCommerce sites from Playwright HTML (woocommerce, wc-add-to-cart,
tm-extra-product-options). Runs woo-modifiers.js which navigates all product
pages, extracts items with categories, and scrapes TMEPO/variation modifiers.
Falls through to Claude if extraction fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 14:24:22 -08:00
John Mizerek
a44dfd79ae Fix every-item-as-category pattern in menu import
Post-process Claude menu extraction to detect when >60% of categories
have exactly 1 item (a common misparse). Collapses pseudo-categories
into the nearest preceding real (0-item) category.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 16:51:09 -08:00
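A minimal sketch of the post-processing rule in commit a44dfd79ae, assuming the extracted menu is an ordered list of categories that each carry a name and an items list. That data shape, the function name, and the handling of single-item categories that have no preceding empty category are assumptions.

```python
def collapse_pseudo_categories(categories, threshold=0.6):
    """Fold single-item categories into the nearest preceding 0-item category.

    Only acts when more than `threshold` of all categories have exactly one item.
    """
    if not categories:
        return categories
    single_item = sum(1 for cat in categories if len(cat["items"]) == 1)
    if single_item / len(categories) <= threshold:
        return categories  # pattern not detected, leave the menu alone

    result = []
    current_real = None  # most recent 0-item ("real") category seen so far
    for cat in categories:
        count = len(cat["items"])
        if count == 0:
            current_real = {"name": cat["name"], "items": []}
            result.append(current_real)
        elif count == 1 and current_real is not None:
            current_real["items"].extend(cat["items"])
        else:
            # Assumption: anything else is kept as its own category.
            result.append({"name": cat["name"], "items": list(cat["items"])})
    return result
```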
John Mizerek
84985d98d8 Improve Claude prompt to distinguish categories from items
Added explicit guidance that categories are broad section headings
(5-15 typical) and items are individual products (30-150 typical).
Prevents Claude from treating each menu item as its own category.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 16:45:33 -08:00
John Mizerek
983ba7c2e4 Fix image media type detection from content, not extension
Claude API rejects images when the declared media type doesn't
match the actual content. Now detects JPEG/PNG/GIF/WebP from
base64 magic bytes instead of trusting file extensions or
Content-Type headers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 14:33:39 -08:00
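The content-based detection in commit 983ba7c2e4 relies on the well-known file signatures of the four formats. The sketch below shows one way to check them from base64 data. The function name is hypothetical, and the input is assumed to be valid base64 without embedded whitespace.

```python
import base64


def detect_media_type(b64_data):
    """Return the image media type implied by the leading bytes, or None."""
    # 16 base64 characters decode to 12 bytes, enough for every signature below.
    head = base64.b64decode(b64_data[:16])
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if head.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    return None
```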
John Mizerek
c40e5c0181 Add Grubhub menu import via API
Detect Grubhub URLs and fetch menu data directly via their
REST API instead of scraping HTML. Gets anonymous auth token,
then fetches full restaurant data including categories, items,
modifiers, prices, hours, lat/lng, tax rate, and item images.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 12:20:53 -08:00
John Mizerek
2f9bb2b869 Extract lat/lng from Toast and save directly to address
Toast provides latitude/longitude in the location object. Extract
in analyzeMenuUrl.cfm and pass through to saveWizard.cfm, which
now includes lat/lng in the address INSERT. Skips the background
Nominatim geocode when coordinates are already available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:47:32 -08:00
John Mizerek
2b441e166e Extract business hours from Toast schedule data
Parses upcomingSchedules from ROOT_QUERY.restaurantV2.schedule
and formats as "Mon 7:30am-6:30pm, Tue 7:30am-6:30pm" text string
that the setup wizard's parseHoursString() can consume.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 10:24:45 -08:00
John Mizerek
4351978c10 Fix Toast menu import for new URL format and prices array
- Add item.prices array support to first code path (was only checking
  item.price scalar, but Toast now uses prices: [4.50] array)
- Extract individual address fields (addressLine1, city, state, zip)
  from ROOT_QUERY restaurant data for saveWizard compatibility
- Update modifier extraction URL detection to match any toasttab.com
  domain (not just order.toasttab.com)
- Update slug-based URL construction to use www.toasttab.com/local/order/
  format instead of deprecated order.toasttab.com/online/ format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 19:45:28 -08:00
John Mizerek
3a9f952d8b Fix Toast __OO_STATE__ extraction: remove tag stripping and var keywords
Tag stripping via reReplace on 1.7M HTML was likely causing the silent
failure on biz server. Brace-counting doesn't need tag stripping since
HTML tags don't contain { or } and attribute quotes come in balanced
pairs. Also removed var keywords from page-level cfscript (may not be
supported in Lucee at template level) and added detailed error output
to the cfcatch for debugging.

Also auto-detect dev/prod environment from hostname instead of
hardcoded flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 20:28:02 -08:00
John Mizerek
e7aaae58b7 Replace regex extraction with brace-counting for __OO_STATE__
The regex .*? (non-greedy) fails on 500K+ JSON due to Java regex
backtracking limits, causing truncated data (only 3 of 6 menus
extracted). Replace all 3 extraction points with cfscript
brace-counting that reliably handles any JSON size.

Also decode HTML entities (&amp; -> &, &lt; -> <, etc.) from
Chrome View Source saves.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 19:18:17 -08:00
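The brace-counting extraction from commit e7aaae58b7 can be sketched as follows, in Python rather than cfscript. The function name is hypothetical, and the tracking of string literals (so that braces inside JSON strings are not counted) is an assumption about how such a scanner would need to behave, not a detail confirmed by the commit.

```python
from typing import Optional


def extract_json_object(text: str, marker: str) -> Optional[str]:
    """Return the first balanced {...} block that follows marker, or None."""
    start = text.find(marker)
    if start == -1:
        return None
    open_pos = text.find("{", start + len(marker))
    if open_pos == -1:
        return None

    depth = 0
    in_string = False
    escaped = False
    for i in range(open_pos, len(text)):
        ch = text[i]
        if in_string:
            # Inside a JSON string: only look for the closing quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[open_pos:i + 1]
    return None  # braces never balanced (truncated input)
```

Because this is a single linear pass, its cost grows with the input length only, which avoids the backtracking problem the commit describes for the non-greedy regex.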
John Mizerek
22e89b2dd3 Fix __OO_STATE__ extraction for Chrome View Source HTML
Pages saved from Chrome's Ctrl+U (View Source) wrap content in <span> tags,
which breaks the regex termination pattern ;\s*window\. because
HTML tags appear between ; and the next window. variable.

Strip HTML tags from a working copy before regex extraction when
View Source format is detected (presence of <span id="line" tags).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 18:45:34 -08:00
John Mizerek
4684936595 Add parent/child category hierarchy for Toast menus
Toast pages with multiple menus (e.g. "Food", "Beverages", "Merchandise")
now produce parent categories from the menu names with subcategories from
the groups within each menu, using the parentCategoryName field the wizard
already supports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 18:18:13 -08:00
John Mizerek
aca3ba18a1 Add Toast modifier extraction via Playwright
When analyzing Toast menu pages, items with modifiers now have their
modifier groups extracted by clicking each item in a headless browser
and intercepting the GraphQL MenuItemDetails responses. Extracted
modifiers include group name, required/optional flag, min/max selections,
and option names with prices. Items sharing the same itemGroupGuid
inherit modifiers from successfully mapped siblings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 17:48:48 -08:00
John Mizerek
95dc4c49fc Strip address from business name when Toast embeds it in the name field
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 11:12:58 -08:00
John Mizerek
e403e49487 Fix Toast OO_STATE: restaurant from ROOT_QUERY, prices from prices[]
- Restaurant info is in ROOT_QUERY.restaurantV2(...) keys, not
  Restaurant:* top-level keys (Apollo cache format)
- Prices are in item.prices array [4.50], not item.price scalar
- Added null checks for imageUrls (can be null, not missing)
- Fallback to title tag for business name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 11:11:44 -08:00
John Mizerek
a0d86d6e87 Add Toast __OO_STATE__ fast-path for URL-fetched menu pages
Instead of sending 450KB of HTML to Claude (which truncates to 100K
and only extracts ~60 items), parse the structured __OO_STATE__ data
directly on the server. This captures all menus, groups, items, prices,
and images from Toast pages - 169 items for Jus Family Cafe vs 60 before.

Falls back to Claude analysis if __OO_STATE__ parsing fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 11:08:07 -08:00
John Mizerek
ced4082993 Fix JSON parsing when Claude returns text preamble before menu JSON
The Claude API sometimes returns explanatory text before the JSON
response even when instructed to return only JSON. Added extraction
logic to find the first { character and strip any leading text.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 10:30:21 -08:00
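The preamble handling in commit ced4082993 (find the first { and drop everything before it) could be sketched like this. The function name is hypothetical. Using raw_decode so that trailing text after the JSON is also tolerated goes beyond what the commit describes and is an assumption.

```python
import json


def parse_json_with_preamble(response_text):
    """Parse the JSON object that starts at the first '{' in response_text."""
    start = response_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses one JSON value starting at `start` and ignores
    # whatever follows it.
    obj, _end = json.JSONDecoder().raw_decode(response_text, start)
    return obj
```

This simple approach fails if the preamble itself contains a { character, in which case a balanced-brace scan would be needed instead.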
John Mizerek
9acf4aa511 Add server-side h2/h3 hierarchy detection for subcategory discovery
- Parse HTML heading structure to detect h2 parents with h3 subcategories
- Append detected hierarchy to Claude prompt as explicit hint
- Post-process Claude response to enforce hierarchy even if Claude returns flat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 22:36:36 -08:00
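A rough sketch of the h2/h3 hierarchy detection from commit 9acf4aa511, using Python's standard html.parser. The class and function names are hypothetical, and the rule that only h2 headings with at least one h3 beneath them count as parents is an assumption.

```python
from html.parser import HTMLParser


class HeadingHierarchy(HTMLParser):
    """Collect h3 headings under the most recent h2 heading."""

    def __init__(self):
        super().__init__()
        self.hierarchy = {}   # h2 text -> list of h3 texts
        self._tag = None      # heading tag currently open, if any
        self._buf = []
        self._parent = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing tag of nested markup, or not inside a heading
        text = " ".join("".join(self._buf).split())
        if text and tag == "h2":
            self._parent = text
            self.hierarchy.setdefault(text, [])
        elif text and self._parent is not None:
            self.hierarchy[self._parent].append(text)
        self._tag = None


def detect_hierarchy(html):
    """Return {parent: [subcategories]} for h2 headings that have h3 children."""
    parser = HeadingHierarchy()
    parser.feed(html)
    parser.close()
    return {parent: subs for parent, subs in parser.hierarchy.items() if subs}
```

The resulting mapping could be turned into the prompt hint and into the post-processing check that the commit describes.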
John Mizerek
495b03c76d Add subcategory detection to wizard URL analyzer and display
- analyzeMenuUrl.cfm: Detect subcategories from Toast subgroups and
  Claude API responses, preserve hierarchy with parentCategoryName
- setup-wizard.html: Display subcategories indented under parents
  throughout wizard flow (categories step, items review, summary, preview)
- menu-builder.html: Show subcategories nested in outline modal view

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 22:08:59 -08:00
John Mizerek
e02e124610 Increase max_tokens to 16384 for menu URL analysis
Large menus (20+ categories) were getting truncated JSON responses
at 8192 tokens, causing parse failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-28 16:17:13 -08:00
John Mizerek
b360284e56 Beacon delete fix, price extraction, tax rate lookup, add modifiers form
2026-02-14 19:17:48 -08:00
John Mizerek
3cd7bbb8b7 Fix tax rate lookup and add price extraction from __OO_STATE__
- Tax rate: Use Zippopotam (free, no key) to get state, then lookup
  from built-in state+local rate tables instead of API Ninjas
- Prices: Extract prices from Toast __OO_STATE__ MenuItem objects
  when visible HTML prices are missing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 12:05:24 -08:00
John Mizerek
26e5d92a03 Improve image analysis prompt - be more explicit about extracting all visible business info
2026-02-13 10:54:11 -08:00
John Mizerek
abf6965614 Image data overwrites HTML-extracted data (more reliable)
2026-02-13 10:53:48 -08:00
John Mizerek
1432d8e2b8 Use ## to escape hash in CFML string
2026-02-13 10:47:12 -08:00
John Mizerek
ba017348b0 Fix CFML syntax error - escape # in string
2026-02-13 10:46:32 -08:00
John Mizerek
aa447bd009 Fix extractDir path detection for ZIP scanning
- Extract UUID folder path from URL instead of using getDirectoryFromPath
- Old logic was broken: listLast on path ending with / returned empty string
- This caused the code to go up one level too far

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 10:32:18 -08:00
John Mizerek
f9bfbc8960 Analyze images in ZIP for business info
- Scan extracted ZIP for image files (jpg, png, gif, webp)
- Skip small files (<10KB, likely icons) and _files folder assets
- Send up to 3 images to Claude for business info extraction
- Merge extracted name, address, phone, hours, brandColor
- Only fills in fields not already found from HTML

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 10:16:18 -08:00
John Mizerek
cf34636879 Scan all HTML files in ZIP for business info
- Extract directory and scan all .htm/.html files recursively
- Look for business name in title tags (skip generic titles)
- Extract street addresses with regex patterns
- Extract phone numbers
- Check __OO_STATE__ in other pages for Restaurant data
- Merge found info into toastBusiness (first found wins)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 10:13:13 -08:00
John Mizerek
90ed78fa96 Fix: Extract categories from __OO_STATE__ groups
The __OO_STATE__ parsing was only extracting images, not the group names
as categories. Now extracts category names from menu.groups and maps
items to their proper categories.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 10:06:43 -08:00
John Mizerek
09e5807c94 Fix: Add default 'Menu' category when no categories found
Toast extraction was finding items but no h2.groupHeader categories,
leaving items ungrouped. showItemsStep() then rendered no checkboxes,
and confirmItems() filtered out all items (empty checkedIds set).

Now adds a default "Menu" category when items exist but categories is empty.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 09:35:50 -08:00
John Mizerek
b081e72347 Improve business info extraction from saved Toast pages
Added multiple fallback methods to extract business name:
1. Title tag with Toast-specific parsing
2. og:title and og:site_name meta tags
3. Header elements with restaurant/location classes
4. First h1 tag as last resort

Also added address and phone extraction from visible HTML.
Added summary logging of business info keys found.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 09:26:37 -08:00
John Mizerek
eec44011f4 Add more debug logging for title and OO_STATE extraction
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-13 09:21:47 -08:00