When analyzing Toast menu pages, items with modifiers now have their
modifier groups extracted by clicking each item in a headless browser
and intercepting the GraphQL MenuItemDetails responses. Extracted
modifiers include group name, required/optional flag, min/max selections,
and option names with prices. Items sharing the same itemGroupGuid
inherit modifiers from successfully mapped siblings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Restaurant info is in ROOT_QUERY.restaurantV2(...) keys, not
Restaurant:* top-level keys (Apollo cache format)
- Prices are in item.prices array [4.50], not item.price scalar
- Added null checks for imageUrls (can be null, not missing)
- Fallback to title tag for business name
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of sending 450KB of HTML to Claude (which truncates to 100K
and only extracts ~60 items), parse the structured __OO_STATE__ data
directly on the server. This captures all menus, groups, items, prices,
and images from Toast pages - 169 items for Jus Family Cafe vs 60 before.
Falls back to Claude analysis if __OO_STATE__ parsing fails.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Claude API sometimes returns explanatory text before the JSON
response even when instructed to return only JSON. Added extraction
logic to find the first { character and strip any leading text.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Parse HTML heading structure to detect h2 parents with h3 subcategories
- Append detected hierarchy to Claude prompt as explicit hint
- Post-process Claude response to enforce hierarchy even if Claude returns flat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- analyzeMenuUrl.cfm: Detect subcategories from Toast subgroups and
Claude API responses, preserve hierarchy with parentCategoryName
- setup-wizard.html: Display subcategories indented under parents
throughout wizard flow (categories step, items review, summary, preview)
- menu-builder.html: Show subcategories nested in outline modal view
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Large menus (20+ categories) were getting truncated JSON responses
at 8192 tokens, causing parse failures.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Tax rate: Use Zippopotam (free, no key) to get state, then lookup
from built-in state+local rate tables instead of API Ninjas
- Prices: Extract prices from Toast __OO_STATE__ MenuItem objects
when visible HTML prices are missing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extract UUID folder path from URL instead of using getDirectoryFromPath
- Old logic was broken: listLast on path ending with / returned empty string
- This caused the code to go up one level too far
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Scan extracted ZIP for image files (jpg, png, gif, webp)
- Skip small files (<10KB, likely icons) and _files folder assets
- Send up to 3 images to Claude for business info extraction
- Merge extracted name, address, phone, hours, brandColor
- Only fills in fields not already found from HTML
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extract directory and scan all .htm/.html files recursively
- Look for business name in title tags (skip generic titles)
- Extract street addresses with regex patterns
- Extract phone numbers
- Check __OO_STATE__ in other pages for Restaurant data
- Merge found info into toastBusiness (first found wins)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The __OO_STATE__ parsing was only extracting images, not the group names
as categories. Now extracts category names from menu.groups and maps
items to their proper categories.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Toast extraction was finding items but no h2.groupHeader categories,
leaving items ungrouped. showItemsStep() then rendered no checkboxes,
and confirmItems() filtered out all items (empty checkedIds set).
Now adds a default "Menu" category when items exist but categories is empty.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added multiple fallback methods to extract business name:
1. Title tag with Toast-specific parsing
2. og:title and og:site_name meta tags
3. Header elements with restaurant/location classes
4. First h1 tag as last resort
Also added address and phone extraction from visible HTML.
Added summary logging of business info keys found.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Log all top-level keys in __OO_STATE__ to diagnose why Restaurant
key isn't being found
- Extract business name from HTML title tag as fallback
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Look for Restaurant: keys and extract name, location (address, city,
state, zip), phone, and brandColor for the wizard business info step.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Define basePath before Toast parsing block so image URLs can be
properly constructed for local file uploads.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extract items from visible HTML instead of just __OO_STATE__ JSON
- Parse headerText spans for item names, price spans for prices
- Extract images from Menu_files/ src attributes
- Fall back to simpler headerText matching if block parsing fails
- Also extract images from __OO_STATE__ and match to items by name
- Fixes issue where only 116 items extracted instead of 163+
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Skip Claude AI for Toast menus - parse the embedded JSON directly.
This extracts all items, categories, and images from the structured
__OO_STATE__ data, which is faster and more complete than AI extraction.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Claude returns imageUrl but code only checked for images/imageSrc.
Add handling for imageUrl field to properly match images to items.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- uploadSavedPage.cfm: sanitize extracted files (whitelist safe extensions,
delete symlinks) to protect against malicious content from infected sites
- analyzeMenuUrl.cfm: detect local temp URLs and read directly from disk,
bypassing Playwright for faster processing of saved pages
- saveWizard.cfm: delete temp folder immediately after wizard completes
instead of waiting for 1-hour auto-cleanup
- setup-wizard.html: track temp folder ID and pass to saveWizard for cleanup
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When scanning extracted ZIP content from /temp/menu-import/, read
images directly from the filesystem instead of re-downloading via HTTP.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace cfhttp with Playwright headless browser
- Capture images from network requests during page render
- No longer needs to fetch subpages (JS renders everything)
- Should capture subcategory items that load dynamically
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Backend now accepts either url or html content in request body
- Frontend adds HTML file upload option below URL input
- Useful when websites block the crawler (403 errors)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>