- Extract directory and scan all .htm/.html files recursively
- Look for business name in title tags (skip generic titles)
- Extract street addresses with regex patterns
- Extract phone numbers
- Check __OO_STATE__ in other pages for Restaurant data
- Merge found info into toastBusiness (first found wins)
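A minimal Python sketch of the "first found wins" merge described above. The function name and the sample values are hypothetical; the actual merge lives in the CFML code, and only toastBusiness is named in this commit.

```python
def merge_first_found(target: dict, found: dict) -> dict:
    """Copy values from `found` into `target` only for keys still unset."""
    for key, value in found.items():
        # Skip empty values, and never overwrite something found earlier.
        if value and not target.get(key):
            target[key] = value
    return target


# Hypothetical data: info gathered from successive pages of a saved site.
toast_business = {"name": "Example Cafe"}
merge_first_found(toast_business, {"name": "Other", "phone": "555-0100"})
merge_first_found(toast_business, {"phone": "555-0199", "address": ""})
```

Scanning pages in a stable order matters with this rule, since the first page that yields a value decides it.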
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The __OO_STATE__ parsing was only extracting images, not the group names
as categories. Now extracts category names from menu.groups and maps
items to their proper categories.
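A rough Python sketch of mapping items to categories from menu.groups. The "name" and "items" keys inside each group are assumptions about the __OO_STATE__ shape, not confirmed by this commit, and the function name is hypothetical.

```python
def categories_from_groups(menu: dict):
    """Return (ordered category names, item name -> category name)."""
    categories = []
    item_category = {}
    for group in menu.get("groups", []):
        name = group.get("name")  # assumed key
        if not name:
            continue
        if name not in categories:
            categories.append(name)
        for item in group.get("items", []):  # assumed key
            item_name = item.get("name")
            if item_name:
                # Keep the first category an item appears under.
                item_category.setdefault(item_name, name)
    return categories, item_category
```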
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Toast extraction was finding items but no h2.groupHeader categories,
leaving items ungrouped. showItemsStep() then rendered no checkboxes,
and confirmItems() filtered out all items (empty checkedIds set).
Now adds a default "Menu" category when items exist but the categories list is empty.
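The fallback can be sketched in Python as follows. The item and category shapes (dicts with "name" and "category" keys) are assumptions for illustration; the real structures are in the CFML and wizard code.

```python
def ensure_default_category(items, categories, default_name="Menu"):
    """If items exist but no categories were found, group them under one default."""
    if items and not categories:
        categories = [{"name": default_name}]
        # Assign the default only to items that have no category yet.
        items = [
            {**item, "category": item.get("category") or default_name}
            for item in items
        ]
    return items, categories
```

With at least one category present, the wizard has something to render checkboxes for, so the checked-ID set is no longer empty.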
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added multiple fallback methods to extract business name:
1. Title tag with Toast-specific parsing
2. og:title and og:site_name meta tags
3. Header elements with restaurant/location classes
4. First h1 tag as last resort
Also added address and phone extraction from visible HTML.
Added summary logging of business info keys found.
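A simplified Python sketch of the fallback chain, covering the title, og:title, og:site_name, and first-h1 steps. It omits the Toast-specific title parsing and the header-class step, assumes double-quoted attributes with property before content, and uses a hypothetical list of generic titles.

```python
import re
from html import unescape

# Hypothetical examples of titles too generic to use as a business name.
GENERIC_TITLES = {"menu", "home", "order online"}

# Tried in order; each pattern captures the candidate text in group "v".
NAME_PATTERNS = [
    r"<title[^>]*>(?P<v>.*?)</title>",
    r'<meta[^>]+property="og:title"[^>]+content="(?P<v>[^"]*)"',
    r'<meta[^>]+property="og:site_name"[^>]+content="(?P<v>[^"]*)"',
    r"<h1[^>]*>(?P<v>.*?)</h1>",
]


def extract_business_name(html: str):
    """Return the first non-generic candidate name, or None."""
    for pattern in NAME_PATTERNS:
        match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
        if not match:
            continue
        # Strip nested tags, decode entities, trim whitespace.
        text = unescape(re.sub(r"<[^>]+>", "", match.group("v"))).strip()
        if text and text.lower() not in GENERIC_TITLES:
            return text
    return None
```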
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Log all top-level keys in __OO_STATE__ to diagnose why the Restaurant
key isn't being found
- Extract business name from HTML title tag as fallback
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Look for Restaurant: keys and extract name, location (address, city,
state, zip), phone, and brandColor for the wizard business info step.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Define basePath before Toast parsing block so image URLs can be
properly constructed for local file uploads.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extract items from visible HTML instead of just __OO_STATE__ JSON
- Parse headerText spans for item names, price spans for prices
- Extract images from Menu_files/ src attributes
- Fall back to simpler headerText matching if block parsing fails
- Also extract images from __OO_STATE__ and match to items by name
- Fixes issue where only 116 items were extracted instead of 163+
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Skip Claude AI for Toast menus - parse the embedded JSON directly.
This extracts all items, categories, and images from the structured
__OO_STATE__ data, which is faster and more complete than AI extraction.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Claude returns imageUrl, but the code only checked for images/imageSrc.
Add handling for imageUrl field to properly match images to items.
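One way to normalize the three field names, sketched in Python. Whether each field holds a string or a list is an assumption, so the sketch accepts both; the function name is hypothetical.

```python
def item_image_urls(item: dict) -> list:
    """Collect image URLs from images, imageSrc, and imageUrl, without duplicates."""
    urls = []
    for key in ("images", "imageSrc", "imageUrl"):
        value = item.get(key)
        if isinstance(value, str):
            value = [value]  # treat a single URL like a one-element list
        for url in value or []:
            if url and url not in urls:
                urls.append(url)
    return urls
```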
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- uploadSavedPage.cfm: sanitize extracted files (whitelist safe extensions,
delete symlinks) to protect against malicious content from infected sites
- analyzeMenuUrl.cfm: detect local temp URLs and read directly from disk,
bypassing Playwright for faster processing of saved pages
- saveWizard.cfm: delete temp folder immediately after wizard completes
instead of waiting for 1-hour auto-cleanup
- setup-wizard.html: track temp folder ID and pass to saveWizard for cleanup
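A Python sketch of the sanitizing pass described for uploadSavedPage.cfm: delete symlinks and any file whose extension is not whitelisted. The extension list is an assumption, not the one used in the CFML code, and the sketch targets POSIX behavior for symlink removal.

```python
import os

# Assumed whitelist; the actual list in uploadSavedPage.cfm may differ.
SAFE_EXTENSIONS = {".htm", ".html", ".css", ".jpg", ".jpeg", ".png", ".gif", ".webp"}


def sanitize_extracted(root: str) -> list:
    """Remove symlinks and non-whitelisted files under root; return removed paths."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Symlinked directories are listed in dirnames but never descended into.
        for name in list(dirnames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                os.unlink(path)
                dirnames.remove(name)
                removed.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            if os.path.islink(path) or ext not in SAFE_EXTENSIONS:
                os.unlink(path)
                removed.append(path)
    return removed
```

A whitelist is the safer default here: an unknown extension is dropped rather than trusted.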
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When scanning extracted ZIP content from /temp/menu-import/, read
images directly from the filesystem instead of re-downloading via HTTP.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace cfhttp with Playwright headless browser
- Capture images from network requests during page render
- No longer needs to fetch subpages (JS renders everything)
- Should capture subcategory items that load dynamically
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Backend now accepts either url or html content in request body
- Frontend adds HTML file upload option below URL input
- Useful when websites block the crawler (403 errors)
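The url-or-html request handling might look like this in Python. Preferring html when both are sent is an assumption, as are the function name and the error message; the commit only states that either field is accepted.

```python
def resolve_menu_source(body: dict):
    """Return ("html", content) or ("url", address) from a parsed request body."""
    html = (body.get("html") or "").strip()
    url = (body.get("url") or "").strip()
    if html:
        # Assumed precedence: uploaded HTML wins, since it avoids a blocked fetch.
        return ("html", html)
    if url:
        return ("url", url)
    raise ValueError("Request body must include either 'url' or 'html'")
```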
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>