- Extract UUID folder path from URL instead of using getDirectoryFromPath
- Old logic was broken: listLast on a path ending with / returned an empty string
- This caused the code to go up one level too far
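
A minimal sketch of the idea, in Python rather than the project's CFML. It assumes the temp URL looks like /temp/menu-import/<uuid>/...; the regex and the function name are illustrative, not the actual implementation.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed layout: /temp/menu-import/<uuid>/<saved page and assets>
UUID_FOLDER = re.compile(
    r"^(/temp/menu-import/[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})(?:/|$)"
)


def uuid_folder_from_url(url: str) -> Optional[str]:
    """Return the UUID folder path, or None if the URL is not a temp URL."""
    match = UUID_FOLDER.match(urlparse(url).path)
    return match.group(1) if match else None
```

Matching the folder directly gives the same result whether or not the URL ends with a slash, which is the case the old logic got wrong.
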
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Scan extracted ZIP for image files (jpg, png, gif, webp)
- Skip small files (<10KB, likely icons) and _files folder assets
- Send up to 3 images to Claude for business info extraction
- Merge extracted name, address, phone, hours, brandColor
- Only fill in fields not already found in the HTML
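
The selection rules above could be sketched roughly as follows. This is Python for illustration only; the constant and function names are hypothetical, and the "_files" check assumes the browser's usual "<page>_files" asset folder naming.

```python
from pathlib import Path
from typing import List

IMAGE_EXTENSIONS = {".jpg", ".png", ".gif", ".webp"}
MIN_BYTES = 10 * 1024  # smaller files are likely icons
MAX_IMAGES = 3


def pick_candidate_images(root: Path) -> List[Path]:
    """Return up to MAX_IMAGES image files worth sending for extraction."""
    candidates = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # Skip anything stored under a "<page>_files" asset folder.
        parents = path.relative_to(root).parts[:-1]
        if any(part.endswith("_files") for part in parents):
            continue
        if path.stat().st_size < MIN_BYTES:
            continue
        candidates.append(path)
    return candidates[:MAX_IMAGES]
```
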
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extract directory and scan all .htm/.html files recursively
- Look for business name in title tags (skip generic titles)
- Extract street addresses with regex patterns
- Extract phone numbers
- Check __OO_STATE__ in other pages for Restaurant data
- Merge found info into toastBusiness (first found wins)
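
A rough sketch of the phone extraction and the "first found wins" merge, in Python for illustration. The regex is a hypothetical pattern for US-style numbers, not the one used in the code, and the function names are made up.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for numbers such as (555) 123-4567 or 555.123.4567.
PHONE = re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")


def find_phone(html: str) -> Optional[str]:
    """Return the first phone-like string in the page, if any."""
    match = PHONE.search(html)
    return match.group(0) if match else None


def merge_first_found(target: Dict[str, str], found: Dict[str, str]) -> Dict[str, str]:
    """Copy values into target only for keys that are still empty."""
    for key, value in found.items():
        if value and not target.get(key):
            target[key] = value
    return target
```

Calling merge_first_found once per scanned page means an earlier page's value is never overwritten by a later one.
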
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The __OO_STATE__ parsing extracted only images and ignored the group
names that serve as categories. It now extracts category names from
menu.groups and maps items to their proper categories.
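
Illustrative sketch of the mapping, in Python. The shape of the menu object (a "groups" list whose entries have "name" and "items") is an assumption for the example; the real __OO_STATE__ structure may differ.

```python
from typing import Dict, List, Tuple


def categories_from_groups(menu: Dict) -> Tuple[List[str], List[Dict]]:
    """Return (category names, items tagged with their category)."""
    categories: List[str] = []
    items: List[Dict] = []
    for group in menu.get("groups", []):
        name = group.get("name")
        if not name:
            continue
        categories.append(name)
        for item in group.get("items", []):
            # Copy the item so the source state is left untouched.
            items.append({**item, "category": name})
    return categories, items
```
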
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Toast extraction was finding items but no h2.groupHeader categories,
leaving items ungrouped. showItemsStep() then rendered no checkboxes,
and confirmItems() filtered out all items (empty checkedIds set).
Now adds a default "Menu" category when items exist but categories is empty.
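
The fallback amounts to something like this Python sketch. The "items", "categories", and "category" field names are assumptions for illustration.

```python
from typing import Dict


def ensure_default_category(menu: Dict) -> Dict:
    """Add a default "Menu" category when items exist but none are grouped."""
    if menu.get("items") and not menu.get("categories"):
        menu["categories"] = [{"name": "Menu"}]
        for item in menu["items"]:
            # Only tag items that have no category yet.
            item.setdefault("category", "Menu")
    return menu
```
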
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added multiple fallback methods to extract business name:
1. Title tag with Toast-specific parsing
2. og:title and og:site_name meta tags
3. Header elements with restaurant/location classes
4. First h1 tag as last resort
Also added address and phone extraction from visible HTML.
Added summary logging of business info keys found.
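
A simplified Python sketch of the fallback chain. Assumptions: the title is split on "|" as a stand-in for the Toast-specific parsing, the list of generic titles is hypothetical, the meta regex expects property before content, and step 3 (header elements) is omitted for brevity.

```python
import re
from typing import Optional

GENERIC_TITLES = {"menu", "home", "order online"}  # hypothetical list


def _first_value(pattern: str, html: str) -> Optional[str]:
    """Return the stripped named group "v" of the first match, if non-empty."""
    match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
    value = match.group("v").strip() if match else ""
    return value or None


def business_name(html: str) -> Optional[str]:
    # 1. Title tag, keeping the part before the first "|".
    title = _first_value(r"<title[^>]*>(?P<v>.*?)</title>", html)
    if title:
        name = title.split("|")[0].strip()
        if name and name.lower() not in GENERIC_TITLES:
            return name
    # 2. og:title, then og:site_name.
    for prop in ("og:title", "og:site_name"):
        pattern = (
            r'<meta[^>]+property=["\']' + re.escape(prop)
            + r'["\'][^>]+content=(?P<q>["\'])(?P<v>.*?)(?P=q)'
        )
        name = _first_value(pattern, html)
        if name:
            return name
    # 4. First h1 as a last resort.
    return _first_value(r"<h1[^>]*>(?P<v>.*?)</h1>", html)
```
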
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Log all top-level keys in __OO_STATE__ to diagnose why the Restaurant
key isn't being found
- Extract business name from HTML title tag as fallback
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Look for Restaurant: keys and extract name, location (address, city,
state, zip), phone, and brandColor for the wizard business info step.
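
Roughly, in Python. The field names follow the list above, but the exact nesting (for example, whether phone sits at the top level or under location) is an assumption for the example.

```python
from typing import Dict


def business_info_from_state(state: Dict) -> Dict[str, str]:
    """Pull wizard business info from the first "Restaurant:" entry."""
    for key, value in state.items():
        if not key.startswith("Restaurant:") or not isinstance(value, dict):
            continue
        location = value.get("location") or {}
        info = {
            "name": value.get("name"),
            "address": location.get("address"),
            "city": location.get("city"),
            "state": location.get("state"),
            "zip": location.get("zip"),
            "phone": value.get("phone") or location.get("phone"),
            "brandColor": value.get("brandColor"),
        }
        # Drop fields that were missing or empty.
        return {k: v for k, v in info.items() if v}
    return {}
```
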
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Define basePath before Toast parsing block so image URLs can be
properly constructed for local file uploads.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extract items from visible HTML instead of just __OO_STATE__ JSON
- Parse headerText spans for item names, price spans for prices
- Extract images from Menu_files/ src attributes
- Fall back to simpler headerText matching if block parsing fails
- Also extract images from __OO_STATE__ and match to items by name
- Fix issue where only 116 items were extracted instead of 163+
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Skip Claude AI for Toast menus - parse the embedded JSON directly.
This extracts all items, categories, and images from the structured
__OO_STATE__ data, which is faster and more complete than AI extraction.
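
One way to pull the embedded JSON out of a saved page, sketched in Python. It assumes the page assigns an object literal that is valid JSON right after the __OO_STATE__ marker; the function name is illustrative.

```python
import json
from typing import Dict, Optional

MARKER = "__OO_STATE__"


def extract_oo_state(html: str) -> Optional[Dict]:
    """Decode the first JSON object that follows the __OO_STATE__ marker."""
    start = html.find(MARKER)
    if start == -1:
        return None
    brace = html.find("{", start)
    if brace == -1:
        return None
    try:
        # raw_decode stops at the end of the object and ignores what follows.
        state, _end = json.JSONDecoder().raw_decode(html, brace)
    except json.JSONDecodeError:
        return None
    return state
```

Using raw_decode avoids having to find the matching closing brace with a regex, which breaks on nested objects.
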
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Claude returns imageUrl but code only checked for images/imageSrc.
Add handling for imageUrl field to properly match images to items.
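
The lookup now behaves roughly like this Python sketch. The order of the fields and the assumption that "images" may be a list are illustrative.

```python
from typing import Dict, Optional

IMAGE_FIELDS = ("images", "imageSrc", "imageUrl")


def item_image(item: Dict) -> Optional[str]:
    """Return the first non-empty image reference on an extracted item."""
    for field in IMAGE_FIELDS:
        value = item.get(field)
        if isinstance(value, list):
            value = value[0] if value else None
        if value:
            return value
    return None
```
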
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- uploadSavedPage.cfm: sanitize extracted files (whitelist safe extensions,
delete symlinks) to protect against malicious content from infected sites
- analyzeMenuUrl.cfm: detect local temp URLs and read directly from disk,
bypassing Playwright for faster processing of saved pages
- saveWizard.cfm: delete temp folder immediately after wizard completes
instead of waiting for 1-hour auto-cleanup
- setup-wizard.html: track temp folder ID and pass to saveWizard for cleanup
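
The sanitize step could look something like the following Python sketch. The real code is CFML; the whitelist contents and the function name here are assumptions.

```python
import os
from pathlib import Path
from typing import List

# Hypothetical whitelist; the actual list lives in uploadSavedPage.cfm.
SAFE_EXTENSIONS = {".htm", ".html", ".css", ".jpg", ".jpeg", ".png", ".gif", ".webp"}


def sanitize_tree(root: Path) -> List[Path]:
    """Delete symlinks and files whose extension is not whitelisted."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in list(dirnames):
            full = Path(dirpath) / name
            if full.is_symlink():
                full.unlink()
                dirnames.remove(name)  # do not descend into the removed link
                removed.append(full)
        for name in filenames:
            full = Path(dirpath) / name
            if full.is_symlink() or full.suffix.lower() not in SAFE_EXTENSIONS:
                full.unlink()
                removed.append(full)
    return removed
```
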
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Check X-Forwarded-Proto header for HTTPS (reverse proxy)
- chmod extracted files to be world-readable for Playwright
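
The header check, sketched in Python. The function name and the plain-dict header representation are assumptions; a proxy may send a comma-separated list, in which case the first entry is used here.

```python
from typing import Dict


def request_scheme(headers: Dict[str, str], direct_scheme: str = "http") -> str:
    """Prefer X-Forwarded-Proto (set by a reverse proxy) over the direct scheme."""
    lowered = {key.lower(): value for key, value in headers.items()}
    forwarded = lowered.get("x-forwarded-proto", "")
    proto = forwarded.split(",")[0].strip().lower()
    return proto if proto in ("http", "https") else direct_scheme
```
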
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Security: Also added an nginx rule on the dev server to block CFM/PHP
execution in the /temp/menu-import/ directory.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When scanning extracted ZIP content from /temp/menu-import/, read
images directly from the filesystem instead of re-downloading via HTTP.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
For Cloudflare-protected sites, users can now:
1. Save the page from their browser (Webpage, Complete)
2. ZIP the HTML and assets folder
3. Upload the ZIP in the wizard
4. Server extracts to temp folder, Playwright scans local copy
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace cfhttp with Playwright headless browser
- Capture images from network requests during page render
- No longer needs to fetch subpages (JS renders everything)
- Should capture subcategory items that load dynamically
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Backend now accepts either url or html content in request body
- Frontend adds HTML file upload option below URL input
- Useful when websites block the crawler (403 errors)
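
A sketch of the request handling in Python. The "url" and "html" field names come from the change above; giving html precedence when both are present is an assumption.

```python
import json
from typing import Tuple


def parse_import_request(body: str) -> Tuple[str, str]:
    """Return ("html", content) or ("url", address) from a JSON request body."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    html = (data.get("html") or "").strip()
    url = (data.get("url") or "").strip()
    if html:
        return ("html", html)
    if url:
        return ("url", url)
    raise ValueError("request must include either 'url' or 'html'")
```
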
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>