In a world where artificial intelligence increasingly requires cloud connectivity, a fundamental question emerges: can mobile assistants understand natural language and execute actions without sending data off the device? At Levi9, a recent prototype demonstrates not only that it’s possible, but also reveals the real-world constraints and opportunities of on-device AI.
The Challenge: Intelligence That Respects Privacy
The Conversational UI prototype emerged from a clear objective: test how far local-only LLMs on phones can go – primarily for privacy, so that no data leaves the device, and secondarily for scenarios with weak or spotty connectivity.
The goal was to create an assistant that understands natural language and turns it into valid, executable JSON actions. Instead of tapping through a calendar, users can say “Schedule a meeting tomorrow after lunch” or “Book it in the second half of the week,” and the system resolves dates and times and creates the event – all locally.
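To make the idea concrete, here is a minimal sketch of what such a structured action could look like, expressed as a Kotlin data class. The action type, field names, and values are illustrative assumptions, not the prototype’s actual schema.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Hypothetical action shape – the prototype's real schema isn't published,
// so the action type, field names, and values are illustrative only.
@Serializable
data class CalendarAction(
    val type: String,      // e.g. "create_event"
    val title: String,
    val startIso: String,  // resolved locally from phrases like "tomorrow after lunch"
    val endIso: String
)

fun main() {
    // "Schedule a meeting tomorrow after lunch" could resolve to something like:
    val action = CalendarAction(
        type = "create_event",
        title = "Meeting",
        startIso = "2025-06-12T13:00",
        endIso = "2025-06-12T14:00"
    )
    println(Json.encodeToString(CalendarAction.serializer(), action))
}
```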
Because that level of understanding requires capable models, the effort evaluated what is credibly doable on-device today on both Android and iOS, with a toggleable demo used where feasible to illustrate current capability.
Honest Assessment: What Works and What Doesn't
What distinguishes this prototype is its transparent evaluation of current on-device AI capabilities. The team faced different challenges on each platform.
On Android, smaller quantized models proved unreliable at robustly turning natural language into structured actions, particularly with temporal phrases like “after lunch” or ranges like “second half of the week.” Larger models required flagship-class hardware and still showed thermal throttling and uneven throughput. The conclusion: technically feasible, but not dependable for a broad audience at that time.
On iOS, with iOS 26 in preview, the on-device model enabled a working local demo. It demonstrated on-device understanding and action generation, but sustained use increased system load and heat, and performance was slower and less consistent than would be acceptable for production. Strong as a proof of concept, but not yet production-ready.
The work occurred before iOS 26 reached general availability and during the early AICore and Gemini Nano rollout on Android. A re-evaluation on official releases is planned, acknowledging that the platform landscape is evolving quickly.
Built for Privacy: Fully Offline by Design
This prototype differs fundamentally from other digital assistant solutions on the market: it is fully offline and on-device by design. The assistant is built to understand natural language and perform actions locally, so data stays on the phone. An online path appeared in the prototype only to compare execution speed – it was included purely as a benchmark, not as the default operating mode.
The prototype addresses concrete business needs across several areas:
- Privacy-sensitive actions can now be handled locally. Short voice or text commands like “tomorrow after lunch” or “later this week” are converted into on-device events, reminders, drafts, and quick utilities – without sending data to the cloud (see the date-resolution sketch after this list).
- Faster completion eliminates UI navigation. Users get things done more quickly without tapping through screens: they say what’s needed and the action executes.
- No need to know the app layout means users don’t have to remember where a feature lives – the assistant routes the intent to the right place.
- Time savings for busy teams reduce micro-friction and context switching, especially on mobile.
- The system resolves small obstacles on its own by interpreting natural phrasing, filling in missing details such as dates and times, and preparing actionable items.
- Low connectivity support ensures core tasks continue to function offline when coverage dips.
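To show what “resolving dates and times” means in practice, the sketch below maps a few of the phrases above to concrete date-times using simple heuristics (lunch ending at 13:00, “later this week” meaning Friday afternoon). These rules are assumptions for illustration only; in the prototype, the resolution is done by the on-device model.

```kotlin
import java.time.DayOfWeek
import java.time.LocalDateTime
import java.time.temporal.TemporalAdjusters

// Illustrative heuristics only – the prototype resolves these phrases with the
// on-device model, and its actual rules are not published.
fun resolvePhrase(phrase: String, now: LocalDateTime = LocalDateTime.now()): LocalDateTime =
    when (phrase.lowercase()) {
        // "after lunch" assumed to mean 13:00 on the given day
        "tomorrow after lunch" -> now.toLocalDate().plusDays(1).atTime(13, 0)
        // "later this week" assumed to mean Friday at 16:00
        "later this week" -> now.toLocalDate()
            .with(TemporalAdjusters.nextOrSame(DayOfWeek.FRIDAY))
            .atTime(16, 0)
        // "second half of the week" assumed to start Thursday morning
        "second half of the week" -> now.toLocalDate()
            .with(TemporalAdjusters.nextOrSame(DayOfWeek.THURSDAY))
            .atTime(9, 0)
        else -> now.plusHours(1) // fallback: one hour from now
    }

fun main() {
    val base = LocalDateTime.of(2025, 6, 11, 10, 0)        // a Wednesday morning
    println(resolvePhrase("tomorrow after lunch", base))    // 2025-06-12T13:00
    println(resolvePhrase("later this week", base))         // 2025-06-13T16:00
}
```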
Technical Architecture: From Natural Language to Action
The prototype leverages different approaches based on platform capabilities.
For local, on-device processing on Android, multiple quantized LLMs were evaluated; without flagship-class devices, their reliability at turning broad natural language into JSON was insufficient, and performance and thermals varied. On iOS, the iOS 26 preview Foundation model enabled on-device natural language understanding and actioning – solid for a proof of concept, though not production-ready at that time.
An online path using a cloud LLM served purely as a baseline for speed, longer context, and nuanced temporal reasoning – not the product intent.
The action layer translates intent into a strict JSON schema and routes it to validated tool execution for calendar, messaging drafts, and utilities, with lightweight retries and schema enforcement.
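A minimal sketch of how such an action layer could work, assuming a hypothetical `Action` type and a stand-in `askModel` call in place of the local model: parse the model’s output against a strict schema and retry a limited number of times on failure before falling back. None of this is the prototype’s actual code.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Hypothetical action type – the prototype's real schema isn't published.
@Serializable
data class Action(val type: String, val title: String, val startIso: String, val endIso: String)

// Strict parser: missing fields cause a parse failure instead of a silent guess.
private val strictJson = Json { ignoreUnknownKeys = false }

// Stand-in for the on-device model call; in the prototype this would be the local LLM.
fun askModel(prompt: String): String =
    """{"type":"create_event","title":"Meeting","startIso":"2025-06-12T13:00","endIso":"2025-06-12T14:00"}"""

// Lightweight retry loop with schema enforcement (illustrative only).
fun generateAction(userUtterance: String, maxAttempts: Int = 3): Action? {
    repeat(maxAttempts) { attempt ->
        val raw = askModel("Turn this request into the Action JSON schema: $userUtterance (attempt ${attempt + 1})")
        val parsed = runCatching { strictJson.decodeFromString(Action.serializer(), raw) }.getOrNull()
        if (parsed != null) return parsed
        // On failure, loop and re-prompt; a real implementation could feed the parse error back to the model.
    }
    return null // caller falls back to asking the user for clarification
}

fun main() {
    println(generateAction("Schedule a meeting tomorrow after lunch"))
}
```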
What This Demonstrates About AI Capabilities
This project showcases several dimensions of AI expertise:
- Privacy-by-design leadership ensures natural-language understanding and actioning run fully on-device, minimizing data exposure by default.
- Evidence-driven practice created a repeatable evaluation comparing local versus alternative modes, measuring latency, reliability, thermals, and error rates to inform decisions; a minimal measurement sketch follows this list.
- Cross-platform depth advanced work on Android and iOS in parallel, using cross-platform UI for velocity and native integrations where deeper OS hooks were required.
- Product-level behavior moves beyond toy chat applications – the prototype performs real phone actions including events, reminders, drafts, and utilities with clear validation rules.
- Mastery of performance trade-offs mapped the limits of small versus large models on commodity hardware, defining when local processing is viable and when it isn’t.
- Security embedded in architecture means the local path from natural language understanding to structured actions avoids sending user data to the cloud for common tasks.
- Release-timing literacy acknowledges that work occurred before iOS 26 general availability and during early AICore and Gemini Nano rollout on Android, with re-evaluation on official releases planned to reflect fast-moving platforms.
- Operational realism in model handling, validation, and clear UX semantics demonstrates the ability to translate R&D into ship-ready constraints and user-friendly experiences.
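For a sense of what such a repeatable evaluation could record, here is a minimal sketch that measures average latency and the rate of schema-valid outputs per mode. The mode names, runner signature, and metrics are assumptions; the team’s actual harness (which also tracked thermals and reliability) is not shown here.

```kotlin
// Minimal evaluation sketch: run each mode over the same prompts and record latency
// plus how often the output passed schema validation. Illustrative only – the team's
// actual harness (which also tracked thermals and reliability) is not shown here.
data class RunResult(val latencyMs: Long, val validAction: Boolean)

fun evaluate(mode: String, prompts: List<String>, runOnce: (String, String) -> Boolean): Map<String, Double> {
    val results = prompts.map { prompt ->
        val start = System.nanoTime()
        val ok = runOnce(mode, prompt)  // true if the generated action passed schema validation
        RunResult((System.nanoTime() - start) / 1_000_000, ok)
    }
    return mapOf(
        "avgLatencyMs" to results.map { it.latencyMs }.average(),
        "validRate" to results.count { it.validAction }.toDouble() / results.size
    )
}

fun main() {
    val prompts = listOf("Schedule a meeting tomorrow after lunch", "Remind me later this week")
    // Stand-in runner – a real harness would invoke the local model or the cloud baseline here.
    val fakeRunner: (String, String) -> Boolean = { _, _ -> true }
    println("local: " + evaluate("local", prompts, fakeRunner))
    println("cloud baseline: " + evaluate("cloud", prompts, fakeRunner))
}
```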
How does the team explain this to someone new to AI or conversational interfaces? Simply: think of it as a personal secretary in your pocket. Speak naturally – “Schedule a meeting tomorrow after lunch,” “Remind me later this week” – and it does the job directly on the phone, turning words into calendar events, reminders, and quick drafts while resolving small obstacles like avoiding overlapping events. No accounts to trust, no data shipped off: it’s fully on-device, privacy-first, and works even without a connection. In short: say it, it understands, and it gets done – locally.
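As an illustration of one such small obstacle, the sketch below nudges a proposed event past an existing conflict. The event type and the “push it after the last conflict” policy are assumptions for the sake of the example, not the prototype’s actual behavior.

```kotlin
import java.time.Duration
import java.time.LocalDateTime

// Hypothetical event type and overlap policy, for illustration only.
data class Event(val title: String, val start: LocalDateTime, val end: LocalDateTime)

fun overlaps(a: Event, b: Event): Boolean = a.start < b.end && b.start < a.end

// One simple policy among many: if the proposed slot collides with existing events,
// push it so it starts right after the last conflicting event ends.
fun avoidOverlap(proposed: Event, existing: List<Event>): Event {
    val conflicts = existing.filter { overlaps(it, proposed) }
    if (conflicts.isEmpty()) return proposed
    val newStart = conflicts.maxOf { it.end }
    val duration = Duration.between(proposed.start, proposed.end)
    return proposed.copy(start = newStart, end = newStart.plus(duration))
}

fun main() {
    val existing = Event("Team sync", LocalDateTime.parse("2025-06-12T13:00"), LocalDateTime.parse("2025-06-12T14:00"))
    val proposed = Event("Meeting", LocalDateTime.parse("2025-06-12T13:30"), LocalDateTime.parse("2025-06-12T14:30"))
    println(avoidOverlap(proposed, listOf(existing)))  // shifted to 14:00–15:00
}
```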
The Path Forward
This prototype represents more than a technical experiment. It demonstrates a methodical approach to evaluating emerging AI capabilities on mobile platforms, honestly assessing both possibilities and limitations, and building toward solutions that prioritize user privacy without sacrificing functionality.
As on-device AI capabilities continue to evolve with each platform release, Levi9’s evidence-driven approach and cross-platform expertise position the company to deliver privacy-respecting, locally-intelligent mobile experiences that meet real business needs.
***This article is part of the AI9 series, where we walk the talk on AI innovation.***
Mobile Developers @Levi9 Serbia