What it really takes to build a product inside ChatGPT while the ecosystem is still forming (and without an engineering team)

Building in a space that doesn’t exist yet
A ChatGPT app isn’t an app in the way we’ve learned to think about software.
It doesn’t ship with a standalone interface or a familiar navigation model. It lives inside a conversational surface, where language, intent, and orchestration shape how a product is experienced just as much as UI does. Still, companies are moving quickly to integrate, sensing that showing up inside these systems will soon be table stakes for discovery.
The moment feels familiar. In the early 2010s, teams rushed to become mobile-first before fully understanding what mobile would demand of product, design, or infrastructure. Large language models are triggering a similar shift — except this time the interface is conversational, the distribution layer is centralized, and many of the constraints are still undefined.
Participating in this new layer requires more than exposing an API. Products need a way to express what they do, how they should be used, and under which conditions they should be invoked. MCPs (servers built on the Model Context Protocol) are one emerging attempt to make products legible to language models, though the standard itself is still taking shape.
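To make that concrete, here is a minimal sketch, in plain Python, of the kind of description a product might expose so a model can decide when to invoke it: a name, a natural-language statement of when to use (and not use) it, and a schema for its inputs. The tool name and fields are hypothetical, not drawn from any particular platform or from the MCP specification itself.

```python
# Hypothetical example: how a product capability might be described so a
# language model can decide when to invoke it. Names and fields are
# illustrative only, not the actual MCP specification.
search_listings_tool = {
    "name": "search_listings",
    "description": (
        "Search the product catalog. Use when the user wants to find, "
        "compare, or browse items; do not use for order-status questions."
    ),
    "input_schema": {  # JSON-Schema-style definition of accepted arguments
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text search terms"},
            "max_results": {"type": "integer", "minimum": 1, "maximum": 10},
        },
        "required": ["query"],
    },
}
```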

What stands out most is how early this ecosystem feels.
Tools are appearing faster than shared mental models. Founders are building in parallel toward very different interpretations of what a “ChatGPT integration” actually is. For non-technical builders — designers, product thinkers, operators — the promise of access exists alongside real uncertainty about control, responsibility, and realism.
This article is a set of field notes from trying to build a ChatGPT app early, without a technical background, while the space itself is still in flux.
A platform shift that pulled me in
I started paying attention to this space out of curiosity more than mandate.
When OpenAI opened the ChatGPT app ecosystem to direct submissions in December 2025, it felt like a new product surface appearing, one that would quietly reshape how discovery works and how products are understood. Conversational interfaces were no longer just answering questions; they were influencing what users saw, compared, and acted on.

It was obvious that surfacing inside ChatGPT would matter, and quickly. Not as a simple presence, but as a product that could be interpreted, ranked, and differentiated within a conversational flow.
I wanted to understand what that actually meant in practice.
Rather than speculating from the outside, I started exploring the space by trying to build something concrete: a prototype that behaved like a real app inside ChatGPT, constrained by its strict UX & UI guidelines, its invocation patterns, and its rules.

That exploration pulled me into an ecosystem I didn’t fully understand yet, but one that felt important to experience from the inside.
The constraint: moving fast without being technical
I approached this as a product designer — someone whose value comes from shaping behavior and orchestrating systems, not writing production code. That constraint became clarifying: it forced me to focus on the product questions that actually mattered. At the start, my mental model was incomplete: I didn’t yet understand where code actually lived, how APIs differed from MCPs, or how invocation worked once a product entered a conversational system.
What I did have was urgency, and a strong drive to understand the space.
Waiting for engineering availability wasn’t realistic. No-code tools became leverage — a way to maintain velocity while the ecosystem was still forming.
I expected the process to be relatively fluid: that these tools would absorb complexity and make experimentation easier.
Instead, I encountered a fast-growing ecosystem full of ambition, pressure, and partial solutions. Many platforms promised to bridge the gap between non-technical builders and ChatGPT integration. Very few actually reduced the friction involved.
Entering the frontier: a fragmented ecosystem
My exploration started with a straightforward question: which tools could realistically help me export an MCP and demo a functioning ChatGPT app?
I did my due diligence and experimented first with general no-code and AI-assisted platforms like Cursor and Lovable to establish a point of reference.
Cursor, while powerful, assumed familiarity with local development environments, file systems, and publishing semantics that weren’t obvious without a development background. “No-code” didn’t remove complexity — it required architectural intuition most designers simply do not have. The recent emergence of various courses & tutorials explicitly aimed at making Cursor more approachable to designers reinforces that this gap is structural, not personal.

Lovable, as I had rightly assumed, produced beautiful and realistic enough visuals when carefully prompted, but it could never extend beyond the role of an ingenious prototype.
I then quickly moved on to exploring several early platforms attempting to solve this problem specifically, including Chippy, Fractal, NoodleSeed and Manifest.

In conversations with Colin Matthews, founder of Chippy and instructor at Maven, he described the value as enabling teams to “export code, view sharable specs, and run evals directly within the platform to facilitate cross-functional handoffs.” That made sense for what I was trying to do — bridging design intent and implementation without owning production code.
Each of these emerging tools approached the problem differently. Some were strong on prototyping but struggled when pushed toward something deployable. Others required a level of technical investment that conflicted with my constraints. A few were promising but blocked by access limitations, incomplete products, or early-stage instability.
Nearly everything I tried was in beta.
Platforms were buggy. Context was frequently lost. Documentation was thin. Many tools assumed users already knew how to prompt effectively, how to reason about conversational flows, or how to debug hallucinations when things broke.
Alongside the tooling, I spent time talking directly with founders building in this space. Those conversations became just as informative as the products themselves.
Even when tools looked similar on the surface, the underlying visions were very different. Builders had conflicting ideas on who the primary user was, what MCPs should represent, and whether this layer was fundamentally about infrastructure, product creation, or distribution.
You could feel that divergence in how each platform behaved.
Where things broke
As I kept experimenting, patterns began to repeat.
Hallucinations showed up in different forms — including code that looked convincing but simply didn’t hold up. Some abstractions accelerated early progress, only to break under real constraints. Context loss meant restating intent again and again.
In several cases, MCPs appeared to work inside builder environments but failed to surface at all inside ChatGPT. In others, the system repeatedly reported success — changes “applied,” actions “completed” — while nothing had actually changed. Debugging became conversational: repeatedly prompting the system to self-diagnose. When that failed, progress depended on reaching out directly to founders to investigate issues that weren’t visible from the surface.

These issues aren’t unique to any one tool. They’re well documented across current LLM platforms and early-stage developer tooling, particularly when systems rely heavily on generative output without strong validation layers.
The harder part was understanding where responsibility sat.
When behavior deviated from expectations, it wasn’t obvious whether the issue came from the platform, the MCP configuration, or the tool itself. There was very little support for reasoning across those layers.
I also realized that I had more influence than I initially thought — over ChatGPT-native layout, conversational flow, and follow-up questions — but that influence was rarely surfaced clearly. It lived behind structured inputs and assumptions that were easy to miss if you didn’t already know they existed. That realization forced me to get more deliberate about what I was actually optimizing for.
How I learned to choose
Facing this level of fragmentation and instability, I needed a way to evaluate what actually mattered.
I stopped optimizing for comprehensiveness and started optimizing for learning velocity. That shift required being explicit about tradeoffs — what I was willing to sacrifice and what I wasn’t.
The framework that emerged:
Speed to behavioral validation over perfect infrastructure — I cared more about seeing how something behaved in ChatGPT than building it “correctly.” Mock data, hardcoded responses, simplified flows — whatever got me to a testable embedded interaction fastest.

Conversational coherence over feature completeness — A narrow interaction that felt natural mattered more than a wide feature set that felt mechanical. I’d rather ship one well-orchestrated flow than ten that worked but felt bolted on.
Debuggability over abstraction elegance — When something broke, I needed to understand why. Tools that hid complexity behind beautiful abstractions became liabilities. I favored visibility, even if it meant more manual work.
This hierarchy shaped everything: which tools I abandoned, which compromises I accepted, and how I structured the interaction model itself.
Patterns that emerged across tools and conversations
Looking across tools and conversations, a few themes stood out.
Guidance was mostly under-designed. Many platforms offered powerful capabilities but assumed a level of prompt literacy and architectural intuition that designers don’t naturally start with — and that the tools themselves didn’t teach.
“No-code” didn’t remove the need for systems thinking. It redistributed it. I found myself reasoning about flows, tool invocation, compliance, system architecture and boundaries, even without writing production code.

The biggest source of friction was translation. Enterprise assets — APIs, design systems, UI kits, brand and photography guidelines — aren’t LLM-ready by default. Converting them into something that could be safely and predictably invoked required manual judgment at nearly every step.
ChatGPT UI guidelines added another layer of ambiguity. It wasn’t always clear what was disallowed, what was risky, and what simply needed adaptation. Validating those decisions ahead of review remained difficult, even when an app appeared to work. That ambiguity isn’t accidental — it reflects a space where standards, responsibilities, and even product definitions are still being negotiated.

At the same time, it became obvious that the space hasn’t converged yet. Founders are building toward different futures, and that lack of alignment shows up directly in the products.
What the workflow revealed
Over time, I stopped expecting the tools to define the experience for me.
Progress came from taking ownership of the interaction model: deciding which results to surface, how they should appear, and what kind of conversational path felt intentional rather than reactive. I worked with mock data instead of real APIs to focus on behavior before scale.
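As an illustration, a single capability could be stubbed with canned results so the conversational behavior was observable inside ChatGPT long before a real backend existed. The function and data below are hypothetical and simply continue the earlier example.

```python
# Illustrative stub: canned results stand in for a real API so the
# conversational behavior can be tested first. Names and fields are hypothetical.
MOCK_LISTINGS = [
    {"id": "a1", "title": "Loft near the old port", "price_per_night": 120},
    {"id": "b2", "title": "Quiet studio, city center", "price_per_night": 95},
]

def search_listings(query: str, max_results: int = 3) -> list[dict]:
    """Return the same mock results for any query, capped at max_results."""
    return MOCK_LISTINGS[:max_results]
```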
Visual control required additional effort. UI components, and the data they should contain, were designed separately, exported as structured assets, and reintroduced into the platforms to better align with each product's unique requirements.
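Concretely, that meant describing each component as data rather than pixels. The snippet below is a hypothetical sketch of such a structured asset: a card whose fields map onto the mock listing data above, and whose single action points at another, equally hypothetical, tool.

```python
# Hypothetical structured asset: a card component described as data so a
# builder platform can render it consistently inside ChatGPT.
listing_card = {
    "component": "listing_card",
    "fields": {
        "title": {"source": "listing.title", "max_chars": 60},
        "price": {"source": "listing.price_per_night", "format": "currency"},
        "image": {"source": "listing.photo_url", "aspect_ratio": "3:2"},
    },
    "actions": [{"label": "View details", "invokes": "get_listing"}],
}
```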

Testing always happened inside ChatGPT itself. Differences between builder environments and real platform behavior were common, and those gaps often surfaced the most insight.
What this workflow ultimately revealed was less about process and more about role. Building in this space required judgment across design, systems, and platform interpretation. The work sat somewhere between product design and infrastructure awareness, even without owning production code.
What this shift changes about who gets to build
This experience clarified something: the line between “product” and “technical” work is dissolving.
In an informal conversation I had with Noam Segal — AI Insights Lead at Figma — he framed it simply: “designers in this space have to agree to tinker, fail often, and share those failures in order to learn.” Even without owning production code, I had to understand how tools connect, how data flows, and how decisions propagate through an LLM-driven product. These aren’t optional skills for designers working in AI — they’re foundational.
Prompting felt less like creative expression and more like intent specification. Over time, it became clear that this skill will likely be absorbed into tooling rather than remain a standalone practice.
For designers, this shift is significant. The work moves away from static artifacts and toward shaping behavior in uncertain systems. Comfort with experimentation, ambiguity, and failure matters more than deep mastery of any single tool.
Finally, product leadership, too, is shifting. Outcomes matter more than mechanics. Adaptability matters more than precedent. And direct visibility into how products behave in the world matters more than perfectly polished specs.
Open questions and forward-looking bets
This exploration surfaced more questions than it resolved — and that feels appropriate for a space still taking shape.
- Where does responsibility sit when hallucinations slip through?
- How much control will non-technical builders retain as governance tightens?
- How do teams meaningfully validate products when preview environments remain incomplete?
At the same time, a few patterns feel durable enough to bet on:
The translation layer will become its own discipline. Converting enterprise systems into LLM-legible interfaces isn’t a one-time technical task — it’s an ongoing design problem that requires new kinds of judgment.

Conversational coherence will matter more than feature count. Products that nail follow-up flows, contextual memory, and invocation timing will outlast those with more capabilities but clumsy orchestration.
This space is still forming. The constraints are shifting, and the standards are unsettled. Building while that uncertainty exists has become part of the work — and, increasingly, part of the role.