Senior AI Agent Engineer Job
<h2><strong>About the Role</strong></h2><p style="min-height:1.5em">Join Planera to build Manny, our AI scheduling assistant, and shape how construction schedulers work with AI on a modern Critical Path Method platform. You will own agent features end to end: designing and evolving the LangGraph/LangChain agent, engineering prompts and tools, integrating LLMs across providers, and holding response quality to a high bar with a real evaluation and observability stack. This is a hands-on applied AI role with a strong software engineering foundation and a focus on reliability, behavior quality, and user impact. You will work directly with the CTO and the lead AI engineer.</p><p style="min-height:1.5em"></p><h2><strong>Key Responsibilities</strong></h2><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Design, build, and own Manny features end to end across the agent backend, tools, and UI</p></li><li><p style="min-height:1.5em">Improve agent behavior, reliability, and answer quality through prompt engineering, tool design, and changes to the agent control flow</p></li><li><p style="min-height:1.5em">Evolve the agent architecture: ReAct loop, routing and controller logic, multi-node graphs, tool selection, and streaming responses</p></li><li><p style="min-height:1.5em">Integrate and tune LLMs across providers (Anthropic, OpenAI, Google), balancing quality, latency, and cost, including prompt caching and model selection</p></li><li><p style="min-height:1.5em">Design and extend Manny's tool surface through the MCP server that connects the agent to Planera's scheduling services</p></li><li><p style="min-height:1.5em">Build and own the evaluation loop: golden datasets, automated evaluators, snapshot-based replay, and offline and online quality metrics</p></li><li><p style="min-height:1.5em">Implement observability for agent runs with tracing, metrics, and structured logging, and use it to debug and improve behavior in production</p></li><li><p 
style="min-height:1.5em">Ensure safe, sandboxed execution of model-generated code and safe handling of tool side effects and mutations</p></li><li><p style="min-height:1.5em">Collaborate with product, backend, and frontend teams to deliver AI features end to end</p></li></ul><h2><strong>Requirements</strong></h2><ul style="min-height:1.5em"><li><p style="min-height:1.5em">4+ years of software engineering experience, including recent hands-on work building production LLM features</p></li><li><p style="min-height:1.5em">Strong proficiency in Python for building production services</p></li><li><p style="min-height:1.5em">Hands-on experience building agentic systems with LLMs: tool and function calling, ReAct or similar loops, and orchestration frameworks such as LangChain/LangGraph</p></li><li><p style="min-height:1.5em">Practical prompt engineering skill: shaping model behavior reliably, debugging failures from traces, and managing large prompts and token cost</p></li><li><p style="min-height:1.5em">Experience evaluating LLM systems: building datasets, writing evaluators, catching regressions, and using tracing and observability tooling</p></li><li><p style="min-height:1.5em">Experience with the Model Context Protocol (MCP) or building tool and function-calling integrations for LLMs</p></li><li><p style="min-height:1.5em">Solid understanding of API design (REST, websockets, SSE, and streaming) and interservice communication</p></li><li><p style="min-height:1.5em">Product mindset with a focus on user impact and pragmatic tradeoffs</p></li><li><p style="min-height:1.5em">Excellent remote communication skills</p></li></ul><h2><strong>Preferred</strong></h2><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Experience with MongoDB and Redis</p></li><li><p style="min-height:1.5em">Cloud experience (AWS or GCP), containers, and CI/CD</p></li><li><p style="min-height:1.5em">Go experience, as most of our backend systems are written in Go, including the MCP tool 
server</p></li><li><p style="min-height:1.5em">Practical experience with retrieval-augmented generation (RAG), embeddings, and vector stores</p></li><li><p style="min-height:1.5em">Familiarity with LangSmith or comparable LLM evaluation and tracing platforms</p></li><li><p style="min-height:1.5em">Frontend or React familiarity for agent UI work</p></li><li><p style="min-height:1.5em">Domain knowledge in construction tech, project management, or scheduling</p></li></ul><h2><strong>Tech Stack</strong></h2><p style="min-height:1.5em">Python (Flask), Go, LangGraph/LangChain, LangSmith, MongoDB, Redis, S3, REST/websockets/SSE, Docker, AWS/GCP, Terraform, GitLab CI/CD</p><p style="min-height:1.5em"></p><h2><strong>Why Join Us</strong></h2><p style="min-height:1.5em"><strong>Impact:</strong> Be at the forefront of transforming a $12.1 trillion industry. Build the AI that changes how the world plans and schedules construction.</p><p style="min-height:1.5em"><strong>Culture:</strong> Join a smart, spirited team dedicated to innovation and excellence.</p><p style="min-height:1.5em"><strong>Growth:</strong> Opportunity for professional growth and career advancement in a fast-paced startup environment.</p><p style="min-height:1.5em"></p><h2><strong>Benefits</strong></h2><p style="min-height:1.5em">Competitive salary, stock options, benefits package, and a dynamic work environment.</p>