Local AI in Rails: Fast, Cheap, and Predictable
Sentiment Omikuji - A Case Study in Local NLP
Rails apps usually reach for external AI services when they need “intelligence,” but that approach can bring latency, cost, and unpredictability along for the ride. This project shows a different path: running a Japanese sentiment model directly inside a Rails app, using ONNX Runtime, background jobs, and real-time updates to keep the user experience fast and responsive.
Using a small app called Sentiment Omikuji as the case study, we walk through how the model was trained in Python, exported to ONNX, and loaded into Ruby for local inference. We also cover how Japanese text is tokenized, how the sentiment output drives a fortune generator, and why the app uses Solid Queue to keep model work off the request path.
Along the way, we explore the tradeoffs of local inference versus API-based LLMs, and why classic machine learning can still be the right tool for focused problems. You'll discover a practical architecture you can adapt in your own Rails applications, plus a clearer picture of when local AI makes sense, how to keep it predictable, and what it takes to make it feel good in a real app.
The Ear
BERT analysis: a fine-tuned bert-base-japanese-v3 model, exported to ONNX and running natively in Ruby.
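A classifier like this typically returns raw logits, one per class, which still have to be turned into a label and a confidence. The sketch below shows that post-processing step in plain Ruby. The three-class label order and the example logits are assumptions for illustration; the real order depends on how the model was fine-tuned, and in the app the logits would come from the ONNX session rather than a literal array.

```ruby
# Hypothetical label order; the real order depends on the fine-tuned model.
LABELS = %w[negative neutral positive].freeze

# Convert raw logits into probabilities with a numerically stable softmax
# (subtracting the max keeps Math.exp from overflowing on large logits).
def softmax(logits)
  max = logits.max
  exps = logits.map { |x| Math.exp(x - max) }
  sum = exps.sum
  exps.map { |e| e / sum }
end

# Pick the most probable label and return it with its probability.
def classify(logits, labels = LABELS)
  probs = softmax(logits)
  score, index = probs.each_with_index.max_by { |prob, _| prob }
  { label: labels[index], score: score, probabilities: labels.zip(probs).to_h }
end

# Example logits are made up; a real run would read them from the model output.
result = classify([-1.2, 0.3, 2.1])
puts "#{result[:label]} (#{result[:score].round(3)})"
```

Keeping this step in Ruby means the rest of the app only ever sees a label and a score, not model internals.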
The Voice
Markov service: a generative engine that uses n-gram prediction over morphemes to produce unique Japanese fortunes.
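To make the idea concrete, here is a minimal sketch of an n-gram Markov generator over morphemes. It is not the app's actual service: the `MarkovChain` class, its method names, and the two-sentence corpus are hypothetical, and it assumes the text has already been split into morphemes by a morphological analyzer.

```ruby
# Minimal n-gram Markov generator over pre-tokenized sentences.
# Class name, API, and corpus are illustrative assumptions.
class MarkovChain
  START = :__start__
  STOP  = :__end__

  def initialize(order: 2, rng: Random.new)
    @order = order
    @rng = rng
    # Maps a context (array of @order tokens) to every token seen after it.
    @transitions = Hash.new { |hash, key| hash[key] = [] }
  end

  # Record every (context -> next token) transition in one sentence.
  def train(tokens)
    padded = [START] * @order + tokens + [STOP]
    padded.each_cons(@order + 1) do |window|
      @transitions[window[0...-1]] << window.last
    end
    self
  end

  # Walk the chain from the start context until STOP or the token limit.
  def generate(max_tokens: 30)
    context = [START] * @order
    output = []
    max_tokens.times do
      candidates = @transitions.fetch(context, [])
      break if candidates.empty?
      next_token = candidates.sample(random: @rng)
      break if next_token == STOP
      output << next_token
      context = context.drop(1) + [next_token]
    end
    output.join # Japanese text is joined without spaces
  end
end

chain = MarkovChain.new(order: 2, rng: Random.new(42))
chain.train(%w[今日 は 良い 日 に なる でしょう])
chain.train(%w[明日 は 良い 知らせ が 届く でしょう])
puts chain.generate
```

Because both sentences share the context は 良い, the chain can cross from one sentence into the other at that point. That recombination is what gives the fortunes their variety.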
The Shrine
Solid stack (Solid Queue + Solid Cable) for asynchronous inference and real-time UI updates via Turbo Streams.
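The pattern behind this is simple: the request only enqueues work and returns, while a worker runs the slow model step and pushes the result to the page when it is ready. The sketch below illustrates that flow with Ruby's built-in `Queue` and a thread as a stand-in. It does not use the Solid Queue or Solid Cable APIs, and `handle_request` and `slow_inference` are hypothetical names.

```ruby
# Stand-in for the background pipeline, using only Ruby's core Queue.
# In the real app, an Active Job on Solid Queue would play the worker role
# and a Turbo Stream broadcast would play the role of the results queue.
jobs = Queue.new
results = Queue.new

# Hypothetical slow inference step; a real app would run the model here.
slow_inference = lambda do |text|
  sleep 0.05
  { text: text, label: "positive" }
end

# Worker loop: pop returns nil once the queue is closed and drained.
worker = Thread.new do
  while (text = jobs.pop)
    results << slow_inference.call(text)
  end
end

# The "request handler" only enqueues and returns, so it stays fast.
def handle_request(jobs, text)
  jobs << text
  :accepted
end

status = handle_request(jobs, "今日は良い日")
update = results.pop # blocks until the worker publishes a result
jobs.close           # signal the worker to finish
worker.join

puts "#{status}: #{update[:label]}"
```

The point of the design is that request latency no longer depends on model latency: the user gets an immediate response, and the fortune appears when inference finishes.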
AI doesn't have to mean an LLM