ARKONE

Newest Questions

How do you choose an embedding model for a domain-specific corpus?

We're building retrieval over clinical notes and medical literature. General-purpose embedding models (OpenAI, Cohere) perform worse than we expected on domain-specific terminology…

embeddings · rag · eval · 5d ago
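A common first step for this kind of question is to benchmark candidate embedding models on a small hand-built set of domain query→document pairs and compare retrieval recall@k. A minimal sketch is below; `embed_bow` is a toy bag-of-words stand-in, not a real model — in practice you would swap in each candidate model's encode call.

```python
import math
from collections import Counter

def embed_bow(text):
    # Toy stand-in for a real embedding model; replace with the
    # candidate model's encode() / API call when benchmarking.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / norm for w, v in counts.items()}

def cosine(a, b):
    # Sparse dot product; vectors from embed_bow are already unit-norm.
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def recall_at_k(embed, queries, corpus, relevant, k=3):
    """Fraction of queries whose known-relevant doc lands in the top k."""
    doc_vecs = {doc_id: embed(text) for doc_id, text in corpus.items()}
    hits = 0
    for q_id, q_text in queries.items():
        q_vec = embed(q_text)
        ranked = sorted(doc_vecs, key=lambda d: cosine(q_vec, doc_vecs[d]),
                        reverse=True)
        if relevant[q_id] in ranked[:k]:
            hits += 1
    return hits / len(queries)
```

Even 50–100 such pairs, written by a domain expert in an afternoon, usually separate the candidate models cleanly.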
How do we evaluate RAG accuracy without labelled data?

We're building a document QA system over internal financial reports. We don't have labelled question-answer pairs, and building a ground-truth dataset would take months. How are teams…

rag · eval · production · 7d ago
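One label-free approach teams use is reference-free metrics such as faithfulness: does the generated answer stay grounded in the retrieved context? In practice this is usually done with an LLM-as-judge; the sketch below substitutes a crude token-overlap proxy so it runs standalone — `faithfulness_proxy` is an illustrative stand-in, not a production metric.

```python
def faithfulness_proxy(answer, context):
    """Crude stand-in for an LLM-as-judge faithfulness check: the
    fraction of answer tokens that also appear in the retrieved
    context. A real judge would test whether each claim in the
    answer is entailed by the context."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 1.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)
```

Scoring a sample of production question/context/answer triples with a metric like this gives a trend line you can monitor long before a labelled dataset exists.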
Fine-tuning vs. RAG — when does fine-tuning actually win?

We keep reading that RAG is the right default and fine-tuning is for style/format, not knowledge. But we've had cases where a fine-tuned model on domain-specific data outperformed…

fine-tuning · rag · llm-ops · 9d ago
How do you make LLM agents reliable enough for production?

We've built a multi-step agent for contract review that works well in demos but fails unpredictably in production — wrong tool calls, missed steps, hallucinated outputs. What does…

agents · deployment · production · 13d ago
What does prompt versioning look like in a team of 10+ engineers?

We've outgrown ad-hoc prompt editing in code. Engineers are stepping on each other's changes, we have no audit trail, and we can't run A/B tests on prompt variants systematically.

prompt-engineering · llm-ops · deployment · 15d ago
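The usual shape of an answer here is a prompt registry: prompts live as versioned, content-addressed artifacts rather than string literals scattered through code. A minimal in-memory sketch, assuming a `PromptRegistry` class of our own invention — a real setup would back this with git or a database and record author and timestamp for the audit trail:

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class PromptRegistry:
    # name -> ordered list of (content_hash, prompt_text) versions
    _versions: dict = field(default_factory=dict)

    def register(self, name, text):
        """Store a new version, keyed by a short content hash."""
        digest = hashlib.sha256(text.encode()).hexdigest()[:8]
        history = self._versions.setdefault(name, [])
        if not history or history[-1][0] != digest:
            history.append((digest, text))
        return digest

    def latest(self, name):
        """Return (hash, text) of the newest version."""
        return self._versions[name][-1]

    def get(self, name, digest):
        """Pin a caller to an exact version, e.g. for A/B tests."""
        for h, text in self._versions[name]:
            if h == digest:
                return text
        raise KeyError(f"{name}@{digest}")
```

Because every call site references a prompt by name (or pinned hash) instead of embedding the string, variants can be A/B tested and rolled back without code changes.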

How do you control LLM inference costs when usage scales 10x?

We launched an AI feature that's getting heavy adoption. Inference costs have gone from predictable to alarming. We've looked at caching, smaller models for classification steps, a…

llm-ops · deployment · production · 17d ago
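The two levers named in the question — caching and routing cheap steps to smaller models — compose naturally. A minimal sketch: `CachedRouter` and `call_model` are hypothetical names standing in for your provider SDK, and the word-count routing heuristic is deliberately crude (real routers classify the task, not the length).

```python
import hashlib

class CachedRouter:
    """Route simple prompts to a cheap model and cache exact-match
    responses. call_model(model, prompt) -> str is a stand-in for
    the real provider call."""

    def __init__(self, call_model, cheap="small-model",
                 strong="large-model", cheap_max_words=20):
        self.call_model = call_model
        self.cheap = cheap
        self.strong = strong
        self.cheap_max_words = cheap_max_words
        self.cache = {}
        self.calls = 0  # number of actual (billed) model calls

    def pick_model(self, prompt):
        # Crude heuristic: short prompts (e.g. classification steps)
        # go to the cheap model; longer ones to the strong model.
        if len(prompt.split()) <= self.cheap_max_words:
            return self.cheap
        return self.strong

    def complete(self, prompt):
        model = self.pick_model(prompt)
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.call_model(model, prompt)
        return self.cache[key]
```

Exact-match caching only pays off when prompts repeat; for paraphrased queries, teams typically move to semantic caching over embedding similarity, with the usual staleness caveats.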

The practitioners behind the work.

Join the practitioners answering these questions — or access the network to hire them.