BizHonyaku

Cloud AI translation security — three real risks and three workflows for confidential documents


Cloud-based AI translation (DeepL, Google, ChatGPT) is fast and cheap — but legal, HR, and finance teams keep asking the same question: is it safe to send confidential documents to a third-party service? This post breaks down the actual risks, what to ask vendors, and the three workflows enterprises use to manage them.

Three real risks with cloud AI translation

1. Training-data use. Free tiers often reserve the right to use your input as training data. A confidential clause could resurface in another customer's output — rare, but documented. Always use a paid plan that explicitly disclaims training use.

2. Retention. Most services keep translation logs for 30–90 days on free plans. Enterprise plans push that down to zero or a few days.

3. Sub-processors. Translation tools often rely on OpenAI, Anthropic, or Google models behind the scenes. Your data moves through two parties, not one. Each contract layer matters.

Vendor checklist

  1. "Will not be used for training" — in the contract, not just a blog post
  2. Retention window with a deletion guarantee
  3. TLS in transit, AES-256 at rest
  4. Data centre region (does it matter for your jurisdiction?)
  5. SOC 2 / ISO 27001 / equivalent audit
  6. Access logs and audit trail
  7. User-initiated deletion (GDPR / APPI)

Three workflows that actually work

Mask before send. Replace names, numbers, and contract specifics with placeholders before submitting. Reverse after translation. Safest, most tedious.
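The masking step can be sketched in a few lines. This is a minimal illustration, not a production tool: it assumes the caller supplies the list of sensitive terms (real pipelines typically use NER or a maintained glossary), and the `[[ENTn]]` placeholder format is an arbitrary choice that most engines pass through untranslated.

```python
def mask(text: str, sensitive_terms: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each sensitive term with a numbered placeholder.

    Returns the masked text plus a mapping used to reverse the
    substitution after translation.
    """
    mapping = {}
    for i, term in enumerate(sensitive_terms):
        placeholder = f"[[ENT{i}]]"
        if term in text:
            text = text.replace(term, placeholder)
            mapping[placeholder] = term
    return text, mapping


def unmask(translated: str, mapping: dict[str, str]) -> str:
    """Restore the original terms in the translated output."""
    for placeholder, term in mapping.items():
        translated = translated.replace(placeholder, term)
    return translated


masked, mapping = mask(
    "Acme Corp will pay $250,000 to Jane Doe by 2025-01-31.",
    ["Acme Corp", "$250,000", "Jane Doe"],
)
# masked == "[[ENT0]] will pay [[ENT1]] to [[ENT2]] by 2025-01-31."
```

Only the masked text leaves your network; the mapping stays local, so even a worst-case vendor leak exposes placeholders, not names or figures.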

Enterprise plan with the right contract. Most teams land here. Pick a vendor where no-training + short retention + encryption are contractual. Practical and acceptable for nearly every use case below "state secret".

On-prem / private cloud. Self-host an LLM in your own Azure or AWS account. Maximum control, but translation quality lags the top cloud models and the operational cost is significant. Usually worth it only in defense, healthcare, and finance.

How BizHonyaku handles this

  • Submitted documents are never used to train AI models — contractually
  • Translation outputs are deleted within 24 hours; access logs anonymised after 7 days
  • TLS 1.3 in transit, AES-256 at rest
  • Tokyo region data centre
  • Audit logs available to your internal compliance team on request

Cloud AI translation isn't "safe" or "unsafe" in the abstract: it's a per-vendor question. Free tools for confidential work are obviously off the table, but the right enterprise plan is, in practice, safer than most teams' existing email workflows.