Graam AI

This brief describes how documents and data files uploaded to the Graam Harmony platform are stored, encrypted, and access-controlled. It is intended for security and compliance teams evaluating Graam for institutional use.

Executive summary

For institutional / hedge-fund deployments, Graam Harmony provides:

Per-tenant cryptographic isolation. Each customer's uploads live in a dedicated GCS bucket, encrypted with a Cloud KMS key the customer holds. Revoking the customer's grant on the key cryptographically locks Graam (and Google) out of the data — without the active grant, the encrypted bytes cannot be read.
Hardened bucket defaults. Every customer bucket is created with uniform IAM (no legacy ACLs), public-access prevention enforced, object versioning, and provenance labels — applied at creation and audit-friendly.
Verified-identity API. Every API call is authenticated with a signed JWT verified at the gateway. The user identity from the token, not any URL parameter, drives every access decision.
Database-layer ownership enforcement. Document and data-file reads carry the calling user's id into the SQL WHERE clause. A caller cannot retrieve another customer's row regardless of how the document UUID was obtained.
Comprehensive audit logging. Every API request, every agent-initiated document read, and every storage operation is logged with the verified caller identity, retained ≥ 90 days.

The remainder of this document explains each control and how a customer can verify it.

What is uploaded

When a user uploads to Graam Harmony, the bytes belong to one of the following lanes:

Artifact	Examples	Storage
Documents	KBRA pre-sale PDFs, prospectuses, term sheets	GCS (PDF bytes) + Postgres (metadata)
Data files	Loan tapes (CSV/Excel), performance workbooks, parquet derivatives	GCS (raw + parquet) + Postgres (metadata)

Postgres holds metadata only — ownership column, upload timestamp, SHA-256 hash for de-duplication, soft-delete marker, and the GCS path. The customer-supplied bytes themselves live only in GCS.
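As a rough illustration of the SHA-256 de-duplication hash mentioned above, the sketch below hashes an upload in chunks and checks it against a set of known digests. The function names and the idea of a per-owner set of hashes are assumptions made for this example; the brief does not state how de-duplication is scoped.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash an upload chunk by chunk so a large loan tape is never fully in memory."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(stream: BinaryIO, known_hashes: set[str]) -> bool:
    """Return True when identical bytes were already recorded (hypothetical scope)."""
    return sha256_of_stream(stream) in known_hashes


# Usage: the digest, not the bytes, is what would be stored in the metadata row.
tape = b"loan_id,balance\n1,100\n"
stored_hash = sha256_of_stream(io.BytesIO(tape))
```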

Loan tapes carry the most sensitive content (loan-level borrower data) and receive the same controls as every other artifact — there is no separate "less protected" lane.

Storage isolation

Per-tenant bucket model

Graam Harmony supports two storage modes, selected by deployment configuration:

Shared bucket (default for non-institutional deployments). All user content lives in one operator-managed bucket, namespaced by object key (documents/<user_id>/<document_id>.<ext>). Cross-customer collisions are impossible because each customer's content is keyed under their UUID.

Per-tenant buckets (institutional / hedge-fund tier). Each customer's content lives in a dedicated GCS bucket — one bucket per user identity, named deterministically as <prefix><uuid_no_dashes> (e.g. graam-tenant-d9cf70a51d214ca58f7a98feecb8cbfa). System-ingested public data (EDGAR filings, etc.) stays in the shared system bucket and never mixes with customer content.
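The two naming schemes can be sketched as follows. This is an illustration only: the function names are hypothetical, and parsing the identifiers with uuid.UUID is a choice made for this example (the brief lists a UUID validation guard on storage paths as roadmap work, not as a shipped control).

```python
import uuid


def tenant_bucket_name(prefix: str, user_id: str) -> str:
    """Per-tenant mode: <prefix><uuid_no_dashes>."""
    # uuid.UUID raises ValueError for anything that is not a well-formed UUID.
    return f"{prefix}{uuid.UUID(user_id).hex}"


def shared_object_key(user_id: str, document_id: str, ext: str) -> str:
    """Shared mode: documents/<user_id>/<document_id>.<ext>."""
    # Round-tripping through uuid.UUID normalizes the ids to dashed lowercase
    # form, so a crafted id cannot add extra path segments to the key.
    return f"documents/{uuid.UUID(user_id)}/{uuid.UUID(document_id)}.{ext}"


example_bucket = tenant_bucket_name(
    "graam-tenant-", "d9cf70a5-1d21-4ca5-8f7a-98feecb8cbfa"
)
```

Because the bucket name is a pure function of the prefix and the user UUID, the same customer always resolves to the same bucket without a lookup table.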

The institutional model is the posture asset managers expect: a single shared bucket has a single blast radius, while per-tenant buckets give each customer:

A separately-grantable IAM scope.
An optional dedicated Cloud KMS key (see Encryption below).
A separately-configurable retention / lifecycle policy.
An audit-log destination filterable to that customer alone.

Hardened defaults at bucket creation

When the platform creates a customer bucket on first upload, it applies a fixed security policy that cannot be silently weakened:

Uniform bucket-level access enforced. IAM is the only grant path. Per-object ACLs are impossible. An audit of "who has access to this bucket" needs to inspect exactly one place: the IAM policy.
Public-access prevention enforced. The bucket cannot be granted to allUsers or allAuthenticatedUsers, even by an operator with high IAM permissions. This is the bucket-level safety net against the failure mode where a misconfigured grant exposes data to the open internet.
Object versioning enabled. Accidental delete and overwrite are recoverable from version history; combined with a lifecycle rule, this provides a window for ransomware recovery.
Residency region pinned. Customers in regulated jurisdictions can specify the GCS region per bucket. Default is us-central1; EU and other GCP regions are supported.
Provenance labels. Every Graam-provisioned bucket carries identifying labels so cost-allocation and audit queries can partition reliably.
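A customer reviewing these defaults could check them against the bucket metadata returned by the GCS JSON API. The sketch below assumes that metadata has already been fetched into a dictionary; the field names follow the GCS bucket resource, while the function name and the label keys are hypothetical.

```python
from typing import Any


def hardening_gaps(
    bucket: dict[str, Any], expected_location: str, required_labels: set[str]
) -> list[str]:
    """Return a list of human-readable gaps; an empty list means all checks pass."""
    gaps: list[str] = []
    iam = bucket.get("iamConfiguration", {})
    if not iam.get("uniformBucketLevelAccess", {}).get("enabled", False):
        gaps.append("uniform bucket-level access is not enabled")
    if iam.get("publicAccessPrevention") != "enforced":
        gaps.append("public-access prevention is not enforced")
    if not bucket.get("versioning", {}).get("enabled", False):
        gaps.append("object versioning is not enabled")
    # GCS reports locations in upper case, so compare case-insensitively.
    if bucket.get("location", "").upper() != expected_location.upper():
        gaps.append("bucket is not in the expected region")
    missing = required_labels - set(bucket.get("labels", {}))
    if missing:
        gaps.append(f"missing labels: {sorted(missing)}")
    return gaps


# Example metadata shaped like a GCS bucket resource (label keys are made up).
hardened = {
    "location": "US-CENTRAL1",
    "iamConfiguration": {
        "uniformBucketLevelAccess": {"enabled": True},
        "publicAccessPrevention": "enforced",
    },
    "versioning": {"enabled": True},
    "labels": {"managed-by": "graam", "tenant": "d9cf70a5"},
}
```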

Customers who require specific retention-lock policies (e.g. 7-year hold for ABS) pre-create their bucket with the lock applied. When the deployment is configured to defer to operator provisioning, the platform does not create a missing bucket itself, so a bucket without the required lock is never created silently.

Encryption

At rest in GCS

All bytes are encrypted at rest by GCS with AES-256. For institutional deployments, Graam Harmony binds each per-tenant bucket to a Customer-Managed Encryption Key (CMEK) in Cloud KMS at bucket creation time. Three configurations are supported:

Mode	Use case
Google-managed keys (GMEK) — default for non-institutional deployments.	AES-256 at rest with Google holding the keys.
Single shared customer key.	One Cloud KMS key for the deployment. Operationally simple — one key to rotate, one IAM grant. Appropriate for customer-dedicated Graam instances.
Per-tenant customer key.	A separate Cloud KMS key per customer. Each customer's bucket binds to their own key. Cryptographic isolation between customers. Revoking one customer's key has no effect on others.

In the customer-key modes, the customer (not Graam) creates the KMS key in their own keyring. The customer grants the GCS service account roles/cloudkms.cryptoKeyEncrypterDecrypter on that key, not on the whole project. The Graam application's service account does not make this grant on the customer's behalf, because that would defeat the entire CMEK threat model. Revoking the grant cryptographically locks Graam (and Google) out of the data, without requiring any code change or deployment action.
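The grant and the bucket binding can both be inspected from exported metadata. The sketch below assumes the key's IAM policy and the bucket resource have been fetched into dictionaries; the function names are hypothetical, and conditional IAM bindings are ignored for brevity.

```python
GCS_CMEK_ROLE = "roles/cloudkms.cryptoKeyEncrypterDecrypter"


def gcs_service_agent(project_number: str) -> str:
    """IAM member string for the GCS service agent of the bucket's project."""
    return (
        f"serviceAccount:service-{project_number}"
        "@gs-project-accounts.iam.gserviceaccount.com"
    )


def grant_is_active(key_policy: dict, project_number: str) -> bool:
    """True while the service agent still holds the encrypt/decrypt role on the key."""
    member = gcs_service_agent(project_number)
    return any(
        binding.get("role") == GCS_CMEK_ROLE and member in binding.get("members", [])
        for binding in key_policy.get("bindings", [])
    )


def bucket_uses_key(bucket: dict, key_name: str) -> bool:
    """True when the bucket's default encryption key is the expected KMS key."""
    return bucket.get("encryption", {}).get("defaultKmsKeyName") == key_name
```

When the customer removes the binding, grant_is_active returns False for the same policy lookup, which corresponds to the state in which GCS can no longer decrypt the objects.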

At rest in Postgres

The metadata database is hosted on Google Cloud SQL with at-rest encryption enabled. Postgres holds metadata only — file ownership, hashes, paths — never the customer-uploaded bytes.

In transit

TLS 1.2 or higher between every link: client ↔ API, API ↔ GCS, API ↔ Postgres. Plaintext HTTP is not accepted.

Identity and access control

Authentication

Every API request authenticates with a JSON Web Token (JWT) signed by a customer-controlled secret or asymmetric key. The API verifies:

The JWT signature (HS256 / RS256 / ES256 supported).
The exp (expiration) claim.
Optionally the iss (issuer) and aud (audience) claims.

A malformed, expired, or wrong-signature token returns 401 Unauthorized. There is no silent fallback — an attacker forging a token gets rejected, not downgraded.

The verified user identity is read from the sub claim and becomes the binding for every downstream access decision in that request.

Customers operating their own JWT-issuing identity provider can plug in their public key (JWKS) and have Graam reject any token not minted by their IdP.

Authorization — every read/write/delete

For every access of a customer artifact, the platform:

Resolves the calling identity from the verified JWT.
Looks up the artifact by ID.
Reads the stored owner column.
Compares it to the calling identity.
Returns 404 Not Found when the artifact doesn't exist or is owned by a different customer. A mismatch never returns 403 Forbidden, because a distinct status code would leak the existence of another customer's UUID.

A spoofed ?user_id=victim URL parameter is irrelevant; the verified JWT is authoritative.
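The lookup-and-compare steps can be sketched as below, following the behavior described for artifacts owned by another customer: a missing artifact and a foreign artifact are indistinguishable to the caller. The Artifact type, the in-memory store, and the handler name are hypothetical.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Mapping, Optional


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    owner_id: str


def access_status(caller_id: str, artifact: Optional[Artifact]) -> HTTPStatus:
    """Missing and not-owned collapse into the same 404 response."""
    if artifact is None or artifact.owner_id != caller_id:
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.OK


def handle_get(
    store: Mapping[str, Artifact],
    artifact_id: str,
    jwt_subject: str,
    query_params: Mapping[str, str],
) -> HTTPStatus:
    # query_params is deliberately never consulted for identity: only the
    # subject taken from the verified JWT decides the outcome.
    return access_status(jwt_subject, store.get(artifact_id))
```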

Database-layer ownership filter

Application-layer authorization checks have a known failure mode: an internal code path can be written that forgets to perform the check. Graam Harmony guards against this failure mode by also enforcing ownership at the SQL layer.

The data-access methods that read documents and data files require the calling user's id and apply it as a WHERE clause filter. A row whose owner does not match the calling identity is invisible to the query — no row is returned, regardless of how the artifact UUID was obtained.

This means an internal code path that mishandles user identity gets back zero rows, not the wrong customer's data.
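A minimal sketch of an ownership-filtered read follows. It uses an in-memory SQLite database purely so the example runs anywhere; the production store is Postgres, and the table name, column names, and function names here are assumptions.

```python
import sqlite3
from typing import Optional


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway database with one live and one soft-deleted document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id TEXT PRIMARY KEY, owner_id TEXT NOT NULL, "
        "gcs_path TEXT NOT NULL, deleted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?, ?)",
        [
            ("doc-1", "alice", "documents/alice/doc-1.pdf", None),
            ("doc-2", "alice", "documents/alice/doc-2.pdf", "2024-01-01"),
        ],
    )
    return conn


def get_document(
    conn: sqlite3.Connection, document_id: str, user_id: str
) -> Optional[tuple]:
    """Read one document row; the owner is part of the query, not a later check."""
    if not user_id:
        raise ValueError("a calling user id is required")
    return conn.execute(
        "SELECT id, gcs_path FROM documents "
        "WHERE id = ? AND owner_id = ? AND deleted_at IS NULL",
        (document_id, user_id),
    ).fetchone()
```

With this shape, passing the wrong identity yields None rather than another customer's row, and there is no code path that can read a document without supplying an owner.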

A small, audit-tagged set of system pipelines (post-upload parsing, batch enrichment) operate without a calling user. These call explicitly-named bypass methods that produce a unique audit signal — a security review can locate every bypass in one search and review the justifying comment at each site. These bypass methods are never reachable from an API request path.

Agent / LLM read paths

When the customer runs an analysis cell, the in-platform agent may need to read content from a document the customer has explicitly attached to that cell. The agent runs server-side under the calling customer's identity; its document reads flow through the same SQL ownership filter described above.

If a malicious prompt tells the agent to "read document XYZ" where XYZ is some other customer's document, the agent receives "Document not found" — the row is never returned to the agent's session. Cross-customer read via prompt injection is structurally impossible.

Notebook sharing

A customer may publish a specific notebook with a shareable link. Shared notebooks expose the notebook content (cells, prose, charts) but never the underlying source documents. Document downloads always require ownership.

Rate limiting

Upload endpoints enforce per-IP rate limits backed by Redis (so the limit is consistent across worker processes). Limits are configurable per deployment.
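The brief does not say which limiting algorithm is used. As one possibility, the sketch below shows a fixed-window counter, with an in-process dictionary standing in for the shared Redis counter (in Redis this pattern is typically an increment on a per-window key that expires with the window). The class name and parameters are hypothetical.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Allow at most `limit` requests per client IP in each time window."""

    def __init__(
        self,
        limit: int,
        window_seconds: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._counters: dict[tuple[str, int], int] = {}

    def allow(self, client_ip: str) -> bool:
        window_index = int(self.clock() // self.window_seconds)
        # Discard counters from earlier windows; a shared store would expire them.
        self._counters = {
            key: count
            for key, count in self._counters.items()
            if key[1] == window_index
        }
        key = (client_ip, window_index)
        self._counters[key] = self._counters.get(key, 0) + 1
        return self._counters[key] <= self.limit
```

Keeping the counter in a shared store rather than in each worker is what makes the limit hold across processes; the dictionary here only demonstrates the counting logic.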

Audit logging

Every action against customer data is logged with a stable trail:

API requests — calling user, request path, status, timing, request UUID. Retained ≥ 90 days.
Agent operations — every agent tool invocation (file read, document parse, download) is logged in ag.workflow_events keyed to the customer's session context.
Storage operations — GCS bucket-level access logs are available on request; for per-tenant deployments these can be routed to a customer-controlled log destination.

The verified caller identity is bound to every event, so a post-incident timeline can be reconstructed against a specific customer or a specific request.
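As an illustration of what one API audit event might carry, the sketch below emits a JSON line containing the fields listed above. The field names and the function name are assumptions, not the platform's actual log schema.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def audit_record(
    user_id: str,
    path: str,
    status: int,
    duration_ms: float,
    request_id: Optional[str] = None,
) -> str:
    """Serialize one API request as a single JSON log line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id or str(uuid.uuid4()),
        "user_id": user_id,  # taken from the verified JWT, never from the URL
        "path": path,
        "status": status,
        "duration_ms": round(duration_ms, 1),
    }
    return json.dumps(record, sort_keys=True)
```

Filtering such lines by user_id or request_id is what allows a timeline to be rebuilt for one customer or one request.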

File-content validation

Size caps. Documents 100 MB; data files / unified uploads 200 MB. Configurable per deployment.
Zip handling. Recursive archives and macOS metadata (__MACOSX/) are rejected. Per-entry size is checked against the declared size.
Magic-byte / virus scanning. On the near-term roadmap (see below). Currently the platform validates the file extension and size; magic-byte verification and ClamAV scanning land in the next hardening pass.
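The zip checks can be sketched with the standard zipfile module. This example assumes that "rejected" means the whole upload is refused, and the list of archive suffixes and the function name are hypothetical. It reads each entry fully into memory, so it does not represent the streaming decompression cap listed on the roadmap.

```python
import io
import zipfile

# Hypothetical list of suffixes treated as nested archives.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".gz", ".tgz", ".7z", ".rar")


def check_zip(data: bytes, max_entry_bytes: int) -> list[str]:
    """Return the accepted entry names, or raise ValueError to refuse the upload."""
    accepted: list[str] = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.filename.startswith("__MACOSX/"):
                raise ValueError(f"macOS metadata entry: {info.filename}")
            if info.is_dir():
                continue
            if info.filename.lower().endswith(ARCHIVE_SUFFIXES):
                raise ValueError(f"nested archive: {info.filename}")
            # Check the declared size before reading anything.
            if info.file_size > max_entry_bytes:
                raise ValueError(f"entry too large: {info.filename}")
            # Compare the bytes actually read with the declared size.
            if len(archive.read(info)) != info.file_size:
                raise ValueError(f"size mismatch: {info.filename}")
            accepted.append(info.filename)
    return accepted
```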

Retention and deletion

Soft delete is the default — deleting a document or data file marks the row deleted and hides it from every read path. Hard deletion (purging the bytes from GCS) is currently a manual operations procedure; automated hard-delete after a configurable retention window is on the near-term roadmap.

For institutional deployments, customers can specify their own retention policy on the per-tenant bucket via GCS lifecycle rules or a retention lock — these apply to the bytes regardless of the application's logical state.

Defense in depth

Two independent layers guard customer data today:

Verified-identity gateway. Every authenticated API path resolves the caller's identity from a verified JWT. No URL parameter overrides this.
Database-layer ownership filter. Data-access methods include the calling identity in the SQL WHERE clause; an internal path that mishandles identity returns no row.

A planned third layer — Postgres row-level security with a per-request SET LOCAL of the user identity — extends ownership enforcement to the database role itself. After this lands, even a hypothetical application bug that forgets to thread identity cannot return another customer's row; the database refuses.

Roadmap (transparency)

We list the work that is in flight rather than implying it is done:

Streaming zip decompression with a hard cap during decompression.
Magic-byte file-type validation and integrated ClamAV scanning.
UUID validation guard on storage paths.
Automated hard-delete after retention window.
Postgres row-level security for the artifact tables.
Sub-agent isolation for document reads (additional defense against prompt-injection attacks; today the database-layer filter already prevents cross-customer reads via prompt injection).

Compliance posture

SOC 2 / ISO 27001. Not yet certified. We can share our controls inventory and gap analysis on request, and accept customer security questionnaires answered against the live current state — not aspirational answers.
Data residency. GCP us-central1 by default; EU regions and other GCP regions on request. CMEK keys are customer-controlled in the customer's own GCP project / KMS keyring.
Customer audits. Welcome under NDA. We support read-only reviews of the relevant code paths and live verification of the controls described in this document.

Verification — what a customer can ask us to demonstrate

The KMS key bound to the customer's bucket (gsutil kms encryption gs://<bucket>).
The bucket's IAM policy and uniform-access / public-access prevention settings (gsutil iam get, gsutil bucketpolicyonly get, gsutil pap get).
A signed JWT from the customer's IdP being accepted, and an unsigned token being rejected, on the same endpoint.
A cross-customer document UUID returning 404 (not 403) when requested by a different identity.
The audit-log entry for any of the above probes.

Contact

Security questions, vulnerability reports, or audit requests: [email protected].

We respond within 48 hours and treat reports under coordinated disclosure: 90 days from acknowledgment to public disclosure unless the reporter requests otherwise.

Graam Harmony — Customer Data Security Brief