Human Archive Data Platform

Built Human Archive's (YC W26) enterprise data platform: TB-scale robotics dataset delivery with Cognito multi-tenant auth, S3 signed URLs, and recursive folder resolution across AWS.

By Aditya Singh Khichi, Full Stack Engineer, New Delhi, India.

Tech stack: React, TanStack Router, Express, PostgreSQL, AWS.

Datasets served: TB-scale

Problem

Human Archive (YC W26) needed an enterprise platform to deliver multimodal robotics datasets at terabyte scale. Customers wanted role-gated access to specific dataset slices, the nested S3 folder structure preserved end-to-end, and auth that supported multiple organizations without leaking data across tenants. Off-the-shelf data portals couldn't handle either the storage scale or the multi-tenant boundary.

Approach

Led the platform end-to-end: React with TanStack Router for the file-tree UI, an Express API, Postgres for metadata, and a deep AWS integration (Cognito for auth, S3 for storage, Lambda for orchestration, CloudFront with signed URLs for delivery).

Multi-tenant auth verifies Cognito JWTs server-side and enriches them with role and org profiles from Postgres. The role then selects entire component subtrees on the frontend, so an enterprise viewer never even sees the contributor surface area.

The dataset pipeline resolves S3 folder hierarchies recursively and ingests in batches with conflict-safe upserts.
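The auth step in the approach can be sketched as a verify-then-enrich function. This is a minimal sketch, not the production middleware. `verifyToken` stands in for a Cognito JWT verifier; the `aws-jwt-verify` package's `CognitoJwtVerifier` is one real option. `loadProfile`, the role names, and the `AuthContext` shape are hypothetical. Keeping the function free of Express makes the tenant check testable on its own.

```typescript
// Hypothetical role names; the write-up only names enterprise viewers and contributors.
type Role = "enterprise_viewer" | "contributor";

// The only claim this sketch relies on from a verified Cognito token.
interface VerifiedClaims {
  sub: string;
}

// Assumed shape of the Postgres profile row, keyed by Cognito `sub`.
interface Profile {
  orgId: string;
  role: Role;
}

interface AuthContext {
  userId: string;
  orgId: string;
  role: Role;
}

// Injected dependencies so the logic runs without Express, Cognito, or Postgres.
type VerifyToken = (jwt: string) => Promise<VerifiedClaims>;
type LoadProfile = (sub: string) => Promise<Profile | undefined>;

class AuthError extends Error {
  status: 401 | 403;
  constructor(message: string, status: 401 | 403) {
    super(message);
    this.status = status;
  }
}

// Verify the bearer token server-side, then enrich it with org and role from the database.
async function authenticate(
  authHeader: string | undefined,
  verifyToken: VerifyToken,
  loadProfile: LoadProfile,
): Promise<AuthContext> {
  const match = /^Bearer (.+)$/.exec(authHeader ?? "");
  if (!match) throw new AuthError("missing bearer token", 401);

  let claims: VerifiedClaims;
  try {
    claims = await verifyToken(match[1]);
  } catch {
    throw new AuthError("invalid token", 401);
  }

  // A valid token with no profile row has no tenant, so it gets no access.
  const profile = await loadProfile(claims.sub);
  if (!profile) throw new AuthError("no profile for user", 403);

  return { userId: claims.sub, orgId: profile.orgId, role: profile.role };
}

// Tenant boundary: a dataset is visible only to users in the owning org.
function assertSameOrg(ctx: AuthContext, datasetOrgId: string): void {
  if (ctx.orgId !== datasetOrgId) throw new AuthError("cross-tenant access", 403);
}
```

In an Express app this would likely sit in a middleware that attaches the result to the request and maps `AuthError.status` to the response. The property that matters is that `orgId` and `role` come from the database, never from anything the client sends.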
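Recursive folder resolution can be sketched on top of S3's delimiter-based listing. `ListObjectsV2` with `Delimiter: "/"` returns immediate subfolders as `CommonPrefixes` and files as `Contents`, paginated by a continuation token. The sketch drains every page of a prefix and then recurses into each subfolder. It injects a hypothetical `listPage` function instead of calling the AWS SDK, so it runs against an in-memory fake. The `FolderNode` shape, including the rolled-up `totalBytes`, is also an assumption.

```typescript
// One page of a delimiter listing, mirroring CommonPrefixes / Contents / NextContinuationToken.
interface ListPage {
  prefixes: string[]; // immediate "subfolders", e.g. "raw/run-1/"
  files: { key: string; size: number }[];
  nextToken?: string; // present while more pages remain
}

type ListPageFn = (prefix: string, token?: string) => Promise<ListPage>;

interface FolderNode {
  prefix: string;
  files: { key: string; size: number }[];
  children: FolderNode[];
  totalBytes: number; // this folder plus all descendants
}

// Resolve a prefix into a full tree, draining every page before recursing.
async function resolveFolder(prefix: string, listPage: ListPageFn): Promise<FolderNode> {
  const files: { key: string; size: number }[] = [];
  const subPrefixes: string[] = [];
  let token: string | undefined;
  do {
    const page = await listPage(prefix, token);
    files.push(...page.files);
    subPrefixes.push(...page.prefixes);
    token = page.nextToken;
  } while (token !== undefined);

  // Sibling folders are independent, so resolve them concurrently.
  const children = await Promise.all(subPrefixes.map((p) => resolveFolder(p, listPage)));
  const ownBytes = files.reduce((sum, f) => sum + f.size, 0);
  const totalBytes = ownBytes + children.reduce((sum, c) => sum + c.totalBytes, 0);
  return { prefix, files, children, totalBytes };
}
```

`Promise.all` fans out across every sibling folder at once. For very wide trees, a production version would probably cap concurrency to stay within S3 request-rate limits.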
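Batched ingestion with conflict-safe upserts maps naturally onto Postgres `INSERT ... ON CONFLICT ... DO UPDATE`. This minimal sketch only builds parameterized statements, in the `{ text, values }` form a client such as node-postgres accepts. The `dataset_files` table, its columns, and the unique key on `(dataset_id, s3_key)` are hypothetical, and `ON CONFLICT` needs a matching unique index or constraint. Rows are deduplicated first because Postgres rejects an `ON CONFLICT DO UPDATE` statement that would affect the same row twice.

```typescript
// Hypothetical row shape for file metadata.
interface FileRow {
  datasetId: string;
  s3Key: string;
  sizeBytes: number;
  etag: string;
}

interface Statement {
  text: string;
  values: (string | number)[];
}

// Collapse duplicates on (datasetId, s3Key); the last occurrence's values win.
function dedupeByKey(rows: FileRow[]): FileRow[] {
  const byKey = new Map<string, FileRow>();
  for (const row of rows) byKey.set(JSON.stringify([row.datasetId, row.s3Key]), row);
  return [...byKey.values()];
}

// One multi-row upsert per batch; batchSize keeps bind parameters far below Postgres's 65535 cap.
function buildUpserts(rows: FileRow[], batchSize = 500): Statement[] {
  const unique = dedupeByKey(rows);
  const statements: Statement[] = [];
  for (let i = 0; i < unique.length; i += batchSize) {
    const batch = unique.slice(i, i + batchSize);
    const values: (string | number)[] = [];
    const tuples = batch.map((r, j) => {
      values.push(r.datasetId, r.s3Key, r.sizeBytes, r.etag);
      const b = j * 4;
      return `($${b + 1}, $${b + 2}, $${b + 3}, $${b + 4})`;
    });
    statements.push({
      text:
        `INSERT INTO dataset_files (dataset_id, s3_key, size_bytes, etag) VALUES ${tuples.join(", ")} ` +
        `ON CONFLICT (dataset_id, s3_key) DO UPDATE SET ` +
        `size_bytes = EXCLUDED.size_bytes, etag = EXCLUDED.etag ` +
        // Skip rewriting rows whose content has not changed.
        `WHERE dataset_files.etag IS DISTINCT FROM EXCLUDED.etag`,
      values,
    });
  }
  return statements;
}
```

Because each statement is idempotent for unchanged rows, a failed batch can be retried without creating duplicates.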
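For delivery, a CloudFront signed URL with a canned policy appends three query parameters: `Expires`, `Signature`, and `Key-Pair-Id`. The signature is an RSA-SHA1 signature over a compact JSON policy, base64-encoded with `+`, `=`, and `/` replaced by `-`, `_`, and `~`. The sketch implements that format with Node's built-in `crypto`; in practice, `getSignedUrl` from the `@aws-sdk/cloudfront-signer` package does the same job. It assumes a canned rather than custom policy. The distribution domain, key pair ID, and expiry used below are placeholders.

```typescript
import { createSign } from "node:crypto";

// CloudFront's URL-safe base64: replace "+" with "-", "=" with "_", "/" with "~".
function cloudFrontBase64(buf: Buffer): string {
  return buf.toString("base64").replace(/\+/g, "-").replace(/=/g, "_").replace(/\//g, "~");
}

// Sign `url` with a canned policy valid until `expiresEpochSeconds` (Unix time).
function signCloudFrontUrl(
  url: string,
  keyPairId: string,
  privateKeyPem: string,
  expiresEpochSeconds: number,
): string {
  // The canned policy must contain no whitespace; JSON.stringify emits none.
  const policy = JSON.stringify({
    Statement: [
      { Resource: url, Condition: { DateLessThan: { "AWS:EpochTime": expiresEpochSeconds } } },
    ],
  });
  const signer = createSign("RSA-SHA1");
  signer.update(policy);
  const signature = cloudFrontBase64(signer.sign(privateKeyPem));

  // Appended by hand: URLSearchParams would percent-encode "~" in the signature.
  const sep = url.includes("?") ? "&" : "?";
  return `${url}${sep}Expires=${expiresEpochSeconds}&Signature=${signature}&Key-Pair-Id=${keyPairId}`;
}
```

A short expiry keeps a leaked link from outliving its purpose. The private key never leaves the server, so the API can hand out per-file URLs only after the tenant check passes.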

Outcome

The platform became the primary tool for delivering TB-scale datasets to enterprise customers. Recursive folder resolution survived the move to multi-region S3. Role-gated UI branching kept the contributor and consumer codepaths from cross-contaminating, which would have been the inevitable failure mode of a feature-flag-per-button approach.
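The contrast with per-button feature flags can be illustrated as one exhaustive branch at the root. In the React app, the returned surface would decide which component subtree renders. Here it is a plain function so the idea stands without React. The role names, surface names, and TanStack-style route paths are hypothetical. Because the `switch` is exhaustive over a union type, adding a role fails to compile until someone decides which surface it gets. The new role cannot silently inherit whatever flags happen to be on.

```typescript
type Role = "enterprise_viewer" | "contributor";

// Each surface is a whole subtree: its routes exist only inside that branch.
interface Surface {
  name: "consumer" | "contributor";
  routes: readonly string[];
}

const CONSUMER: Surface = { name: "consumer", routes: ["/datasets", "/datasets/$id", "/downloads"] };
const CONTRIBUTOR: Surface = { name: "contributor", routes: ["/uploads", "/uploads/$id", "/submissions"] };

// Branch once on role; nothing below this point re-checks it.
function surfaceFor(role: Role): Surface {
  switch (role) {
    case "enterprise_viewer":
      return CONSUMER;
    case "contributor":
      return CONTRIBUTOR;
    default: {
      // Compile-time exhaustiveness check: unreachable unless Role gains a member.
      const unhandled: never = role;
      throw new Error(`unhandled role: ${String(unhandled)}`);
    }
  }
}
```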

Live link: https://humanarchive.ai