Go Deeper
A complete AI platform that captures every question, validates every answer, and carries the project through to first delivery. Four-phase delivery:
1) Intake: A guided Streamlit wizard collects source systems, requirements, SLAs, stakeholders, DQ rules, and the PII strategy, all stored in Databricks Unity Catalog or PostgreSQL with a full audit trail.
2) Validate: AI agents assess completeness, flag gaps, and engage stakeholders before a single line of DDL is written.
3) Architect: Automated architecture design grounded in the captured requirements: source-to-target mappings, medallion layers, and partition strategies.
4) Generate: An 8-agent pipeline produces DDL scripts, pytest suites, SDLC artifacts, and documentation: a first delivery, ready for iteration.
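The four phases above can be sketched as an ordered pipeline with a validation gate, so no DDL is generated while gaps remain open. A minimal sketch; the phase functions and context keys are illustrative, not NiData's actual API:

```python
from typing import Callable, Dict, List

# Each phase takes the shared delivery context and returns an updated copy.
Phase = Callable[[Dict], Dict]

def run_delivery(phases: List[Phase], context: Dict) -> Dict:
    """Run phases in order; halt before generation if validation flagged gaps."""
    for phase in phases:
        context = phase(context)
        if context.get("gaps"):            # validation found open questions
            context["status"] = "blocked"  # no DDL is written until gaps close
            break
    else:
        context["status"] = "delivered"
    return context

# Illustrative stand-ins for the intake/validate/architect/generate phases.
intake    = lambda c: {**c, "requirements": ["slas", "pii_strategy"]}
validate  = lambda c: {**c, "gaps": [] if c.get("requirements") else ["missing intake"]}
architect = lambda c: {**c, "design": "medallion"}
generate  = lambda c: {**c, "artifacts": ["ddl", "pytest", "docs"]}

result = run_delivery([intake, validate, architect, generate], {})
```

The gate is the point of the sketch: a run with an empty intake stops at `status == "blocked"` before the generate phase ever executes.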
Stack Used
NiData is built on a dual-deployment architecture (Databricks and Docker) that shares a core foundation. The full tech stack, broken down by component:
**Core Languages & Frontend**
* **Python (3.8+):** The primary language used across the platform, driving everything from agent orchestration to the knowledge graph.
* **SQL:** Used for both Unity Catalog DDL (Delta Lake) and standard PostgreSQL schemas.
* **Streamlit:** Powers the main 9-step wizard web UI, the artifact viewer, and the admin panel for reference data management.
**AI Models & Orchestration Engine**
* **LLMs:** The default model is Llama 3.3 70B (hosted via Databricks Model Serving). The platform can also be configured to use OpenAI GPT-4 and Anthropic Claude via their APIs.
* **Agent Orchestration:** A Python-based, config-driven engine using LangGraph-style orchestration to manage the 8-agent legacy pipeline and the 7-agent sequential delivery pipeline.
* **Job Orchestration:** Can run on Databricks Workflows or Apache Airflow.
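The config-driven, LangGraph-style engine reduces to a simple idea: named agents that read and update a shared state in sequence. A minimal sketch; the agent names, state keys, and `Pipeline` class are illustrative assumptions, not the platform's real orchestration API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Agent:
    name: str
    run: Callable[[Dict], Dict]  # takes shared state, returns its updates

@dataclass
class Pipeline:
    agents: List[Agent]
    state: Dict = field(default_factory=dict)

    def execute(self) -> Dict:
        """Run each agent in order, merging its updates into the shared state."""
        for agent in self.agents:
            self.state.update(agent.run(self.state))
            self.state.setdefault("trace", []).append(agent.name)
        return self.state

# Two toy agents standing in for e.g. the DDL and documentation agents.
ddl_agent  = Agent("ddl",  lambda s: {"ddl": "CREATE TABLE bronze.orders (...)"})
docs_agent = Agent("docs", lambda s: {"docs": f"Documents: {s['ddl']}"})

final = Pipeline([ddl_agent, docs_agent]).execute()
```

The `trace` key shows why sequential ordering matters here: the docs agent can only describe DDL that an earlier agent has already placed into the state.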
**Deployment Option A: Databricks (Enterprise Cloud-Native)**
* **Storage & Governance:** Unity Catalog (26-table schema) and Delta Lake.
* **Compute:** Databricks Runtime / Spark.
* **Model Registry & Feature Store:** MLflow on Databricks and Databricks Feature Store.
* **Security:** OIDC managed authentication and Databricks secrets.
**Deployment Option B: Docker (Standalone / Air-Gapped)**
* **Containerization:** Multi-service Docker Compose orchestration.
* **Database:** PostgreSQL (for platform-agnostic storage) connected via `psycopg2`.
* **Infrastructure as Code (IaC):** Terraform used to provision cost-optimized GCP spot instances.
* **Web Server:** Nginx acting as a reverse proxy for production profiles.
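For the PostgreSQL backend, connection details typically come from the container environment and are handed to `psycopg2.connect()` as a libpq keyword/value DSN. A minimal sketch; the `NIDATA_PG_*` variable names and defaults are illustrative assumptions:

```python
import os

def build_dsn(env: dict) -> str:
    """Build a libpq connection string suitable for psycopg2.connect().
    The NIDATA_PG_* environment variable names are illustrative."""
    parts = {
        "host": env.get("NIDATA_PG_HOST", "localhost"),
        "port": env.get("NIDATA_PG_PORT", "5432"),
        "dbname": env.get("NIDATA_PG_DB", "nidata"),
        "user": env.get("NIDATA_PG_USER", "nidata"),
        "password": env.get("NIDATA_PG_PASSWORD", ""),
    }
    return " ".join(f"{k}={v}" for k, v in parts.items() if v)

# With psycopg2 installed, the DSN passes straight through:
#   conn = psycopg2.connect(build_dsn(os.environ))
dsn = build_dsn({"NIDATA_PG_HOST": "db", "NIDATA_PG_PASSWORD": "s3cret"})
```

Keeping the DSN construction separate from the driver call makes the same configuration reusable across the Docker Compose services without touching application code.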
**Business Intelligence (BI) Integration**
* **Tableau Parsers & Connectors:** Parses TWB/TWBX files, integrates via the Tableau Server REST API, and connects directly to the internal Tableau PostgreSQL repository on port 8060.
* **Power BI Parsers & Connectors:** Extracts DAX measures and connects to the Power BI XMLA endpoint and Azure SQL.
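A TWBX file is a ZIP archive wrapping a `.twb` XML workbook, which is why parsing it needs nothing beyond the standard library. A minimal sketch, assuming a drastically simplified workbook schema (real `.twb` files carry far more structure than the `caption` attribute read here):

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def datasource_captions(twbx_bytes: bytes) -> list:
    """List datasource captions from a TWBX (a ZIP containing a .twb XML file)."""
    with zipfile.ZipFile(io.BytesIO(twbx_bytes)) as z:
        twb_name = next(n for n in z.namelist() if n.endswith(".twb"))
        root = ET.fromstring(z.read(twb_name))
    return [ds.get("caption") for ds in root.iter("datasource") if ds.get("caption")]

# Build a tiny in-memory TWBX purely for demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("workbook.twb",
               "<workbook><datasources>"
               "<datasource caption='Sales'/><datasource caption='HR'/>"
               "</datasources></workbook>")
captions = datasource_captions(buf.getvalue())  # ['Sales', 'HR']
```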
**Knowledge Graph & Context Tools**
* **Business-Domain Graph:** Custom Python implementation (`agent_knowledge_graph.py` and `graph_query_layer.py`) with parallel internal indexing for lineage tracking and multi-hop impact analysis.
* **Code-Structure Graph:** `CodeGraphContext` (cgc) and `Kuzu` are used in the developer tooling to map function calls, class hierarchies, and module dependencies.
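Multi-hop impact analysis over a lineage graph boils down to a bounded breadth-first search along downstream edges. A minimal sketch; the adjacency-dict representation and table names are illustrative, not the internal structure of `agent_knowledge_graph.py`:

```python
from collections import deque

def impacted(downstream: dict, start: str, max_hops: int = 3) -> set:
    """Return every node reachable from `start` within `max_hops` lineage edges.
    `downstream` maps a table/column to the nodes that consume it."""
    seen, queue = set(), deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for child in downstream.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, hops + 1))
    return seen

# Toy lineage: bronze feeds silver; silver feeds a gold table and a dashboard.
lineage = {
    "bronze.orders": ["silver.orders"],
    "silver.orders": ["gold.revenue", "dash.sales"],
}
hit = impacted(lineage, "bronze.orders")  # silver, gold, and dashboard nodes
```

A change to `bronze.orders` thus surfaces both the intermediate silver table and everything two hops downstream, which is the shape of question the impact-analysis layer answers.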
**CI/CD, Testing & Integrations**
* **CI/CD:** GitHub Actions.
* **Testing:** `pytest` is used both for testing the platform itself and for generating automated data quality and Gold-layer reconciliation tests as an artifact of the agent pipeline.
* **Notifications:** Microsoft Teams webhooks for automated stakeholder sign-offs and notifications.
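The pytest artifacts the pipeline emits are ordinary test functions asserting data quality thresholds. A minimal sketch of what one generated check might look like; the column name, sample rows, and null-rate threshold are illustrative assumptions:

```python
def null_rate(rows: list, column: str) -> float:
    """Fraction of rows where `column` is None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def test_customer_id_null_rate():
    # In a generated suite these rows would come from the target table.
    rows = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": None}]
    assert null_rate(rows, "customer_id") <= 0.5  # generated threshold

test_customer_id_null_rate()  # also discovered and run by pytest as-is
```

Because the output is plain pytest, the same generated checks run unchanged in GitHub Actions alongside the platform's own test suite.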