chore(docs): create architecture page
SUMMARY
- Adds an "architecture" page summarizing the components of a Superset installation. A visual diagram should eventually go here too. Please feel free to edit/add/cut.
- Reorders other installation pages to follow this one
- Copyediting of Configuring Superset page
This is ready for review.
hey! - I'm hoping this doesn't come across as a hostile takeover of this PR, but I threw your pages into GPT, added some comments/input in the form of bullets, and this is what came out:
Architecture
This documentation outlines the architecture of Apache Superset, focusing first on the backend components before introducing the frontend. This organization gives a thorough picture of how Superset operates, from data handling to user interface.
Superset Backend
The backend of Superset is composed of several critical components designed to manage data, execute tasks, and maintain the overall functionality of the system.
Core Backend Components
Web Application [Python/Flask]:
- Description: Serves static assets and handles API requests from the Superset frontend.
- Role: Acts as the primary communication hub for frontend interactions and immediate query processing.
Metadata Database [Required]:
- Description: Stores all of Superset’s essential assets such as dashboards, charts, user configurations, and logs.
- Supported Technologies: PostgreSQL or MySQL (recommended), other SQLAlchemy-supported OLTP databases
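As an illustrative sketch, the metadata database is pointed to via a SQLAlchemy URI in `superset_config.py`; the host, port, and credentials below are placeholders, not recommended values:

```python
# superset_config.py -- metadata database connection (placeholder credentials)
# Superset reads this SQLAlchemy URI to locate its metadata database,
# where dashboards, charts, users, and logs are stored.
SQLALCHEMY_DATABASE_URI = (
    "postgresql+psycopg2://superset:superset@db-host:5432/superset"
)
```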
Optional Backend Components
Asynchronous Workers [Python/Celery]:
- Description: Handles tasks that are too long-running or resource-intensive for a typical web request cycle.
- Enabled Features:
- Asynchronous query executions in SQL Lab.
- Scheduled generation of alerts and reports.
- Creation of dashboard thumbnails.
- Dependencies: Requires a message queue (e.g., Redis, RabbitMQ).
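A minimal sketch of how the workers are wired to a message queue in `superset_config.py`, assuming Redis as the broker; the URLs are placeholders and the exact options can vary by version:

```python
# superset_config.py -- async worker config (sketch; Redis URLs are placeholders)
class CeleryConfig:
    broker_url = "redis://localhost:6379/0"      # message queue the workers poll
    result_backend = "redis://localhost:6379/1"  # where task results are stored
    imports = ("superset.sql_lab",)              # task modules for Celery to load

CELERY_CONFIG = CeleryConfig
```

With this in place, worker processes are started separately from the web application and share the same configuration module.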
Caching Layer:
- Description: Enhances performance by caching query results and frequently accessed data.
- Technologies: Primarily Redis, with support for other caching systems.
- Enabled Features:
- Accelerated access to repeated queries.
- Improved responsiveness of the application.
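For illustration, a Redis-backed cache can be declared with a Flask-Caching-style dictionary in `superset_config.py`; the Redis URL and timeout below are placeholder values:

```python
# superset_config.py -- caching layer (sketch; Flask-Caching config keys)
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",                     # backend implementation
    "CACHE_DEFAULT_TIMEOUT": 300,                   # seconds before entries expire
    "CACHE_KEY_PREFIX": "superset_",                # namespace for cache keys
    "CACHE_REDIS_URL": "redis://localhost:6379/2",  # placeholder Redis instance
}
```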
Logging Interfaces
- Standard Output and Error Logs: Essential for debugging and monitoring application health.
- StatsD/Metrics Collection: Enables real-time aggregation of performance metrics.
- Analytics Logging: Rich, structured logs that provide insights into user behavior and application usage patterns; typically sent to a stream and landed in a data warehouse.
Other Common Infrastructure Components:
- Load Balancers/API Gateway: Distributes incoming traffic across multiple servers to enhance availability and manage traffic peaks.
- Observability/Alerting: Provides monitoring, error tracking, and real-time alerts to maintain performance and uptime.
- WSGI Server (e.g., Gunicorn in async mode): Runs the Superset web application in production, handling many concurrent requests.
- Database Drivers: Enables communication between Superset and its databases, crucial for operational data querying and management.
- Orchestration (e.g., Kubernetes): Automates deployment, scaling, and management of containerized applications, ensuring robust service availability.
- Additional Security Measures: Implements network security, data encryption, and access controls to safeguard data and comply with regulations.
Superset Frontend
The frontend of Superset is a sophisticated web client built using modern web technologies to facilitate interactive data visualization.
Core Technologies:
- React: Forms the foundation of the frontend, offering a responsive and dynamic user interface.
- antd: Utilized for designing the visual components and layout of the UI, providing a consistent and professional aesthetic.
- Plugin Architecture: Allows for the extension of visualization capabilities through custom plugins, enhancing the flexibility and functionality of visual data representation.
Functionality:
- Communicates with the backend via the Superset API, enabling users to manage and visualize data efficiently.
- Supports extensive customization and extension through community-developed plugins and themes.
GitHub needs the "face melt" emoji as a reaction.
Is it possible to "land and expand" here? My trust issues with GPT aside (did it really say "Pytjhon?") I wonder if we can merge a first iteration, then divide/expand/remove/elaborate as needed from there, rather than go deep here. We can feed it to GPT as we go for consolidation/clarification/organization.
I did a fair amount of prompt input and editing to get to that, but the main thing is the structure of the docs (backend/frontend) and mentioning the technologies used (and technology choices) in the different areas.
From here, we can update the installation page to provide a little table (or similar) about how (or if) each installation method installs these components.