18 Dec, 2024
This is especially true when more than one entity or “tenant” is serviced by the same physical data model running on the same infrastructure. When discussing multi-tenancy in PostgreSQL, there are several architectural approaches for isolating data among tenants, each with its own advantages and drawbacks. The three main approaches are:
1. Schema per tenant
2. Database per tenant
3. PostgreSQL Row-Level Security (RLS)
In this model, each tenant is assigned its own schema within a single PostgreSQL database. Every schema contains identical tables, but the data is segregated for each tenant.
Pros:
Logical isolation: Each tenant’s data is logically isolated, which provides a clear boundary for data separation within the database.
Shared resources: Since all tenants share the same database instance, managing database connections and resource allocation is simplified.
Simplified schema management: Schema-level changes (like table modifications) can be applied globally across all tenants, making schema migrations easier to manage.
Efficient resource utilisation: As tenants share the same database infrastructure, you can efficiently allocate system resources like memory, CPU, and storage.
Cons:
Increasing complexity: As the number of tenants grows, managing multiple schemas can become complex. Schema migrations or upgrades, while simpler than managing multiple databases, can still be cumbersome.
Backup and recovery: While schemas offer separation, a mistake or misconfiguration could potentially affect all tenants. Performing individual schema backups or restores may require additional effort.
Scaling challenges: PostgreSQL has limits on how many schemas can be efficiently managed in a single instance. As the number of tenants increases, performance may degrade.
With this model, each tenant is assigned a completely separate PostgreSQL database. This approach provides the strongest form of data isolation.
Pros:
Complete isolation: Each tenant’s data is entirely isolated, reducing the risk of cross-tenant data leakage. This is the most robust form of isolation.
Per-tenant customisation: Since each tenant has its own database, configurations, indexes, or performance settings can be customised for individual tenant needs.
Simplified tenant-specific backups: You can back up and restore individual tenant databases without affecting others. This simplifies per-tenant data recovery in case of issues.
Cons:
Resource overhead: Each database consumes its own resources. As you add more tenants, the overhead of managing multiple databases increases significantly, putting strain on system resources like memory, storage, and CPU.
Management complexity: Handling many databases, especially for updates, versioning, and schema migrations, requires significant effort. Automation tools are often needed to manage this at scale.
Connection pooling limitations: PostgreSQL has a limit on the number of connections it can handle. As each tenant uses a separate database, the connection overhead can become a bottleneck in systems with many tenants.
In the row-level security (RLS) model, all tenants share the same tables within a single database, but their data is isolated at the row level. A column, often referred to as a "tenant ID," is added to each table to distinguish data between tenants, and PostgreSQL's RLS policies ensure that users can only access the rows corresponding to their tenant.
Pros:
Simplicity: All tenants share the same tables, making schema management much simpler. Schema changes, upgrades, and migrations only need to be performed once.
Resource efficiency: By sharing the same database and tables, resource usage is optimised. This is particularly useful in environments where scaling out infrastructure is costly.
Fine-grained security: RLS provides granular control over who can access what data. This ensures strict tenant data isolation without complex application logic.
Cons:
Security configuration: Setting up RLS policies requires attention to detail, especially in systems with complex queries or joins. However, once configured correctly, it provides a powerful layer of security that simplifies application logic.
Performance considerations: While there is a small amount of overhead due to RLS policies being enforced on each query, this impact is typically minimal for most use cases. For very large datasets, careful optimization (such as indexing) can mitigate performance concerns.
Scaling with large datasets: As the number of tenants grows, ensuring efficient query performance may require additional tuning, such as adding tenant-specific indexes. With appropriate database optimization strategies, however, these challenges can be managed effectively.
Schema per Tenant
Logical isolation, shared resources, simplified schema management
Increasing complexity, backup/ recovery challenges, scaling issues
Database per Tenant
Strong isolation, per-tenant customisation, tenant-specific backups
Resource overhead, management complexity, connection limits
Row-Level Security (RLS)
Simplicity, efficient resource use, fine-grained security
Requires careful security configuration, moderate performance impact, scaling considerations for larger datasets
This blog explores PostgreSQL's Row-Level Security, explaining how it works, its advantages, and how to implement it effectively in your applications.
Row-Level Security (RLS) is a security feature that allows database administrators to define which rows in a table users can access based on their role or characteristics. This enables fine-grained access control, determining what rows a user can SELECT, INSERT, UPDATE, or DELETE.
Traditional database access controls operate at the database, schema, or table level. However, modern applications often require more specific control. Here are scenarios where RLS is invaluable:
Multi-Tenant Applications: When multiple tenants share the same database, RLS ensures each tenant can access only their data.
Regulatory Compliance: Regulations like GDPR and HIPAA mandate strict access controls to sensitive data.
Row Ownership: In systems where users should access only the data they own (e.g., personal profiles or user-specific records), RLS offers a seamless solution.
It's crucial to emphasise that the default postgres database role should never be used for application purposes due to its extensive administrative privileges. This becomes especially critical when implementing Row-Level Security (RLS), as the postgres role has the BYPASSRLS attribute enabled by default, which cannot be changed. This attribute allows the role to bypass RLS entirely, rendering any fine-grained access control policies ineffective. The postgres role is intended for administrative tasks such as deploying database changes, managing utilities, and handling third-party applications. Using it within your application not only prevents the proper implementation of RLS but also increases the risk of unauthorised access or unintended modifications.
PostgreSQL implements RLS by attaching policies to tables, controlling which rows are visible to users and what actions they can perform. These policies are enforced automatically; when a query runs, PostgreSQL adds conditions to the query’s WHERE clause, filtering rows based on the defined security policies.
For example, a SELECT query for data in an RLS-enabled table will automatically apply the policy, ensuring only authorised rows are returned.
Let’s walk through how to enable and configure RLS on a sample table.
1. Enabling Row-Level Security
Let’s start by creating a simple employees table:
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
department TEXT NOT NULL,
salary NUMERIC NOT NULL
);
Populate the table with sample data:
INSERT INTO employees (name, department, salary) VALUES
('Alice', 'Engineering', 90000),
('Bob', 'Engineering', 80000),
('Carol', 'HR', 70000),
('Dave', 'HR', 60000);
Enable RLS on the table:
ALTER TABLE employees ENABLE ROW LEVEL SECURITY;
2. Creating Policies
Next, define policies to control access. Suppose users should only see employees in their own department. Start by creating roles for each department:
CREATE ROLE engineering_role;
CREATE ROLE hr_role;
Assign users to these roles (replace user_engineer and user_hr with real usernames):
GRANT engineering_role TO user_engineer;
GRANT hr_role TO user_hr;
Create the policy:
CREATE POLICY department_policy ON employees
USING (department = current_setting('myapp. current_department'));
Here, current_setting('myapp. current_department') is a custom session parameter set per user.
3. Testing the Policies
To test, set the department for each session. For an engineering user:
SET myapp.current_department = 'Engineering';
Now, as user_engineer, run a SELECT query:
SELECT * FROM employees;
Output:
id | name | department | salary
----+-------+-------------+--------
1 | Alice | Engineering | 90000
2 | Bob | Engineering | 80000
Similarly, for an HR user:
SET myapp.current_department = 'HR';
SELECT * FROM employees;
Output:
id | name | department | salary
----+-------+------------+--------
3 | Carol | HR | 70000
4 | Dave | HR | 60000
Row Ownership
For applications where rows have individual owners, RLS can ensure users access only their data. Add an owner column to the table:
ALTER TABLE employees ADD COLUMN owner TEXT;
Update the owner field:
UPDATE employees SET owner = 'user_engineer' WHERE department = 'Engineering';
UPDATE employees SET owner = 'user_hr' WHERE department = 'HR';
Create the policy:
CREATE POLICY owner_policy ON employees
USING (owner = current_user);
Test the policy:
SET ROLE user_engineer;
SELECT * FROM employees;
Output:
id | name | department | salary
----+-------+-------------+--------
1 | Alice | Engineering | 90000
2 | Bob | Engineering | 80000
Similarly, for the HR role:
SET ROLE user_hr;
SELECT * FROM employees;
Output:
id | name | department | salary
----+-------+------------+--------
3 | Carol | HR | 70000
4 | Dave | HR | 60000
And for an unprivileged user:
SET ROLE unprivileged_user;
SELECT * FROM employees;
Output:
No rows returned
RLS is particularly effective for isolating data in multi-tenant applications. For example, if each row has a tenant_id, you can create policies like:
CREATE POLICY tenant_policy ON employees
USING (tenant_id = current_setting ('myapp.current_tenant')::INT);
Best Practices
Least Privilege: Grant only the minimum permissions necessary.
Thorough Testing: Always rigorously test RLS policies to ensure they work as intended.
Logging: Implement logging for audit trails and compliance.
Policy Documentation: Use clear, meaningful names for policies and maintain detailed documentation.
Potential Pitfalls
Performance Considerations: Complex RLS policies may introduce slight performance overhead, but with proper indexing and query tuning, this can be mitigated.
Application Logic: Ensure that the application logic doesn’t unintentionally bypass RLS controls.
Superuser Access: Keep in mind that superusers are exempt from RLS policies.
PostgreSQL’s Row-Level Security is a powerful tool for implementing fine-grained access control. Whether you're managing multi-tenant architectures or enforcing strict user-specific data access, RLS provides flexibility and control without over-complicating application logic.
By leveraging RLS, you can enhance security, ensure compliance, and simplify data management in your applications. However, it’s crucial to plan and test your security policies thoroughly to ensure robust protection of your data.