BonsaiDb User's Guide
BonsaiDb is an ACID-compliant document database written in Rust. It is a general-purpose database that aims to simplify development and deployment by providing reliable building blocks: lightweight enough for hobby projects running with minimal resources, yet scalable for when your hobby project becomes a deployed product.
This user's guide provides a guided walkthrough of how BonsaiDb works and is meant to be supplemental to the documentation. If you learn best by exploring examples, many are available in /examples in the repository. If, however, you learn best by taking a guided tour of how something works, this guide is specifically for you.
If you have any feedback on this guide, please file an issue, and we will try to address any issues or shortcomings.
Thank you for exploring BonsaiDb.
About dev.bonsaidb.io
The domain hosting this user guide is powered by Dossier, a static file hosting project built on BonsaiDb's file storage features, currently served from a Stardust instance at Scaleway in Amsterdam. Every page, image, and script is loaded from BonsaiDb (although the domain is cached by Cloudflare).
Concepts
This is a list of common concepts that will be used throughout this book as well as the documentation.
Document
A Document is a single piece of stored data. Each document is stored within a Collection and has a unique ID within that Collection. There are two document types: `OwnedDocument` and `BorrowedDocument`. The `View::map()` function takes a `BorrowedDocument`, but nearly every other API uses `OwnedDocument`.
When a document is updated, BonsaiDb will check that the revision information passed matches the currently stored information. If not, a conflict error will be returned. This simple check ensures that if two writers try to update the document simultaneously, one will succeed and the other will receive an error.
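The revision check can be modeled in a few lines of plain Rust. This is only an illustrative sketch of optimistic concurrency control; the `StoredDoc` and `Conflict` names are hypothetical and are not BonsaiDb types.

```rust
/// A simplified model of BonsaiDb's revision check. Hypothetical types,
/// not the actual BonsaiDb API.
#[derive(Debug, PartialEq)]
pub struct Conflict;

pub struct StoredDoc {
    pub revision: u64,
    pub contents: Vec<u8>,
}

impl StoredDoc {
    /// Applies `new_contents` only if `expected_revision` matches the
    /// currently stored revision; each successful write bumps the revision.
    pub fn update(
        &mut self,
        expected_revision: u64,
        new_contents: Vec<u8>,
    ) -> Result<u64, Conflict> {
        if expected_revision != self.revision {
            return Err(Conflict);
        }
        self.revision += 1;
        self.contents = new_contents;
        Ok(self.revision)
    }
}
```

If two writers both read revision 0 and attempt updates, the first write succeeds and bumps the revision to 1, so the second writer's stale revision no longer matches and it receives a conflict error.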
Serializable Collections
BonsaiDb provides the `SerializedCollection` trait, which allows automatic serialization and deserialization in many situations. When using the `SerializedCollection::document_contents()` function, the document is serialized and deserialized by the format returned from `SerializedCollection::format()`.
The `CollectionDocument<T>` type provides convenience methods for interacting with serializable documents.
Default serialization of Serde-compatible types
BonsaiDb provides a convenience trait for Serde-compatible data types: `DefaultSerialization`. This empty trait can be implemented on any collection to have BonsaiDb provide its preferred serialization format, Pot.
Raw Collections
If you would prefer to manually manage the data stored inside of a Document, you can directly manage the `contents` field. BonsaiDb will not interact with the `contents` of a Document. Only code that you write will parse or update the stored data.
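As a sketch of what "owning the byte format" means, the code below encodes and decodes a record to raw bytes with a layout of its own choosing (a little-endian integer). The `Counter` type is hypothetical; BonsaiDb would simply store and return these bytes untouched.

```rust
/// A hypothetical record stored as raw bytes in a document's `contents`.
#[derive(Debug, PartialEq)]
pub struct Counter {
    pub value: u64,
}

impl Counter {
    /// Encode as 8 little-endian bytes; the database never inspects these.
    pub fn to_bytes(&self) -> Vec<u8> {
        self.value.to_le_bytes().to_vec()
    }

    /// Decode, returning `None` if the stored bytes have the wrong length.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        let bytes: [u8; 8] = bytes.try_into().ok()?;
        Some(Self {
            value: u64::from_le_bytes(bytes),
        })
    }
}
```

Because only your code understands this layout, any format change (adding a field, switching endianness) is also entirely your responsibility.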
Collection
A Collection is a group of Documents and associated functionality. Collections are stored on-disk using ACID-compliant, transactional storage, ensuring your data is protected in the event of a sudden power failure or other unfortunate event.
The goal of a Collection is to encapsulate the logic for a set of data in such a way that Collections could be designed to be shared and reused in multiple Schemas or applications.
Each Collection must have a unique `CollectionName`. To help prevent naming collisions, an `authority` can be specified, which provides a level of namespacing.
A Collection can contain one or more Views.
Primary Keys
All documents stored in a collection have a unique id. Primary keys in BonsaiDb are immutable -- once a document has an id, it cannot be changed. If you wish for a unique key that can be updated, use a unique view, and use a separate value as a primary key.
The type is controlled by the `Collection::PrimaryKey` associated type. If you're using the derive macro, the type can be specified using the `primary_key` parameter as in this example:
#[derive(Debug, Serialize, Deserialize, Collection, Eq, PartialEq)]
#[collection(name = "multi-key", primary_key = AssociatedProfileKey)]
struct AssociatedProfileData {
    value: String,
}

#[derive(Key, Debug, Clone, Copy, Eq, PartialEq, Ord, PartialOrd)]
struct AssociatedProfileKey {
    pub user_id: u32,
    pub data_id: u64,
}
If no `primary_key` is specified in the derive, `u64` will be used.
Inserting and accessing the collection can be done using the newly defined primary key type:
let key = AssociatedProfileKey {
    user_id: user.header.id,
    data_id: 64,
};
let inserted = AssociatedProfileData {
    value: String::from("hello"),
}
.insert_into(&key, &db)?;
let retrieved = AssociatedProfileData::get(&key, &db)?.expect("document not found");
assert_eq!(inserted, retrieved);
Natural Ids
It's not uncommon to need to store data in a database that has an "external" identifier. Some examples could be externally authenticated user profiles, social networking site posts, or for normalizing a single type's fields across multiple Collections. These types of values are often called "Natural Keys" or "Natural Identifiers".
`SerializedCollection::natural_id()` or `DefaultSerialization::natural_id()` can be implemented to return a value from the contents of a new document. When using the derive macro, the `natural_id` parameter can be specified with either a closure or a path to a function with the same signature.
In this example, the `UserProfile` type is used to represent a user that has a unique ID in an external database:
#[derive(Debug, Serialize, Deserialize, Collection, Eq, PartialEq)]
#[collection(name = "user-profiles", primary_key = u32)]
struct UserProfile {
    #[natural_id]
    pub external_id: u32,
    pub name: String,
}
When pushing a `UserProfile` into the collection, the id will automatically be assigned by calling `natural_id()`:
let user = UserProfile {
    external_id: 42,
    name: String::from("ecton"),
}
.push_into(&db)?;
let retrieved_from_database = UserProfile::get(&42, &db)?.expect("document not found");
assert_eq!(user, retrieved_from_database);
Custom Primary Keys
All primary keys must implement the `Key` trait. BonsaiDb provides implementations for many types, but any type that implements the trait can be used.
When using `push`/`push_into`, BonsaiDb needs to assign a unique ID to the incoming document. If `natural_id()` returns `None`, the storage backend will handle id assignment.
If the document being pushed is the first document in the collection, `Key::first_value()` is called and the resulting value is used as the document's id.
If the collection already has documents, the highest-ordered key is queried from the collection. `Key::next_value()` is then called and the resulting value is used as the document's id. `Key` implementors should not allow `next_value()` to return a value that is less than the current value. `NextValueError::WouldWrap` should be returned instead of wrapping.
Both `first_value()` and `next_value()` return `NextValueError::Unimplemented` by default. If any error occurs while trying to assign a unique id, the transaction will be aborted and rolled back.
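The id-assignment rules above can be sketched for a `u64` key. This is a simplified stand-in for the `Key` trait's behavior, not BonsaiDb's implementation; in particular, the choice of 1 as the first id is an assumption for illustration.

```rust
/// A simplified stand-in for BonsaiDb's `NextValueError`.
#[derive(Debug, PartialEq)]
pub enum NextValueError {
    WouldWrap,
}

/// First id for an empty collection (assumed to be 1 for this sketch),
/// mirroring the role of `Key::first_value()`.
pub fn first_value() -> u64 {
    1
}

/// Successor of the highest existing key, mirroring `Key::next_value()`.
/// Refuses to wrap around rather than returning a smaller value.
pub fn next_value(highest: u64) -> Result<u64, NextValueError> {
    highest.checked_add(1).ok_or(NextValueError::WouldWrap)
}
```

The wrap check matters because a wrapped id would sort below existing keys, breaking the "highest-ordered key" query that drives assignment, and could silently collide with an existing document.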
View
A View is a map/reduce-powered method of quickly accessing information inside of a Collection. Each View can only belong to one Collection.
Views define two important associated types: a Key type and a Value type. You can think of these as the equivalent entries in a map/dictionary-like collection that supports more than one entry for each Key. The Key is used to filter the View's results, and the Value is used by your application or the `reduce()` function.
Views are a powerful, yet abstract concept. Let's look at a concrete example: blog posts with categories.
#[derive(Serialize, Deserialize, Debug, Collection)]
#[collection(name = "blog-post", views = [BlogPostsByCategory])]
pub struct BlogPost {
    pub title: String,
    pub body: String,
    pub category: Option<String>,
}
Let's insert this data for these examples:
BlogPost {
    title: String::from("New version of BonsaiDb released"),
    body: String::from("..."),
    category: Some(String::from("Rust")),
}
.push_into(&db)?;
BlogPost {
    title: String::from("New Rust version released"),
    body: String::from("..."),
    category: Some(String::from("Rust")),
}
.push_into(&db)?;
BlogPost {
    title: String::from("Check out this great cinnamon roll recipe"),
    body: String::from("..."),
    category: Some(String::from("Cooking")),
}
.push_into(&db)?;
All examples on this page are available in their full form in the repository at book/book-examples/tests.
While `category` should be an enum, let's first explore using `String` and upgrade to an enum at the end (it requires one additional step). Let's implement a View that will allow users to find blog posts by their category as well as count the number of posts in each category.
#[derive(Debug, Clone, View, ViewSchema)]
#[view(collection = BlogPost, key = Option<String>, value = u32, name = "by-category")]
pub struct BlogPostsByCategory;

impl MapReduce for BlogPostsByCategory {
    fn map<'doc>(&self, document: &'doc BorrowedDocument<'_>) -> ViewMapResult<'doc, Self> {
        let post = BlogPost::document_contents(document)?;
        document.header.emit_key_and_value(post.category, 1)
    }

    fn reduce(
        &self,
        mappings: &[ViewMappedValue<Self::View>],
        _rereduce: bool,
    ) -> ReduceResult<Self::View> {
        Ok(mappings.iter().map(|mapping| mapping.value).sum())
    }
}
The three view-related traits being implemented are `View`, `ViewSchema`, and `MapReduce`. These traits are designed to allow keeping the `View` implementation in a shared code library that is used by both client-side and server-side code, while keeping the `ViewSchema` and `MapReduce` implementations in the server executable only.
Views for SerializedCollection
For users who are using `SerializedCollection`, `CollectionViewSchema` can be implemented instead of `ViewSchema`. The only difference between the two is that the `map()` function takes a `CollectionDocument` instead of a `BorrowedDocument`.
Value Serialization
For views to function, the Value type must be able to be serialized to and deserialized from storage. To accomplish this, all views must implement the `SerializedView` trait. For Serde-compatible data structures, `DefaultSerializedView` is an empty trait that can be implemented instead to provide the default serialization that BonsaiDb recommends.
Map
The first line of the `map` function calls `SerializedCollection::document_contents()` to deserialize the stored `BlogPost`. The second line returns an emitted Key and Value: in our case, a clone of the post's category and the value `1_u32`. With the map function, we're able to use `query()` and `query_with_docs()`:
let rust_posts = db
    .view::<BlogPostsByCategory>()
    .with_key(&Some(String::from("Rust")))
    .query_with_docs()?;
for mapping in &rust_posts {
    let post = BlogPost::document_contents(mapping.document)?;
    println!(
        "Retrieved post #{} \"{}\"",
        mapping.document.header.id, post.title
    );
}
The above snippet queries the Database for all documents in the `BlogPost` Collection that emitted a Key of `Some("Rust")`.
If you're using a `SerializedCollection`, you can use `query_with_collection_docs()` to have the deserialization done automatically for you:
let rust_posts = db
    .view::<BlogPostsByCategory>()
    .with_key(&Some(String::from("Rust")))
    .query_with_collection_docs()?;
for mapping in &rust_posts {
    println!(
        "Retrieved post #{} \"{}\"",
        mapping.document.header.id, mapping.document.contents.title
    );
}
Reduce
The second function to learn about is the `reduce()` function. It is responsible for turning an array of Key/Value pairs into a single Value. In some cases, BonsaiDb might need to call `reduce()` with values that have already been reduced once. If this is the case, `rereduce` is set to `true`.
In this example, we're using the built-in `Iterator::sum()` function to turn our Value of `1_u32` into a single `u32` representing the total number of documents.
let rust_post_count = db
    .view::<BlogPostsByCategory>()
    .with_key(&Some(String::from("Rust")))
    .reduce()?;
assert_eq!(rust_post_count, 2);
Changing an existing view
If you have data stored in a view, but want to update the view to store data differently, implement `ViewSchema::version()` and return a unique number. When BonsaiDb checks the view's integrity, it will notice that there is a version mismatch and automatically re-index the view. There is no mechanism to access the data until this operation is complete.
Understanding Re-reduce
Let's examine this data set:
| Document ID | BlogPost Category |
|-------------|-------------------|
| 1           | Some("Rust")      |
| 2           | Some("Rust")      |
| 3           | Some("Cooking")   |
| 4           | None              |
When updating views, each view entry is reduced and the value is cached. These are the view entries:
| View Entry ID   | Reduced Value |
|-----------------|---------------|
| Some("Rust")    | 2             |
| Some("Cooking") | 1             |
| None            | 1             |
When a reduce query is issued for a single key, the value can be returned without further processing. But if the reduce query matches multiple keys, the View's `reduce()` function will be called with the already-reduced values with `rereduce` set to `true`. For example, retrieving the total count of blog posts:
let total_post_count = db.view::<BlogPostsByCategory>().reduce()?;
assert_eq!(total_post_count, 3);
Once BonsaiDb has gathered each key's reduced value, it needs to further reduce that list into a single value. To accomplish this, the View's `reduce()` function is invoked with `rereduce` set to `true` and with mappings containing:
| Key             | Value |
|-----------------|-------|
| Some("Rust")    | 2     |
| Some("Cooking") | 1     |
| None            | 1     |
This produces a final value of 4.
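The two passes can be replayed in plain Rust. Because summing a list of sums gives the same answer as summing the originals, the example view's reduce logic is valid for both passes, which is why it can ignore the `rereduce` flag entirely:

```rust
/// Mirrors the summing logic of the example view's `reduce()`: it works
/// unchanged whether it receives raw mapped values (rereduce == false) or
/// cached, already-reduced per-key values (rereduce == true).
pub fn reduce(values: &[u32], _rereduce: bool) -> u32 {
    values.iter().sum()
}

// First pass: each key's mapped 1s are reduced and cached.
//   Some("Rust")    -> reduce(&[1, 1], false) == 2
//   Some("Cooking") -> reduce(&[1], false)    == 1
//   None            -> reduce(&[1], false)    == 1
// Re-reduce pass: the cached per-key values are combined.
//   reduce(&[2, 1, 1], true) == 4, the total document count.
```

Not every reduction is this forgiving: an average, for example, cannot be re-averaged from per-key averages without also carrying the counts, so views with such values must branch on `rereduce`.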
How does BonsaiDb make this efficient?
When saving Documents, BonsaiDb does not immediately update related views. It instead notes what documents have been updated since the last time the View was indexed.
When a View is accessed, the queries include an `AccessPolicy`. If you aren't overriding it, `UpdateBefore` is used. This means that when the query is evaluated, BonsaiDb will first check whether the index is out of date due to any updated data. If it is, it will update the View before evaluating the query.
If you want results quickly and are willing to accept data that might not be up to date, the access policies `UpdateAfter` and `NoUpdate` can be used depending on your needs.
If multiple simultaneous queries are being evaluated for the same View and the View is outdated, BonsaiDb ensures that only a single view indexer will execute while the queries wait for it to complete.
Using arbitrary types as a View Key
In our previous example, we used `String` for the Key type. The reason is important: Keys must be sortable by our underlying storage engine, which means special care must be taken. Most serialization formats do not guarantee binary sort order. Instead, BonsaiDb exposes the `Key` trait.
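To see why the encoding matters, compare a big-endian integer encoding with a little-endian one. A storage engine compares keys byte-by-byte (lexicographically), and only the big-endian form sorts the same way the numbers do; this ordering guarantee is the kind of property the `Key` trait provides. The helper below is purely illustrative:

```rust
/// Returns true if byte-wise (lexicographic) order of the encodings,
/// which is what a storage engine compares, matches numeric order.
pub fn preserves_order(encode: fn(u32) -> [u8; 4], a: u32, b: u32) -> bool {
    (encode(a) < encode(b)) == (a < b)
}
```

For example, 256 encodes little-endian as `[0, 1, 0, 0]`, which sorts *before* 1's `[1, 0, 0, 0]` even though 256 > 1; the big-endian encodings `[0, 0, 1, 0]` and `[0, 0, 0, 1]` sort correctly.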
Schema
A Schema is a group of one or more Collections. A Schema can be instantiated as a Database. The Schema describes how a set of data behaves, and a Database is a set of data on-disk.
Database
A Database is a set of stored collections. Each Database is described by a Schema. Unlike the other concepts, this concept corresponds to multiple types:
- For bonsaidb-local: `Database`/`AsyncDatabase`
- For bonsaidb-server: `ServerDatabase`
- For bonsaidb-client: `AsyncRemoteDatabase`/`BlockingRemoteDatabase`
All of these types implement a `Connection` trait.
Storage
The `StorageConnection` trait allows interacting with a BonsaiDb multi-database storage instance.
There are three implementations of the `StorageConnection` trait:
- `Storage`/`AsyncStorage`: A local, file-based server implementation with no networking capabilities.
- `Server`: A networked server implementation, written using `Storage`. This server supports QUIC- and WebSocket-based protocols. The QUIC protocol is preferred, but it uses UDP, which many load balancers don't support. If you're exposing BonsaiDb behind a load balancer, WebSockets may be the only option depending on your host's capabilities.
- `AsyncClient`/`BlockingClient`: A network client implementation that connects to a server.
PubSub
The Publish/Subscribe pattern enables developers to design systems that produce and receive messages. It is implemented for BonsaiDb through the `PubSub` and `Subscriber` traits.
A common example of what PubSub enables is implementing a simple chat system. Each chat participant can subscribe to messages on the `chat` topic, and when any participant publishes a `chat` message, all subscribers will receive a copy of that message.
A working example of PubSub is available at `examples/basic-local/examples/pubsub.rs`.
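The pattern itself can be sketched with only the standard library. This `Relay` type is hypothetical and single-process, nothing like BonsaiDb's actual `PubSub` implementation, but it shows the core idea: a topic keeps a list of subscriber channels, and publishing delivers a copy of the message to each.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// A minimal, illustrative topic registry (hypothetical; not BonsaiDb's).
#[derive(Default)]
pub struct Relay {
    topics: HashMap<String, Vec<Sender<String>>>,
}

impl Relay {
    /// Subscribes to a topic, returning the receiving end of a new channel.
    pub fn subscribe(&mut self, topic: &str) -> Receiver<String> {
        let (sender, receiver) = channel();
        self.topics.entry(topic.to_string()).or_default().push(sender);
        receiver
    }

    /// Sends a copy of `message` to every subscriber of `topic`.
    pub fn publish(&self, topic: &str, message: &str) {
        if let Some(subscribers) = self.topics.get(topic) {
            for subscriber in subscribers {
                // Ignore subscribers whose receiving end has been dropped.
                let _ = subscriber.send(message.to_string());
            }
        }
    }
}
```

In the chat analogy, every participant who subscribed to `chat` receives its own copy of each published message.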
Use cases of BonsaiDb
Single database model (No networking)
This use case is most similar to utilizing SQLite for your database. In this mode, BonsaiDb directly interacts with files on your disk to provide your database. Unlike other file-based databases, however, it's easy to migrate to any of these scenarios from this starting position:
graph LR
    code{{Rust Code}}
    local[(bonsaidb-local::Database)]
    code <--> local
A working example of how to use a local database can be found at `examples/basic-local/examples/basic-local.rs`.
Multi-database model (No networking)
This model is most similar to using multiple SQLite databases. In this mode, you interact with a `Storage` that you spawn within your code.
graph LR
    code{{Rust Code}}
    local[(bonsaidb-local::Storage)]
    code <--> local
If you look at the source behind `Database::open_local`, you'll see that the single-database model is using `Storage` under the hood.
Server model (QUIC or WebSockets)
This model is most similar to using other document databases, like CouchDB or MongoDB. In this mode, you interact with a `Client` that connects via either QUIC or WebSockets with a server. From the server code's perspective, this model is the same as the multi-database model, except that the server is listening for and responding to network traffic.
graph LR
    client-code{{Rust Client Code}}
    server-code{{Rust Server Code}}
    client[[bonsaidb-client]]
    server[[bonsaidb-server]]
    local[(bonsaidb-local)]
    client-code <--> client
    client <-. network .-> server
    server <--> local
    server-code <--> server
A working example of this model can be found at `examples/basic-server/examples/basic-server.rs`. When writing client/server applications that utilize BonsaiDb, you can have the BonsaiDb server running within your server application. This means that your server still has the ability not to use networking to interact with BonsaiDb. Regardless of whether you run any other server code, your BonsaiDb server will be accessible through a `Client` over the network.
API Platform model (QUIC or WebSockets)
If you're finding yourself developing an API for your application, and all of the consumers of this API are already connected to BonsaiDb, you may want to take advantage of the custom api functionality of the server:
graph LR
    client-code{{Rust Client Code}}
    server-code{{Rust Server Code}}
    client[[bonsaidb-client]]
    server[[bonsaidb-server]]
    backend[[Backend]]
    local[(bonsaidb-local)]
    client-code <--> client
    client <-. network .-> server
    server <--> local
    server-code <--> server
    server-code <--> backend
    backend <--> server
The BonsaiDb `CustomServer` type accepts one generic parameter that implements the `Backend` trait. This trait is used to customize the server in many ways, but one of the associated types is an `Api` implementor.
See this page for an overview of how to set up a custom api server.
Coming Later: Cluster model
When you're at the stage of scaling beyond a single server, you will be able to upgrade your server to a cluster using the hypothetical bonsaidb-cluster
crate. The clustering model is still being designed, but the goal is something similar to:
graph LR
    client-code{{Rust Client Code}}
    server-code{{Rust Server Code}}
    client[[bonsaidb-client]]
    server1[[server 1]]
    server2[[server 2]]
    server3[[server 3]]
    cluster[[bonsaidb-cluster]]
    client-code <--> client
    client <-. network .-> cluster
    server-code <--> cluster
    cluster <--> server1
    cluster <--> server2
    cluster <--> server3
    server1 <--> server2
    server2 <--> server3
    server1 <--> server3
In this model, the local storage element is hidden: each server has its own storage. This model is very similar from the viewpoint of your server and client code -- the primary difference is that the server-side connection is being established using the cluster crate. From the client's perspective, the cluster behaves as a single entity: sending a request to any server node will produce the same result within the cluster.
All features of BonsaiDb will be designed to work in cluster mode seamlessly. `PubSub` will ensure that subscribers will receive messages regardless of which server they're connected to.
Custom Api Server
The `Api` trait defines two associated types, Response and Error. The `Api` type is akin to a "request" that the server receives. The server will invoke a `Handler`, expecting a result with the associated Response and Error types.
All code on this page comes from this example: `examples/basic-server/examples/custom-api.rs`.
This example shows how to derive the `Api` trait. Because an error type isn't specified, the derive macro will use BonsaiDb's `Infallible` type as the error type.
#[derive(Serialize, Deserialize, Debug, Api)]
#[api(name = "ping", response = Pong)]
pub struct Ping;

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Pong;

#[derive(Serialize, Deserialize, Debug, Api)]
#[api(name = "increment", response = Counter)]
pub struct IncrementCounter {
    amount: u64,
}

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Counter(pub u64);
To implement the server, we must define a `Handler`, which is invoked each time the `Api` type is received by the server.
/// Dispatches Requests and returns Responses.
#[derive(Debug)]
pub struct ExampleHandler;

#[async_trait]
impl Handler<Ping> for ExampleHandler {
    async fn handle(_session: HandlerSession<'_>, _request: Ping) -> HandlerResult<Ping> {
        Ok(Pong)
    }
}
Finally, the client can issue the API call and receive the response, without needing any extra steps to serialize. This works regardless of whether the client is connected via QUIC or WebSockets.
async fn ping_the_server(
    client: &AsyncClient,
    client_name: &str,
) -> Result<(), bonsaidb::core::Error> {
    match client.send_api_request(&Ping).await {
        Ok(Pong) => {
            println!("Received Pong from server on {client_name}");
        }
        other => println!("Unexpected response from API call on {client_name}: {other:?}"),
    }
    Ok(())
}
Permissions
One of the strengths of using BonsaiDb's custom api functionality is the ability to tap into the permissions handling that BonsaiDb uses. The Ping request has no permissions, but let's add permission handling to our `IncrementCounter` API. We will do this by creating an `increment_counter` function that expects two parameters: a connection to the storage layer with unrestricted permissions, and a second connection to the storage layer that has been restricted to the permissions the invoking client is authorized to perform:
/// The permissible actions that can be granted for this example api.
#[derive(Debug, Action)]
#[action(actionable = bonsaidb::core::actionable)]
pub enum ExampleActions {
    Increment,
    DoSomethingCustom,
}

pub async fn increment_counter<S: AsyncStorageConnection<Database = C>, C: AsyncKeyValue>(
    storage: &S,
    as_client: &S,
    amount: u64,
) -> Result<u64, bonsaidb::core::Error> {
    as_client.check_permission([Identifier::from("increment")], &ExampleActions::Increment)?;
    let database = storage.database::<()>("counter").await?;
    database.increment_key_by("counter", amount).await
}

#[async_trait]
impl Handler<IncrementCounter> for ExampleHandler {
    async fn handle(
        session: HandlerSession<'_>,
        request: IncrementCounter,
    ) -> HandlerResult<IncrementCounter> {
        Ok(Counter(
            increment_counter(session.server, &session.as_client, request.amount).await?,
        ))
    }
}
The `Handler` is provided a `HandlerSession` as well as the `Api` type, which provides all the context information needed to verify the connected client's authenticated identity and permissions. Additionally, it provides two ways to access the storage layer: with unrestricted permissions, or restricted to the permissions granted to the client.
Let's finish configuring the server to allow all unauthenticated users the ability to `Ping`, and all authenticated users the ability to `Increment` the counter:
let server = Server::open(
    ServerConfiguration::new("custom-api.bonsaidb")
        .default_permissions(Permissions::from(
            Statement::for_any()
                .allowing(&BonsaiAction::Server(ServerAction::Connect))
                .allowing(&BonsaiAction::Server(ServerAction::Authenticate(
                    AuthenticationMethod::PasswordHash,
                ))),
        ))
        .authenticated_permissions(Permissions::from(vec![
            Statement::for_any().allowing(&ExampleActions::Increment)
        ]))
        .with_api::<ExampleHandler, Ping>()?
        .with_api::<ExampleHandler, IncrementCounter>()?
        .with_schema::<()>()?,
)
.await?;
For more information on managing permissions, see Administration/Permissions.
The full example these snippets are taken from is available in the repository.
Overview
BonsaiDb aims to offer the majority of its functionality in local operation. The networked server adds some functionality on top of the local version, but its main purpose is to add the ability to talk to the database over the network.
This model makes it easy to transition a local database to a networked database server. Start with whatever model fits your needs today, and when your needs change, BonsaiDb will adapt.
When to use the Local Integration
- You're only going to access databases from one process at a time. BonsaiDb is designed for concurrency and can scale with the capabilities of the hardware. However, the underlying storage layer that BonsaiDb is built upon, Nebari, does not support multiple processes writing its data simultaneously. If you need to access the database from multiple processes, the server integration is what you should use. While it doesn't offer IPC communication today, a pull request adding that functionality (along with the corresponding unit tests) would be accepted.
- You have no public API/PubSub/access needs or have implemented those with another stack.
When to use the Server Integration
- You need to access databases from more than one process or machine.
- You are OK with downtime due to loss of service when the single server is offline. If you need to have a highly-available database, you should use the Cluster Integration (Coming Soon).
- Your database load can be met with a single machine. If you have enough load that you need to share the processing power of multiple servers, you should use the Cluster Integration (Coming Soon).
Coming Soon: When to use the Cluster Integration
- You need to access databases from more than one machine.
- You need a highly-available setup.
- You need/want to split load between multiple machines.
Async vs Blocking
BonsaiDb supports both async and blocking (threaded) access. Its aim is to provide a first-class experience no matter which architecture you choose for your Rust application.
Local-only
`Storage` and `Database` are the blocking implementations of BonsaiDb. These types provide the lowest-overhead access to BonsaiDb, as they will block the currently executing thread to perform the operations.
`AsyncStorage` and `AsyncDatabase` are simple types that "wrap" `Storage` and `Database` instances with an asynchronous API. BonsaiDb does this by spawning a blocking task in Tokio. Internally, Tokio uses a pool of threads to drive blocking operations. This may sound like a lot of overhead, but it is surprisingly lightweight.
Our recommendation is to pick the programming style that fits your needs the best. Do you need lightweight task concurrency, or is basic threading enough? If this application grew in scope, would it ever need to be a networked application?
If you anticipate needing to use BonsaiDb's networked server, you should review the next section to consider how Tokio benefits a networked server.
Networked Server
When building a networked server, a common strategy to handle inbound connections is to allow each connection to have a thread. This is expensive, however, as each thread needs its own stack allocated and is managed by the kernel. When designing a server with long-running connections, async allows handling more connections with fewer system resources. As such, BonsaiDb's server is built atop Tokio, and the traits used to extend the server are `async_trait`s.
The networked server is built atop `AsyncStorage`, which means that you can convert a server instance into a blocking `Storage` instance, allowing local access to your server to remain blocking.
Networked Client
BonsaiDb's networked client uses Tokio for all networking on non-WASM targets, and uses the browser's WebSocket APIs for WASM targets.
On all non-WASM targets, the networked client can be used without a Tokio runtime present. When instantiated this way, a runtime will automatically be spawned to power the client's networking. In the future, non-Tokio-based networking implementations may be provided for the blocking client implementation.
For WASM, the networked client does not provide blocking trait implementations. If you are building for WASM, you must use the async traits.
The differences between the APIs
The core traits are split into two types: blocking and async.
| Blocking | Async |
|----------------------|---------------------------|
| `Connection` | `AsyncConnection` |
| `StorageConnection` | `AsyncStorageConnection` |
| `PubSub` | `AsyncPubSub` |
| `Subscriber` | `AsyncSubscriber` |
| `KeyValue` | `AsyncKeyValue` |
| `LowLevelConnection` | `AsyncLowLevelConnection` |
By splitting these traits, BonsaiDb tries to make it harder to accidentally use a blocking API in an asynchronous context. In general, all other functions are exposed in pairs: a blocking version, and an async version with the suffix `_async`. For example, `SerializedCollection::get` is the blocking API, and `SerializedCollection::get_async` is the async API.
When developing a project that uses both async and blocking modes of access, it is considered a good practice to separate modules based on whether they are blocking or not. This can help spot mistakes when the wrong type of trait is imported in the wrong type of module.
Integrating BonsaiDb Locally
BonsaiDb supports multiple databases and multiple schemas. However, for many applications, you only need a single database.
If you only want a single database, the setup is straightforward (from `examples/basic-local/examples/basic-local.rs`):
let db = Database::open::<Message>(StorageConfiguration::new("basic.bonsaidb"))?;
Under the hood, BonsaiDb is creating a multi-database `Storage` with a local `Database` named `default` for you. If you need to switch to a multi-database model, you can open the storage and access the `default` database (adapted from `examples/basic-local/examples/basic-local.rs`):
let storage = Storage::open(
    StorageConfiguration::new("basic.bonsaidb").with_schema::<Message>()?,
)?;
let db = storage.create_database::<Message>("messages", true)?;
You can register multiple schemas so that databases can be purpose-built.
Common Traits
To help your code transition between different modes of accessing BonsaiDb, you can use these common traits to make your methods accept any style of BonsaiDb access.
- `Database` implements `Connection`, `KeyValue`, and `PubSub`.
- `AsyncDatabase` implements `AsyncConnection`, `AsyncKeyValue`, and `AsyncPubSub`.
- `Storage`/`AsyncStorage` implement `StorageConnection`/`AsyncStorageConnection`, respectively.
For example, `examples/basic-local/examples/basic-local.rs` uses this helper method to insert a record:
fn insert_a_message<C: Connection>(
    connection: &C,
    value: &str,
) -> Result<(), bonsaidb::core::Error> {
    Message {
        contents: String::from(value),
        timestamp: SystemTime::now(),
    }
    .push_into(connection)?;
    Ok(())
}
Integrating the networked BonsaiDb Server
To access BonsaiDb over the network, you're going to be writing two pieces of code: the server code and the client code.
Your BonsaiDb Server
The first step is to create a `Server`, which uses local `Storage` under the hood. This means that if you're already using BonsaiDb in local mode, you can swap your usage of `Storage` with `Server` in your server code without running your database through any tools. Here's the setup code from `examples/basic-server/examples/basic-server.rs`:

```rust
let server = Server::open(
    ServerConfiguration::new("server-data.bonsaidb")
        .default_permissions(DefaultPermissions::AllowAll)
        .with_schema::<Shape>()?,
)
.await?;
if server.certificate_chain().await.is_err() {
    server.install_self_signed_certificate(true).await?;
}
let certificate = server
    .certificate_chain()
    .await?
    .into_end_entity_certificate();
server.create_database::<Shape>("my-database", true).await?;
```
Once you have a server initialized, calling `listen_on` will begin listening for connections on the specified port. This uses the preferred native protocol, which is built on UDP. If you find that UDP is not working for your setup, or you want to put BonsaiDb behind a load balancer that doesn't support UDP, you can enable WebSocket support and call `listen_for_websockets_on`.
You can call both, but since these functions don't return until the server is shut down, you should spawn them instead. Note that each spawned task needs its own clone of the server handle:

```rust
let task_server = server.clone();
tokio::spawn(async move { task_server.listen_on(5645).await });

let task_server = server.clone();
tokio::spawn(async move {
    task_server
        .listen_for_websockets_on("localhost:8080", false)
        .await
});
```
If you're not running any of your own code on the server, and you're only using one listening method, you can just await the listen method of your choice in your server's main. This code example configures BonsaiDb on UDP port 5645, but this is not an officially registered port.
From the Client
`BlockingClient` and `AsyncClient` support both the native protocol and WebSockets. They determine which protocol to use based on the scheme in the URL:

- `bonsaidb://*` will connect using the native BonsaiDb protocol.
- `ws://*` or `wss://*` will connect using WebSockets.
Here's how to connect over BonsaiDb's native protocol, from `examples/basic-server/examples/basic-server.rs`:

```rust
AsyncClient::build(Url::parse("bonsaidb://localhost:5645")?)
    .with_certificate(certificate)
    .build()
    .await?
```
This is using a pinned certificate to connect. Other methods are supported, but better certificate management is coming soon.
Common Traits
The examples above use types that are powered by common traits, allowing code to be written with generic trait bounds that can operate the same regardless of whether the code is being called locally or remotely.
- `Server` implements `AsyncStorageConnection`. `Server::as_blocking()` can be used to receive a type that implements `StorageConnection`.
- `Server::database()` returns a local `Database`, which implements `Connection`, `KeyValue`, and `PubSub`. Local access in the server executable doesn't go over the network.
- `BlockingClient`/`AsyncClient` implement `StorageConnection`/`AsyncStorageConnection`.
- `BlockingClient::database()`/`AsyncClient::database()` return types that implement `Connection`/`AsyncConnection`, `KeyValue`/`AsyncKeyValue`, and `PubSub`/`AsyncPubSub`.
Integrating into a BonsaiDb Cluster
Coming Soon.
The goal of this feature is to make clustering simple. We hope to provide an experience that lets someone operating a networked server choose between two types of clusters:
One-leader mode
When setting up a cluster initially, you will begin with one-leader mode. In this mode, you can add as many nodes to the cluster as you wish, but only one node will be processing all of the data updates. All nodes can handle requests, but requests that can't be served locally will be forwarded to the leader. This allows for the use of read-replicas to alleviate load in some read-heavy situations.
Another benefit of this mode is that it supports a two-node configuration. If you're scaling your app and need a reliable backup for quicker disaster recovery, you can operate a read replica and manually fail over when the situation arises.
If you decide to allow automatic failover in this mode, there is a chance for data loss, as the leader does not wait for read-replicas to synchronize data. Any transactions that committed and were not synchronized before the outage occurred would not be on the other servers. Thus, this mode is not intended for high-availability configurations, although some users may elect to use it in such a configuration knowing these limitations.
Quorum mode
Once you have a cluster with at least 3 nodes, you can switch the cluster into quorum mode. For any given `N` nodes, all requests must reach an agreed response from `N / 2 + 1` members. For example, in a cluster of 3 nodes, there must be 2 successful responses before a client can receive a response to its request.
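The majority requirement is simple integer arithmetic, which a short sketch can demonstrate:

```rust
/// Minimum number of members that must agree for any given `N` nodes,
/// per the `N / 2 + 1` rule described above.
fn quorum_size(nodes: u32) -> u32 {
    nodes / 2 + 1
}

fn main() {
    assert_eq!(quorum_size(3), 2); // a 3-node cluster needs 2 agreeing responses
    assert_eq!(quorum_size(4), 3); // even cluster sizes still need a strict majority
    assert_eq!(quorum_size(5), 3);
}
```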
In quorum mode, your data is divided into shards, and those shards are replicated throughout the cluster onto at least 3 nodes (configurable). Initially, with just 3 nodes available, the only benefit is having a highly-available cluster with no data loss when a single node goes down.
As you add more nodes to your cluster, however, you can re-balance your databases to move shards. The author of BonsaiDb did not enjoy this process in CouchDB and aims to make these tools easy and effortless to use. Ideally, there would be a low-maintenance mode that would allow the cluster to re-shard itself automatically during allowed maintenance periods, ensuring data is distributed more evenly amongst the cluster.
Additional long-term dreams of quorum mode include the ability to customize node selection criteria on a per-database basis. The practical use of node selection is to ensure that at least 3 unique nodes are picked for each shard. However, allowing custom logic to evaluate which nodes should be selected for any database would allow ultimate flexibility. For example, if you have a globally deployed application, and you have some data that is geographically specific, you could locate each region's database on nodes within those locations' data centers.
When?
Clustering is an important part of the design of Cosmic Verge. As such, it is a priority for us to work on. But, the overall game is a very large project, so we hesitate to make any promises on timelines.
Connection
The `Connection`/`AsyncConnection` traits contain functions for interacting with collections in a database. These traits are implemented by the `Database` types in each crate.

Using these traits, you can write code that works generically regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

The only difference between `Connection` and `AsyncConnection` is that `AsyncConnection` can be used in async code, while the `Connection` trait is designed to block the current thread. BonsaiDb is designed to make it hard to accidentally call a blocking function from async code, while still supporting both async and blocking access patterns.
StorageConnection
The `StorageConnection`/`AsyncStorageConnection` traits contain functions for interacting with BonsaiDb's multi-database storage. These traits are implemented by the `Storage` types.

Using these traits, you can write code that works generically with BonsaiDb's multi-database storage types regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

The only difference between `StorageConnection` and `AsyncStorageConnection` is that `AsyncStorageConnection` can be used in async code, while the `StorageConnection` trait is designed to block the current thread. BonsaiDb is designed to make it hard to accidentally call a blocking function from async code, while still supporting both async and blocking access patterns.
PubSub Trait
The `PubSub`/`AsyncPubSub` traits contain functions for using PubSub in BonsaiDb. The traits are implemented by the `Database` types in each crate:

- For bonsaidb-local: `Database`/`AsyncDatabase`
- For bonsaidb-server: `ServerDatabase`, and `Database` via `ServerDatabase::as_blocking()`
- For bonsaidb-client: `BlockingRemoteDatabase`/`AsyncRemoteDatabase`

Using these traits, you can write code that works generically regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

The only difference between `PubSub` and `AsyncPubSub` is that `AsyncPubSub` can be used in async code, while the `PubSub` trait is designed to block the current thread. BonsaiDb is designed to make it hard to accidentally call a blocking function from async code, while still supporting both async and blocking access patterns.
Key-Value Trait
The `KeyValue`/`AsyncKeyValue` traits contain functions for interacting with the atomic key-value store. The key-value store provides high-performance atomic operations without ACID compliance. Once the data is persisted to disk, it holds the same guarantees as the rest of BonsaiDb, but this feature is designed for high throughput and does not wait for persistence to disk before reporting success to the client. These traits are implemented by the `Database` types in each crate:

- For bonsaidb-local: `Database`/`AsyncDatabase`
- For bonsaidb-server: `ServerDatabase`, and `Database` via `ServerDatabase::as_blocking()`
- For bonsaidb-client: `BlockingRemoteDatabase`/`AsyncRemoteDatabase`

Using these traits, you can write code that works generically regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

The only difference between `KeyValue` and `AsyncKeyValue` is that `AsyncKeyValue` can be used in async code, while the `KeyValue` trait is designed to block the current thread. BonsaiDb is designed to make it hard to accidentally call a blocking function from async code, while still supporting both async and blocking access patterns.
Key Trait
The `Key` trait enables types to define a serialization and deserialization format that preserves the ordering of the original type in its serialized form. When comparing two values encoded with `as_ord_bytes()`, a byte-by-byte comparison should match the result of comparing the two original values using their `Ord` implementation. For integer formats, this generally means encoding the bytes in network byte order (big endian).

For example, let's consider two values:

| Value | `as_ord_bytes()` |
|---|---|
| `1u16` | `[0, 1]` |
| `300u16` | `[1, 44]` |

`1_u16.cmp(&300_u16)` and `1_u16.as_ord_bytes()?.cmp(&300_u16.as_ord_bytes()?)` both produce `Ordering::Less`.
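This property is easy to verify with the standard library's big-endian conversions; the following is a stdlib-only demonstration, independent of BonsaiDb's `Key` machinery:

```rust
/// Encodes a u16 in network byte order (big endian), as integer keys are.
fn ord_bytes(value: u16) -> [u8; 2] {
    value.to_be_bytes()
}

fn main() {
    // The table above, reproduced:
    assert_eq!(ord_bytes(1), [0, 1]);
    assert_eq!(ord_bytes(300), [1, 44]);

    // Byte-by-byte comparison of big-endian bytes agrees with `Ord`.
    assert_eq!(ord_bytes(1).cmp(&ord_bytes(300)), 1u16.cmp(&300u16));

    // Little-endian encoding would NOT preserve ordering:
    // 1u16 -> [1, 0] sorts *after* 256u16 -> [0, 1] byte-by-byte.
    assert_ne!(1u16.to_le_bytes().cmp(&256u16.to_le_bytes()), 1u16.cmp(&256u16));
}
```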
Implementing the `Key` trait

The `Key` trait declares two functions: `as_ord_bytes()` and `from_ord_bytes()`. The intention is to convert the type to bytes using network byte order for numerical types; for non-numerical types, the bytes need to be stored in a binary-sortable order.
Here is how BonsaiDb implements `Key` for `EnumKey`:

```rust
impl<'k, T> Key<'k> for EnumKey<T>
where
    T: ToPrimitive + FromPrimitive + Clone + Eq + Ord + std::fmt::Debug + Send + Sync,
{
    const CAN_OWN_BYTES: bool = false;

    fn from_ord_bytes<'b>(bytes: ByteSource<'k, 'b>) -> Result<Self, Self::Error> {
        let primitive = u64::decode_variable(bytes.as_ref())?;
        T::from_u64(primitive)
            .map(Self)
            .ok_or_else(|| io::Error::new(ErrorKind::InvalidData, UnknownEnumVariant))
    }
}

impl<T> KeyEncoding<Self> for EnumKey<T>
where
    T: ToPrimitive + FromPrimitive + Clone + Eq + Ord + std::fmt::Debug + Send + Sync,
{
    type Error = io::Error;

    const LENGTH: Option<usize> = None;

    fn describe<Visitor>(visitor: &mut Visitor)
    where
        Visitor: KeyVisitor,
    {
        visitor.visit_type(KeyKind::Unsigned);
    }

    fn as_ord_bytes(&self) -> Result<Cow<'_, [u8]>, Self::Error> {
        let integer = self
            .0
            .to_u64()
            .map(Unsigned::from)
            .ok_or_else(|| io::Error::new(ErrorKind::InvalidData, IncorrectByteLength))?;
        Ok(Cow::Owned(integer.to_variable_vec()?))
    }
}
```
By implementing `Key` you can take full control of converting your view keys.
Using an Enum as a Key
The easiest way to expose an enum is to derive `num_traits::FromPrimitive` and `num_traits::ToPrimitive` using num-derive, and add an `impl EnumKey` line:

```rust
#[derive(Serialize, Deserialize, Eq, PartialEq, Debug, Key, Clone)]
pub enum Category {
    Rust,
    Cooking,
}
```

The View code remains unchanged, although the associated `Key` type can now be set to `Option<Category>`. The queries can now use the enum instead of a `String`:
```rust
let rust_post_count = db
    .view::<BlogPostsByCategory>()
    .with_key(&Some(Category::Rust))
    .reduce()?;
```

BonsaiDb will convert the enum to a `u64` and use that value as the key. A `u64` was chosen to ensure fairly wide compatibility, even with some extreme usages of bitmasks. If you wish to customize this behavior, you can implement `Key` directly.
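To see why an integer mapping preserves ordering, here is a stdlib-only sketch. Note that BonsaiDb actually uses a variable-length integer encoding (as shown in the `EnumKey` implementation above); this sketch uses fixed-width big-endian bytes for simplicity:

```rust
// The derived `Ord` on the enum follows declaration order, and so does a
// byte-by-byte comparison of the big-endian bytes of the discriminants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Category {
    Rust,
    Cooking,
}

/// Fixed-width stand-in for the variable-length encoding BonsaiDb uses.
fn key_bytes(category: Category) -> [u8; 8] {
    (category as u64).to_be_bytes()
}

fn main() {
    assert_eq!(
        key_bytes(Category::Rust).cmp(&key_bytes(Category::Cooking)),
        Category::Rust.cmp(&Category::Cooking)
    );
}
```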
Configuration
BonsaiDb attempts to have reasonable default configuration options, but it's important to browse the available options to ensure there aren't options that might help your particular needs.
Storage Configuration
The `StorageConfiguration` structure is used to open a local-only database. The `ServerConfiguration` struct contains an instance of `StorageConfiguration`, and all of its configuration options are available on it.
Vault Key Storage
By default, BonsaiDb sets `vault_key_storage` to a file stored within the database folder. This is incredibly insecure and should not be used outside of testing.

For secure encryption, it is important to store the vault keys in a location that is separate from the database. If the keys are on the same hardware as the database, anyone with access to the disk will be able to decrypt the stored data.

If you have more than one server, you can still use `LocalVaultKeyStorage` in conjunction with a mounted network share for reasonable security practices -- assuming the network share itself is properly secured.

If you have an S3-compatible storage service available, you can use `bonsaidb::keystorage::s3` to store the vault keys with that service.

Note that by storing your keys remotely, your BonsaiDb database cannot be opened unless the keys can be read.

Vault key storage can also be set using `Builder::vault_key_storage`.
Default Encryption Key
By setting `default_encryption_key` to a key, all data will be encrypted when written to disk.

If `default_encryption_key` is `None`, encryption will still be performed for collections that return a key from `Collection::encryption_key()`.

Can also be set using `Builder::default_encryption_key`.
Tasks: Worker Count
The `tasks.worker_count` setting controls the number of worker tasks that are spawned to process background tasks.

Can also be set using `Builder::tasks_worker_count`.
Views: Check Integrity on Open
When `views.check_integrity_on_open` is true, all views in all databases will be checked for integrity on startup. If this value is false, a view's integrity will not be checked until it is accessed for the first time.

By default, BonsaiDb delays checking a view's integrity until it is accessed for the first time. It may, however, be preferable to accept a longer startup time to ensure consistent response times once the server is running after a restart.

Can also be set using `Builder::check_view_integrity_on_open`.
Key-Value Persistence
The Key-Value store is designed to be a lightweight, atomic data store that is suitable for caching data, tracking metrics, or other situations where a Collection might be overkill.
By default, BonsaiDb persists key-value store changes to disk immediately. For light usage, this will not be noticeable, and it ensures that no data will ever be lost.

If you're willing to accept potentially losing recent writes, `key_value_persistence` can be configured to lazily commit changes to disk. The documentation for `KeyValuePersistence` contains examples as well as an explanation of how the rules are evaluated.

Key-value persistence can also be set using `Builder::key_value_persistence`.
Server Configuration
The `ServerConfiguration` structure is used to open a BonsaiDb server. Being built atop the local storage engine, this structure exposes an instance of `StorageConfiguration`, allowing full customization.
Server Name
The `server_name` setting is for the primary DNS name of the server. The server's TLS certificate should be valid for the server's name.
When using ACME, this setting controls the primary certificate requested.
Can also be set using a builder-style method.
Client Simultaneous Request Limit
BonsaiDb's networking protocols allow multiple requests to be sent before any responses have been received, a technique sometimes called pipelining. Without a limit, a single malicious client could send a large number of load-inducing requests and cause reliability-of-service issues for other clients.

By limiting each connection to a reasonable number of simultaneous requests, clients can take advantage of pipelining without any one client being able to saturate the server with requests.

This limit is set using the `client_simultaneous_request_limit` field or builder-style method.
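Conceptually, such a limit is a per-connection counter of in-flight requests. The sketch below illustrates the idea with a simple atomic counter; it is not BonsaiDb's actual implementation:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Conceptual per-connection request limiter (illustrative only).
struct RequestLimiter {
    limit: usize,
    in_flight: AtomicUsize,
}

impl RequestLimiter {
    fn new(limit: usize) -> Self {
        Self { limit, in_flight: AtomicUsize::new(0) }
    }

    /// Returns true if the request may proceed; pair each success with `release`.
    fn try_acquire(&self) -> bool {
        let previous = self.in_flight.fetch_add(1, Ordering::AcqRel);
        if previous >= self.limit {
            // Over the limit: undo the increment and reject the request.
            self.in_flight.fetch_sub(1, Ordering::AcqRel);
            false
        } else {
            true
        }
    }

    fn release(&self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let limiter = RequestLimiter::new(2);
    assert!(limiter.try_acquire());
    assert!(limiter.try_acquire());
    assert!(!limiter.try_acquire()); // a third simultaneous request is rejected
    limiter.release();
    assert!(limiter.try_acquire()); // capacity is available again
}
```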
Request Worker Count
The `request_workers` configuration controls the number of worker tasks that process incoming requests from connected clients. It can also be set via a builder-style method.
Default Permissions and Authenticated Permissions
When first connecting to a server, the client is unauthenticated and is granted the permissions defined by `default_permissions`. Once a connected client has authenticated, the client will be granted `authenticated_permissions` in addition to whatever permissions are already granted by its assigned roles.

By default, both `default_permissions` and `authenticated_permissions` contain no granted permissions. This means that, by default, no connections are allowed to a server, as the connection hasn't been granted `BonsaiAction::Server(ServerAction::Connect)`.
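The layering described above can be illustrated with a small sketch; BonsaiDb evaluates structured permission statements, whereas the plain strings here are stand-ins:

```rust
use std::collections::HashSet;

// Conceptual sketch: a client's effective permissions are the union of the
// defaults, the authenticated grant, and whatever its roles grant.
fn effective_permissions<'a>(
    default: &[&'a str],
    authenticated: &[&'a str],
    from_roles: &[&'a str],
) -> HashSet<&'a str> {
    default.iter().chain(authenticated).chain(from_roles).copied().collect()
}

fn main() {
    // Before authentication, only the default permissions apply.
    let before = effective_permissions(&["server.connect"], &[], &[]);
    assert!(!before.contains("document.insert"));

    // After authentication, the authenticated and role grants are layered on.
    let after = effective_permissions(
        &["server.connect"],
        &["server.list-databases"],
        &["document.insert"],
    );
    assert!(after.contains("document.insert"));
}
```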
ACME Configuration (LetsEncrypt)
ACME has two configurable options, a contact email and the ACME directory.
ACME Contact Email
The contact email is submitted to the ACME directory as part of requesting a TLS certificate. It is optional for the LetsEncrypt directories.
A valid value for this field begins with `mailto:`.

The contact email can be set using `acme.contact_email` or the builder-style method.
ACME Directory
By default, BonsaiDb uses the production LetsEncrypt directory, but any ACME directory can be specified.
The directory can be set using `acme.directory` or the builder-style method.
Permissions
BonsaiDb uses role-based access control (RBAC). In short, permissions are granted through statements within permission groups. Users are able to log in and receive permissions that were granted via permission groups or roles.
This section has two subsections:
- Permission Statements: An overview of the resource names and actions used within BonsaiDb.
- Users, Groups, and Roles: A more thorough explanation of BonsaiDb's access control.
While the most common use case will be granting permissions to act upon BonsaiDb itself, the permissions system is designed to be generic enough that it can be used as the application's permission system if desired.
By default, no actions are allowed.
Currently, permissions are only applied to connections over a network. In the future, permissions will be able to be applied even on local connections.
Permission Statements
A `Statement` grants permissions to execute `Action`s on `ResourceName`s.
Actions and Resources
`ResourceName`s are simply namespaced `Identifier`s. An example could be: `"bonsaidb".*."khonsulabs-admin.users".1`. Each segment can be a string, an integer, or a wildcard (`*`).

In BonsaiDb, nearly everything has a resource name. The example above refers to a document with ID `1` in the `khonsulabs-admin.users` collection in any database. The `bonsaidb::core::permissions::bonsai` module contains functions to create properly formatted `ResourceName`s.
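Wildcard matching over segments can be illustrated with a stdlib-only sketch; BonsaiDb's real matcher operates on typed `Identifier` segments, not plain strings:

```rust
// Illustrative sketch: a pattern segment of "*" matches any name segment.
fn segments_match(pattern: &[&str], name: &[&str]) -> bool {
    pattern.len() == name.len()
        && pattern.iter().zip(name).all(|(p, n)| *p == "*" || p == n)
}

fn main() {
    let name = ["bonsaidb", "my-database", "khonsulabs-admin.users", "1"];
    // A wildcard in the database segment matches any database.
    assert!(segments_match(
        &["bonsaidb", "*", "khonsulabs-admin.users", "1"],
        &name
    ));
    assert!(!segments_match(
        &["bonsaidb", "*", "other-collection", "1"],
        &name
    ));
}
```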
Also within the same module are the built-in `Action`s. The base enum for all actions used within BonsaiDb is `BonsaiAction`. Below is an overview of the resource names and actions by category.
Server
The `ServerAction` enum contains the actions that are related to `StorageConnection`. For APIs that accept a database name parameter, the resource name will be `database_resource_name(database)`. For all other actions, the resource name is `bonsaidb_resource_name()`.

For actions that operate upon users (e.g., creating a user), the resource name is `user_resource_name(username)`.
At-rest Encryption
Access to encrypted information can be controlled by limiting access to the encryption key used. Currently, BonsaiDb only has support for a shared master key, but in the future additional keys will be able to be created. Because `Encrypt` and `Decrypt` are separate actions, access to read and write can be controlled independently.

The resource name for an encryption key is `encryption_key_resource_name(key_id)`.
Database
The `DatabaseAction` enum contains the actions that are related to a specific database. Actions that act on the database directly will use the resource name `database_resource_name(database)`.

For `Collection`s, there are three resource names used. For actions that operate on the collection directly, the resource name is `collection_resource_name(database, collection)`. For actions that operate on a document, the resource name is `document_resource_name(database, collection, id)`. Finally, for actions that operate on a `View`, the resource name is `view_resource_name(database, view)`.

For actions that operate upon a key-value entry, the resource name is `keyvalue_key_resource_name(database, namespace, key)`.

For actions that operate on a `PubSub` topic, the resource name is `pubsub_topic_resource_name(database, topic)`.
Statement Examples
Coming Soon.
Users, Groups, and Roles
The most common flow that a database administrator needs to support is granting a user the ability to take specific actions on specific resources. To accomplish this, a `PermissionGroup` must be created containing the permission statements, covered in the previous section, that you wish to apply.

`PermissionGroup`s can be assigned directly to users by adding the group ID to their `User` document.

At first glance, `Role`s may appear somewhat redundant. One or more `PermissionGroup`s can be assigned to a role, and roles can be assigned to a user. Why would you want to use roles at all?
The general advice the authors of BonsaiDb suggest is to use groups for limited amounts of functionality, keeping each group's list of statements concise and easy to understand. Then, create roles that combine groups of functionality in meaningful ways. One meaningful way could be creating roles based on job titles inside of a company. In theory, a person's job defines what they do within the company.
In practice, permissions are never as clean as one would hope, which is why BonsaiDb allows assigning groups and roles to users directly. Roles should be used as much as possible, but sometimes assigning a group directly is simply necessary. For example, imagine the CEO telling you, "I know Bob is just a sales guy, but he needs to be able to update this record. I trust him more than the other sales people. Just make it happen." As the database administrator, you can decide whether to introduce a new role or just temporarily assign an extra group to this one user.
At-Rest Encryption
BonsaiDb offers at-rest encryption. An overview of how it works is available in the `bonsaidb::local::vault` module.
Enabling at-rest encryption by default
When opening your BonsaiDb instance, there is a configuration option, `default_encryption_key`. Once this is set, all new data written that supports being encrypted will be encrypted at rest.

```rust
let storage = Storage::open(
    StorageConfiguration::new(&directory)
        .vault_key_storage(vault_key_storage)
        .default_encryption_key(KeyId::Master),
)?;
```
Enabling at-rest encryption on a per-collection basis
`Collection::encryption_key()` can be overridden on a per-collection basis. If a collection requests encryption but the feature is disabled, an error will be generated.

To enable a collection to be encrypted only when the feature is enabled, return a key only when `ENCRYPTION_ENABLED` is true.
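That guard can be sketched with stdlib-only stand-ins; in real code the constant comes from BonsaiDb's core crate as `ENCRYPTION_ENABLED`, and the return type is BonsaiDb's `KeyId`:

```rust
// Stand-ins for BonsaiDb's `ENCRYPTION_ENABLED` constant and `KeyId` type,
// used here only to illustrate the guard pattern.
#[derive(Debug, PartialEq)]
enum KeyId {
    Master,
}

const ENCRYPTION_ENABLED: bool = true;

/// Mirrors the guard described above: only request encryption when the
/// encryption feature is compiled in.
fn encryption_key() -> Option<KeyId> {
    if ENCRYPTION_ENABLED {
        Some(KeyId::Master)
    } else {
        None
    }
}

fn main() {
    assert_eq!(encryption_key(), Some(KeyId::Master));
}
```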