BonsaiDb User's Guide

BonsaiDb is an ACID-compliant document database written in Rust. Its goal is to be a general-purpose database that simplifies development and deployment by providing reliable building blocks that are lightweight enough for hobby projects running with minimal resources, yet scalable for when your hobby project becomes a deployed product.

This user's guide aims to provide a guided walkthrough for users to understand how BonsaiDb works. This guide is meant to be supplemental to the documentation. If you learn best by exploring examples, many are available in /examples in the repository. If, however, you learn best by taking a guided tour of how something works, this guide is specifically for you.

If you have any feedback on this guide, please file an issue, and we will try to address any issues or shortcomings.

Thank you for exploring BonsaiDb.

About dev.bonsaidb.io

The domain that is hosting this user guide is powered by Dossier. Dossier is a static file hosting project that is powered by BonsaiDb's file storage features, currently served on a Stardust instance in Amsterdam at Scaleway. Every page/image/script is loaded from BonsaiDb (although the domain has caching by Cloudflare).

Concepts

This is a list of common concepts that will be used throughout this book as well as the documentation.

Document

A Document is a single piece of stored data. Each document is stored within a Collection, and has a unique ID within that Collection. There are two document types: OwnedDocument and BorrowedDocument. The View::map() function takes a BorrowedDocument, but nearly every other API utilizes OwnedDocument.

When a document is updated, BonsaiDb will check that the revision information passed matches the currently stored information. If not, a conflict error will be returned. This simple check ensures that if two writers try to update the document simultaneously, one will succeed and the other will receive an error.
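To make the revision check concrete, here is a minimal in-memory sketch of the idea. It is not BonsaiDb's implementation: `ToyCollection`, `StoredDoc`, and `UpdateError` are hypothetical names, and a plain counter stands in for whatever revision information the real database stores.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch; not BonsaiDb's error enum.
#[derive(Debug, PartialEq)]
pub enum UpdateError {
    NotFound,
    Conflict,
}

/// A stored document: its contents plus a revision counter (an assumption;
/// the counter stands in for the stored revision information).
#[derive(Debug, Clone, PartialEq)]
pub struct StoredDoc {
    pub revision: u64,
    pub contents: String,
}

/// A toy in-memory collection keyed by document id.
#[derive(Default)]
pub struct ToyCollection {
    docs: HashMap<u64, StoredDoc>,
}

impl ToyCollection {
    /// Inserts a new document at revision 1 and returns a copy of it.
    pub fn insert(&mut self, id: u64, contents: &str) -> StoredDoc {
        let doc = StoredDoc {
            revision: 1,
            contents: contents.to_string(),
        };
        self.docs.insert(id, doc.clone());
        doc
    }

    /// Applies the update only if the caller's revision matches the stored one.
    pub fn update(
        &mut self,
        id: u64,
        expected_revision: u64,
        contents: &str,
    ) -> Result<StoredDoc, UpdateError> {
        let stored = self.docs.get_mut(&id).ok_or(UpdateError::NotFound)?;
        if stored.revision != expected_revision {
            // Someone else updated the document since the caller read it.
            return Err(UpdateError::Conflict);
        }
        stored.revision += 1;
        stored.contents = contents.to_string();
        Ok(stored.clone())
    }
}
```

If two writers both read revision 1 and then call `update`, the first call succeeds and bumps the revision, so the second call's stale revision no longer matches and it receives `Conflict`.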

Serializable Collections

BonsaiDb provides the SerializedCollection trait, which allows automatic serialization and deserialization in many situations. When using the SerializedCollection::document_contents() function, the document is serialized and deserialized by the format returned from SerializedCollection::format().

The CollectionDocument<T> type provides convenience methods of interacting with serializable documents.

Default serialization of Serde-compatible types

BonsaiDb provides a convenience trait for Serde-compatible data types: DefaultSerialization. This empty trait can be implemented on any collection to have BonsaiDb provide its preferred serialization format, Pot.

Raw Collections

If you would prefer to manually manage the data stored inside of a Document, you can directly manage the contents field. BonsaiDb will not interact with the contents of a Document. Only code that you write will parse or update the stored data.

Collection

A Collection is a group of Documents and associated functionality. Collections are stored on-disk using ACID-compliant, transactional storage, ensuring your data is protected in the event of a sudden power failure or other unfortunate event.

The goal of a Collection is to encapsulate the logic for a set of data in such a way that Collections could be designed to be shared and reused in multiple Schemas or applications.

Each Collection must have a unique CollectionName. To help prevent naming collisions, an authority can be specified which provides a level of namespacing.
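As a rough illustration of why an authority helps, the sketch below models a namespaced name as a pair of strings. `QualifiedName` and `register` are hypothetical and exist only for this example; they are not the CollectionName type or any BonsaiDb API.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a namespaced collection name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct QualifiedName {
    pub authority: String,
    pub name: String,
}

impl QualifiedName {
    pub fn new(authority: &str, name: &str) -> Self {
        Self {
            authority: authority.to_string(),
            name: name.to_string(),
        }
    }
}

/// Registers a name, returning false if the exact authority/name pair is
/// already taken.
pub fn register(registry: &mut HashSet<QualifiedName>, name: QualifiedName) -> bool {
    registry.insert(name)
}
```

Two libraries can each register a collection called "users" as long as their authorities differ; only an identical authority/name pair collides.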

A Collection can contain one or more Views.

Primary Keys

All documents stored in a collection have a unique id. Primary keys in BonsaiDb are immutable -- once a document has an id, it cannot be changed. If you wish for a unique key that can be updated, use a unique view, and use a separate value as a primary key.

The type is controlled by the Collection::PrimaryKey associated type. If you're using the derive macro, the type can be specified using the primary_key parameter as in this example:

#[derive(Debug, Serialize, Deserialize, Collection, Eq, PartialEq)]
#[collection(name = "multi-key", primary_key = AssociatedProfileKey)]
struct AssociatedProfileData {
    value: String,
}

#[derive(Key, Debug, Clone, Copy, Eq, PartialEq, Ord, PartialOrd)]
struct AssociatedProfileKey {
    pub user_id: u32,
    pub data_id: u64,
}

If no primary_key is specified in the derive, u64 will be used.

Inserting and accessing the collection can be done using the newly defined primary key type:

    let key = AssociatedProfileKey {
        user_id: user.header.id,
        data_id: 64,
    };
    let inserted = AssociatedProfileData {
        value: String::from("hello"),
    }
    .insert_into(&key, &db)?;
    let retrieved = AssociatedProfileData::get(&key, &db)?.expect("document not found");
    assert_eq!(inserted, retrieved);

Natural Ids

It's not uncommon to need to store data in a database that has an "external" identifier. Some examples could be externally authenticated user profiles, social networking site posts, or for normalizing a single type's fields across multiple Collections. These types of values are often called "Natural Keys" or "Natural Identifiers".

SerializedCollection::natural_id() or DefaultSerialization::natural_id() can be implemented to return a value from the contents of a new document. When using the derive macro, the natural_id parameter can be specified with either a closure or a path to a function with the same signature.

In this example, the UserProfile type is used to represent a user that has a unique ID in an external database:

#[derive(Debug, Serialize, Deserialize, Collection, Eq, PartialEq)]
#[collection(name = "user-profiles", primary_key = u32)]
struct UserProfile {
    #[natural_id]
    pub external_id: u32,
    pub name: String,
}

When pushing a UserProfile into the collection, the id will automatically be assigned by calling natural_id():

    let user = UserProfile {
        external_id: 42,
        name: String::from("ecton"),
    }
    .push_into(&db)?;
    let retrieved_from_database = UserProfile::get(&42, &db)?.expect("document not found");
    assert_eq!(user, retrieved_from_database);

Custom Primary Keys

All primary keys must implement the Key trait. BonsaiDb provides implementations for many types, but any type that implements the trait can be used.

When using push/push_into, BonsaiDb needs to assign a unique ID to the incoming document. If natural_id() returns None, the storage backend will handle id assignment.

If the document being pushed is the first document in the collection, Key::first_value() is called and the resulting value is used as the document's id.

If the collection already has documents, the highest-ordered key is queried from the collection. Key::next_value() is then called and the resulting value is used as the document's id. Key implementors should not allow next_value() to return a value that is less than the current value. NextValueError::WouldWrap should be returned instead of wrapping.

Both first_value() and next_value() by default return NextValueError::Unimplemented. If any error occurs while trying to assign a unique id, the transaction will be aborted and rolled back.
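The assignment rules above can be sketched with a small stand-alone trait. This is only an illustration under assumptions: `AssignableKey`, `OpaqueKey`, and `assign_id` are hypothetical, the error enum is redefined locally with the two cases named in the text, and starting at 0 is an arbitrary choice for the sketch.

```rust
/// Mirrors the two error cases named in the text; defined locally for this sketch.
#[derive(Debug, PartialEq)]
pub enum NextValueError {
    Unimplemented,
    WouldWrap,
}

/// Hypothetical trait capturing only the id-assignment part of a key type.
pub trait AssignableKey: Sized + Ord {
    /// Called when the collection is empty. Unimplemented by default.
    fn first_value() -> Result<Self, NextValueError> {
        Err(NextValueError::Unimplemented)
    }

    /// Called with the highest existing key. Unimplemented by default.
    fn next_value(&self) -> Result<Self, NextValueError> {
        Err(NextValueError::Unimplemented)
    }
}

impl AssignableKey for u8 {
    fn first_value() -> Result<Self, NextValueError> {
        Ok(0) // arbitrary starting point for this sketch
    }

    fn next_value(&self) -> Result<Self, NextValueError> {
        // Refuse to wrap around instead of producing a smaller key.
        self.checked_add(1).ok_or(NextValueError::WouldWrap)
    }
}

/// A key type that keeps the default (unimplemented) behavior.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct OpaqueKey(pub u8);

impl AssignableKey for OpaqueKey {}

/// Picks the id for a pushed document, given the ids already in the collection.
pub fn assign_id<K: AssignableKey>(existing: &[K]) -> Result<K, NextValueError> {
    match existing.iter().max() {
        None => K::first_value(),
        Some(highest) => highest.next_value(),
    }
}
```

With this sketch, an empty `u8` collection is assigned 0, a collection whose highest key is 7 is assigned 8, a collection already holding 255 fails with `WouldWrap`, and `OpaqueKey` always fails with `Unimplemented`. In each error case the push would not proceed.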

View

A View is a map/reduce-powered method of quickly accessing information inside of a Collection. Each View can only belong to one Collection.

Views define two important associated types: a Key type and a Value type. You can think of these as the equivalent entries in a map/dictionary-like collection that supports more than one entry for each Key. The Key is used to filter the View's results, and the Value is used by your application or the reduce() function.

Views are a powerful, yet abstract concept. Let's look at a concrete example: blog posts with categories.

#[derive(Serialize, Deserialize, Debug, Collection)]
#[collection(name = "blog-post", views = [BlogPostsByCategory])]
pub struct BlogPost {
    pub title: String,
    pub body: String,
    pub category: Option<String>,
}

Let's insert this data for these examples:

    BlogPost {
        title: String::from("New version of BonsaiDb released"),
        body: String::from("..."),
        category: Some(String::from("Rust")),
    }
    .push_into(&db)?;

    BlogPost {
        title: String::from("New Rust version released"),
        body: String::from("..."),
        category: Some(String::from("Rust")),
    }
    .push_into(&db)?;

    BlogPost {
        title: String::from("Check out this great cinnamon roll recipe"),
        body: String::from("..."),
        category: Some(String::from("Cooking")),
    }
    .push_into(&db)?;

All examples on this page are available in their full form in the repository at book/book-examples/tests.

While category should be an enum, let's first explore using String and upgrade to an enum at the end (it requires one additional step). Let's implement a View that will allow users to find blog posts by their category as well as count the number of posts in each category.

#[derive(Debug, Clone, View, ViewSchema)]
#[view(collection = BlogPost, key = Option<String>, value = u32, name = "by-category")]
pub struct BlogPostsByCategory;

impl MapReduce for BlogPostsByCategory {
    fn map<'doc>(&self, document: &'doc BorrowedDocument<'_>) -> ViewMapResult<'doc, Self> {
        let post = BlogPost::document_contents(document)?;
        document.header.emit_key_and_value(post.category, 1)
    }

    fn reduce(
        &self,
        mappings: &[ViewMappedValue<Self::View>],
        _rereduce: bool,
    ) -> ReduceResult<Self::View> {
        Ok(mappings.iter().map(|mapping| mapping.value).sum())
    }
}

The three view-related traits being implemented are View, ViewSchema, and MapReduce. These traits are designed to allow keeping the View implementation in a shared code library that is used by both client-side and server-side code, while keeping the ViewSchema and MapReduce implementation in the server executable only.

Views for SerializedCollection

For users who are using SerializedCollection, CollectionViewSchema can be implemented instead of ViewSchema. The only difference between the two is that the map() function takes a CollectionDocument instead of a BorrowedDocument.

Value Serialization

For views to function, the Value type must be able to be serialized to and deserialized from storage. To accomplish this, all views must implement the SerializedView trait. For Serde-compatible data structures, DefaultSerializedView is an empty trait that can be implemented instead to provide the default serialization that BonsaiDb recommends.

Map

The first line of the map function calls SerializedCollection::document_contents() to deserialize the stored BlogPost. The second line returns an emitted Key and Value -- in our case a clone of the post's category and the value 1_u32. With the map function, we're able to use query() and query_with_docs():

    let rust_posts = db
        .view::<BlogPostsByCategory>()
        .with_key(&Some(String::from("Rust")))
        .query_with_docs()?;
    for mapping in &rust_posts {
        let post = BlogPost::document_contents(mapping.document)?;
        println!(
            "Retrieved post #{} \"{}\"",
            mapping.document.header.id, post.title
        );
    }

The above snippet queries the Database for all documents in the BlogPost Collection that emitted a Key of Some("Rust").

If you're using a SerializedCollection, you can use query_with_collection_docs() to have the deserialization done automatically for you:

    let rust_posts = db
        .view::<BlogPostsByCategory>()
        .with_key(&Some(String::from("Rust")))
        .query_with_collection_docs()?;
    for mapping in &rust_posts {
        println!(
            "Retrieved post #{} \"{}\"",
            mapping.document.header.id, mapping.document.contents.title
        );
    }

Reduce

The second function to learn about is the reduce() function. It is responsible for turning an array of Key/Value pairs into a single Value. In some cases, BonsaiDb might need to call reduce() with values that have already been reduced one time. If this is the case, rereduce is set to true.

In this example, we're using the built-in Iterator::sum() function to turn our Value of 1_u32 into a single u32 representing the total number of documents.

    let rust_post_count = db
        .view::<BlogPostsByCategory>()
        .with_key(&Some(String::from("Rust")))
        .reduce()?;
    assert_eq!(rust_post_count, 2);

Changing an existing view

If you have data stored in a view, but want to update the view to store data differently, implement ViewSchema::version() and return a unique number. When BonsaiDb checks the view's integrity, it will notice that there is a version mismatch and automatically re-index the view.

There is no mechanism to access the data until this operation is complete.

Understanding Re-reduce

Let's examine this data set:

| Document ID | BlogPost Category |
|-------------|-------------------|
| 1           | Some("Rust")      |
| 2           | Some("Rust")      |
| 3           | Some("Cooking")   |
| 4           | None              |

When updating views, each view entry is reduced and the value is cached. These are the view entries:

| View Entry ID   | Reduced Value |
|-----------------|---------------|
| Some("Rust")    | 2             |
| Some("Cooking") | 1             |
| None            | 1             |

When a reduce query is issued for a single key, the value can be returned without further processing. But, if the reduce query matches multiple keys, the View's reduce() function will be called with the already reduced values with rereduce set to true. For example, retrieving the total count of blog posts:

    let total_post_count = db.view::<BlogPostsByCategory>().reduce()?;
    assert_eq!(total_post_count, 3);

Once BonsaiDb has gathered each of the key's reduced values, it needs to further reduce that list into a single value. To accomplish this, the View's reduce() function is invoked with rereduce set to true, and with mappings containing:

| Key             | Value |
|-----------------|-------|
| Some("Rust")    | 2     |
| Some("Cooking") | 1     |
| None            | 1     |

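For this four-document data set, summing these values produces a final value of 4 (2 + 1 + 1). The assertion in the snippet above expects 3 because it runs against the three posts inserted earlier on this page, which do not include an uncategorized post.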
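The two-stage reduction can be traced with a small stand-alone sketch over the four-document data set from the tables above. None of these functions are BonsaiDb APIs: `reduce`, `reduce_per_key`, `reduce_all`, and `sample_mappings` are hypothetical helpers, and plain `u32` slices stand in for the view's mappings.

```rust
use std::collections::BTreeMap;

/// The key type emitted by the sketch's map step.
pub type Category = Option<&'static str>;

/// Stand-in for the view's reduce(): sums values, whether they are freshly
/// mapped 1s (rereduce = false) or previously reduced per-key counts
/// (rereduce = true). Summing works identically in both cases.
pub fn reduce(values: &[u32], _rereduce: bool) -> u32 {
    values.iter().sum()
}

/// Groups mapped entries by key and caches one reduced value per key.
pub fn reduce_per_key(mapped: &[(Category, u32)]) -> BTreeMap<Category, u32> {
    let mut grouped: BTreeMap<Category, Vec<u32>> = BTreeMap::new();
    for (key, value) in mapped {
        grouped.entry(*key).or_default().push(*value);
    }
    grouped
        .into_iter()
        .map(|(key, values)| (key, reduce(&values, false)))
        .collect()
}

/// Answers a reduce query spanning every key by re-reducing the cached values.
pub fn reduce_all(cached: &BTreeMap<Category, u32>) -> u32 {
    let values: Vec<u32> = cached.values().copied().collect();
    reduce(&values, true)
}

/// The four-document data set: two Rust posts, one Cooking post, one uncategorized.
pub fn sample_mappings() -> Vec<(Category, u32)> {
    vec![
        (Some("Rust"), 1),
        (Some("Rust"), 1),
        (Some("Cooking"), 1),
        (None, 1),
    ]
}
```

Calling `reduce_per_key(&sample_mappings())` yields the cached values 2, 1, and 1, and `reduce_all` on that result returns 4. A reduce function that counted its inputs instead of summing them would return 3 in the second stage, which is why reduce() receives the rereduce flag.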

How does BonsaiDb make this efficient?

When saving Documents, BonsaiDb does not immediately update related views. It instead notes what documents have been updated since the last time the View was indexed.

When a View is accessed, the queries include an AccessPolicy. If you aren't overriding it, UpdateBefore is used. This means that when the query is evaluated, BonsaiDb will first check if the index is out of date due to any updated data. If it is, it will update the View before evaluating the query.

If you're wanting to get results quickly and are willing to accept data that might not be updated, the access policies UpdateAfter and NoUpdate can be used depending on your needs.

If multiple simultaneous queries are being evaluated for the same View and the View is outdated, BonsaiDb ensures that only a single view indexer will execute while both queries wait for it to complete.
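Here is a simplified, single-threaded sketch of lazy indexing with the three access policies. It is an assumption-laden toy, not BonsaiDb's indexer: `LazyView` and its methods are hypothetical, the policy enum is redefined locally, and UpdateAfter performs its update synchronously after computing the stale answer rather than in the background.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The three policies named in the text, redefined locally for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AccessPolicy {
    UpdateBefore,
    UpdateAfter,
    NoUpdate,
}

/// A toy collection of categorized posts with one lazily maintained view.
#[derive(Default)]
pub struct LazyView {
    documents: BTreeMap<u64, String>, // id -> category (the "collection")
    entries: BTreeMap<u64, String>,   // id -> emitted key (the "view index")
    invalidated: BTreeSet<u64>,       // ids saved since the last indexing run
    pub index_runs: u32,              // how many times the indexer did work
}

impl LazyView {
    /// Saving only records that the document changed; the index is untouched.
    pub fn save(&mut self, id: u64, category: &str) {
        self.documents.insert(id, category.to_string());
        self.invalidated.insert(id);
    }

    /// Re-maps only the documents that changed since the last run.
    fn update_index(&mut self) {
        if self.invalidated.is_empty() {
            return;
        }
        for id in std::mem::take(&mut self.invalidated) {
            if let Some(category) = self.documents.get(&id) {
                self.entries.insert(id, category.clone());
            }
        }
        self.index_runs += 1;
    }

    fn count_indexed(&self, category: &str) -> usize {
        self.entries
            .values()
            .filter(|indexed| indexed.as_str() == category)
            .count()
    }

    /// Counts posts in a category, honoring the requested access policy.
    pub fn count(&mut self, category: &str, policy: AccessPolicy) -> usize {
        match policy {
            AccessPolicy::UpdateBefore => {
                self.update_index();
                self.count_indexed(category)
            }
            AccessPolicy::UpdateAfter => {
                let possibly_stale = self.count_indexed(category);
                self.update_index();
                possibly_stale
            }
            AccessPolicy::NoUpdate => self.count_indexed(category),
        }
    }
}
```

After saving two posts, a NoUpdate query still sees an empty index, an UpdateAfter query returns the same stale answer but leaves the index current, and a later UpdateBefore query always reflects every saved post.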

Using arbitrary types as a View Key

In our previous example, we used Option<String> for the Key type. The reason is important: Keys must be sortable by our underlying storage engine, which means special care must be taken. Most serialization types do not guarantee binary sort order. Instead, BonsaiDb exposes the Key trait.
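To see why binary sort order matters, the sketch below sorts integers by comparing only their encoded bytes, as a byte-ordered storage engine would. Big-endian is used as one well-known order-preserving encoding for unsigned integers; this example makes no claim about how the Key trait encodes any particular type, and all function names are hypothetical.

```rust
/// Encodes a value so that byte-wise comparison matches numeric order.
pub fn order_preserving(value: u32) -> [u8; 4] {
    value.to_be_bytes()
}

/// A common encoding whose byte-wise order does not match numeric order.
pub fn little_endian(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

/// Sorts values by comparing only their encoded bytes.
pub fn sort_by_encoding(values: &[u32], encode: fn(u32) -> [u8; 4]) -> Vec<u32> {
    let mut sorted = values.to_vec();
    sorted.sort_by_key(|value| encode(*value));
    sorted
}
```

Sorting 256, 1, and 2 with `order_preserving` gives 1, 2, 256. With `little_endian`, 256 encodes to bytes beginning with 0 and therefore sorts first, so range queries over such keys would return surprising results.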

Schema

A Schema is a group of one or more Collections. A Schema can be instantiated as a Database. The Schema describes how a set of data behaves, and a Database is a set of data on-disk.

Database

A Database is a set of stored collections. Each Database is described by a Schema. Unlike the other concepts, this concept corresponds to multiple types, all of which implement a Connection trait.

Storage

The StorageConnection trait allows interacting with a BonsaiDb multi-database storage instance.

There are three implementations of the StorageConnection trait:

  • Storage/AsyncStorage: A local, file-based server implementation with no networking capabilities.
  • Server: A networked server implementation, written using Storage. This server supports QUIC- and WebSocket-based protocols. The QUIC protocol is preferred, but it uses UDP which many load balancers don't support. If you're exposing BonsaiDb behind a load balancer, WebSockets may be the only option depending on your host's capabilities.
  • AsyncClient/BlockingClient: A network client implementation that connects to a server.

PubSub

The Publish/Subscribe pattern enables developers to design systems that produce and receive messages. It is implemented for BonsaiDb through the PubSub and Subscriber traits.

A common example of what PubSub enables is implementing a simple chat system. Each chat participant can subscribe to messages on the chat topic, and when any participant publishes a chat message, all subscribers will receive a copy of that message.

A working example of PubSub is available at examples/basic-local/examples/pubsub.rs.
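For readers who want to see the pattern itself before reading that example, here is a minimal in-process sketch of the chat scenario built on std channels. It does not use BonsaiDb's PubSub or Subscriber traits; `Relay`, `subscribe`, and `publish` are hypothetical names for this illustration only.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// A toy in-process relay: each topic has a list of subscriber channels.
#[derive(Default)]
pub struct Relay {
    topics: HashMap<String, Vec<Sender<String>>>,
}

impl Relay {
    /// Subscribes to a topic and returns the receiving end for its messages.
    pub fn subscribe(&mut self, topic: &str) -> Receiver<String> {
        let (sender, receiver) = channel();
        self.topics
            .entry(topic.to_string())
            .or_default()
            .push(sender);
        receiver
    }

    /// Sends a copy of the message to every live subscriber of the topic and
    /// returns how many subscribers received it.
    pub fn publish(&mut self, topic: &str, message: &str) -> usize {
        match self.topics.get_mut(topic) {
            Some(subscribers) => {
                // Sending fails once a receiver is dropped; forget those subscribers.
                subscribers.retain(|sender| sender.send(message.to_string()).is_ok());
                subscribers.len()
            }
            None => 0,
        }
    }
}
```

Each participant calls `subscribe("chat")` once and reads from the returned receiver; any call to `publish("chat", ...)` delivers a copy to every participant, while subscribers of other topics receive nothing.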

Use cases of BonsaiDb

Single database model (No networking)

This use case is most similar to utilizing SQLite for your database. In this mode, BonsaiDb directly interacts with files on your disk to provide your database. Unlike other file-based databases, however, it's easy to migrate to any of these scenarios from this starting position:

graph LR
  code{{Rust Code}}
  local[(bonsaidb-local::Database)]
  code <--> local

A working example of how to use a local database can be found at examples/basic-local/examples/basic-local.rs.

Multi-database model (No networking)

This model is most similar to using multiple SQLite databases. In this mode, you interact with a Storage that you spawn within your code.

graph LR
  code{{Rust Code}}
  local[(bonsaidb-local::Storage)]
  code <--> server
  server <--> local

If you look at the source behind Database::open_local, you'll see that the single-database model is using Storage under the hood.

Server model (QUIC or WebSockets)

This model is most similar to using other document databases, like CouchDB or MongoDB. In this mode, you interact with a Client that connects via either QUIC or WebSockets with a server. From the server code's perspective, this model is the same as the multi-database model, except that the server is listening for and responding to network traffic.

graph LR
  client-code{{Rust Client Code}}
  server-code{{Rust Server Code}}
  client[[bonsaidb-client]]
  server[[bonsaidb-server]]
  local[(bonsaidb-local)]
  client-code <--> client
  client <-. network .-> server
  server <--> local
  server-code <--> server

A working example of this model can be found at examples/basic-server/examples/basic-server.rs. When writing client/server applications that utilize BonsaiDb, you can have the BonsaiDb server running within your server application. This means that your server code can still interact with BonsaiDb without using networking. Regardless of whether you run any other server code, your BonsaiDb server will be accessible through a Client over the network.

API Platform model (QUIC or WebSockets)

If you're finding yourself developing an API for your application, and all of the consumers of this API are already connected to BonsaiDb, you may want to take advantage of the custom api functionality of the server:

graph LR
  client-code{{Rust Client Code}}
  server-code{{Rust Server Code}}
  client[[bonsaidb-client]]
  server[[bonsaidb-server]]
  backend[[Backend]]
  local[(bonsaidb-local)]
  client-code <--> client
  client <-. network .-> server
  server <--> local
  server-code <--> server
  server-code <--> backend
  backend <--> server

The BonsaiDb CustomServer type accepts one generic parameter that implements the Backend trait. This trait is used to customize the server in many ways, but one of the associated types is an Api implementor.

See the Custom Api Server section for an overview of how to set up a custom api server.

Coming Later: Cluster model

When you're at the stage of scaling beyond a single server, you will be able to upgrade your server to a cluster using the hypothetical bonsaidb-cluster crate. The clustering model is still being designed, but the goal is something similar to:

graph LR
  client-code{{Rust Client Code}}
  server-code{{Rust Server Code}}
  client[[bonsaidb-client]]
  server1[[server 1]]
  server2[[server 2]]
  server3[[server 3]]
  cluster[[bonsaidb-cluster]]
  client-code <--> client
  client <-. network .-> cluster
  server-code <--> cluster
  cluster <--> server1
  cluster <--> server2
  cluster <--> server3
  server1 <--> server2
  server2 <--> server3
  server1 <--> server3

In this model, the local storage element is hidden; each server has its own storage. This model is very similar from the viewpoint of your server and client code -- the primary difference is that the server-side connection is being established using the cluster crate. From the client's perspective, the cluster behaves as a single entity -- sending a request to any server node will produce the same result within the cluster.

All features of BonsaiDb will be designed to work in cluster mode seamlessly. PubSub will ensure that subscribers will receive messages regardless of which server they're connected to.

Custom Api Server

The Api trait defines two associated types: Response and Error. The Api type is akin to a "request" that the server receives. The server will invoke a Handler, expecting a result with the associated Response and Error types.

All code on this page comes from this example: examples/basic-server/examples/custom-api.rs.

This example shows how to derive the Api trait. Because an error type isn't specified, the derive macro will use BonsaiDb's Infallible type as the error type.

#[derive(Serialize, Deserialize, Debug, Api)]
#[api(name = "ping", response = Pong)]
pub struct Ping;

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Pong;

#[derive(Serialize, Deserialize, Debug, Api)]
#[api(name = "increment", response = Counter)]
pub struct IncrementCounter {
    amount: u64,
}

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Counter(pub u64);

To implement the server, we must define a Handler, which is invoked each time the Api type is received by the server.

/// Dispatches Requests and returns Responses.
#[derive(Debug)]
pub struct ExampleHandler;

/// Handles `Ping` requests by replying with `Pong`. This handler performs no
/// permission checks.
#[async_trait]
impl Handler<Ping> for ExampleHandler {
    async fn handle(_session: HandlerSession<'_>, _request: Ping) -> HandlerResult<Ping> {
        Ok(Pong)
    }
}

Finally, the client can issue the API call and receive the response, without needing any extra steps to serialize. This works regardless of whether the client is connected via QUIC or WebSockets.

async fn ping_the_server(
    client: &AsyncClient,
    client_name: &str,
) -> Result<(), bonsaidb::core::Error> {
    match client.send_api_request(&Ping).await {
        Ok(Pong) => {
            println!("Received Pong from server on {client_name}");
        }
        other => println!("Unexpected response from API call on {client_name}: {other:?}"),
    }

    Ok(())
}

Permissions

One of the strengths of using BonsaiDb's custom api functionality is the ability to tap into the permissions handling that BonsaiDb uses. The Ping request has no permissions, but let's add permission handling to our IncrementCounter API. We will do this by creating an increment_counter function that expects two parameters: a connection to the storage layer with unrestricted permissions, and a second connection to the storage layer which has been restricted to the permissions the client invoking it is authorized to perform:

/// The permissible actions that can be granted for this example api.
#[derive(Debug, Action)]
#[action(actionable = bonsaidb::core::actionable)]
pub enum ExampleActions {
    Increment,
    DoSomethingCustom,
}

pub async fn increment_counter<S: AsyncStorageConnection<Database = C>, C: AsyncKeyValue>(
    storage: &S,
    as_client: &S,
    amount: u64,
) -> Result<u64, bonsaidb::core::Error> {
    as_client.check_permission([Identifier::from("increment")], &ExampleActions::Increment)?;
    let database = storage.database::<()>("counter").await?;
    database.increment_key_by("counter", amount).await
}

#[async_trait]
impl Handler<IncrementCounter> for ExampleHandler {
    async fn handle(
        session: HandlerSession<'_>,
        request: IncrementCounter,
    ) -> HandlerResult<IncrementCounter> {
        Ok(Counter(
            increment_counter(session.server, &session.as_client, request.amount).await?,
        ))
    }
}

The Handler is provided a HandlerSession as well as the Api type, which provides all the context information needed to verify the connected client's authenticated identity and permissions. Additionally, it provides two ways to access the storage layer: with unrestricted permissions or restricted to the permissions granted to the client.
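The split between unrestricted access and client-restricted access can be sketched without any BonsaiDb types. In this hypothetical example, `ClientPermissions` plays the role of the restricted connection, a plain `&mut u64` plays the role of unrestricted storage, and `Action` and `ApiError` are local stand-ins rather than the real permission types.

```rust
use std::collections::HashSet;

/// Hypothetical actions for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Action {
    Ping,
    Increment,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ApiError {
    PermissionDenied(Action),
}

/// The set of actions a connected client has been granted.
pub struct ClientPermissions {
    granted: HashSet<Action>,
}

impl ClientPermissions {
    pub fn new(actions: &[Action]) -> Self {
        Self {
            granted: actions.iter().copied().collect(),
        }
    }

    /// Succeeds only if the client was granted the action.
    pub fn check_permission(&self, action: Action) -> Result<(), ApiError> {
        if self.granted.contains(&action) {
            Ok(())
        } else {
            Err(ApiError::PermissionDenied(action))
        }
    }
}

/// Checks the client's permissions first, then mutates the counter through
/// the unrestricted handle.
pub fn increment_counter(
    counter: &mut u64,
    as_client: &ClientPermissions,
    amount: u64,
) -> Result<u64, ApiError> {
    as_client.check_permission(Action::Increment)?;
    *counter += amount;
    Ok(*counter)
}
```

A client granted only `Ping` is rejected before the counter is touched, while a client granted `Increment` receives the updated total. Keeping the check on the restricted handle and the mutation on the unrestricted one is the same shape as the `increment_counter` function shown earlier.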

Let's finish configuring the server to allow all unauthenticated users the ability to Ping, and all authenticated users the ability to Increment the counter:

    let server = Server::open(
        ServerConfiguration::new("custom-api.bonsaidb")
            .default_permissions(Permissions::from(
                Statement::for_any()
                    .allowing(&BonsaiAction::Server(ServerAction::Connect))
                    .allowing(&BonsaiAction::Server(ServerAction::Authenticate(
                        AuthenticationMethod::PasswordHash,
                    ))),
            ))
            .authenticated_permissions(Permissions::from(vec![
                Statement::for_any().allowing(&ExampleActions::Increment)
            ]))
            .with_api::<ExampleHandler, Ping>()?
            .with_api::<ExampleHandler, IncrementCounter>()?
            .with_schema::<()>()?,
    )
    .await?;

For more information on managing permissions, see Administration/Permissions.

The full example these snippets are taken from is available in the repository.

Overview

BonsaiDb aims to offer the majority of its functionality in local operation. The networked server adds some functionality on top of the local version, but its main function is to add the ability to use networking to talk to the database.

This model makes it easy to transition a local database to a networked database server. Start with whatever model fits your needs today, and when your needs change, BonsaiDb will adapt.

When to use the Local Integration

  • You're going to access databases from one process at a time. BonsaiDb is designed for concurrency and can scale with the capabilities of the hardware. However, the underlying storage layer that BonsaiDb is built upon, Nebari, does not support multiple processes writing its data simultaneously. If you need to access the database from multiple processes, the server integration is what you should use. While it doesn't offer IPC communication today, a pull request that adds that functionality (along with the corresponding unit tests) would be accepted.
  • You have no public API/PubSub/access needs or have implemented those with another stack.

When to use the Server Integration

  • You need to access databases from more than one process or machine.
  • You are OK with downtime due to loss of service when the single server is offline. If you need to have a highly-available database, you should use the Cluster Integration (Coming Soon).
  • Your database load can be met with a single machine. If you have enough load that you need to share the processing power of multiple servers, you should use the Cluster Integration (Coming Soon).

Coming Soon: When to use the Cluster Integration

  • You need to access databases from more than one machine.
  • You need a highly-available setup.
  • You need/want to split load between multiple machines.

Async vs Blocking

BonsaiDb supports both async and blocking (threaded) access. Its aim is to provide a first-class experience no matter which architecture you choose for your Rust application.

Local-only

Storage and Database are the blocking implementations of BonsaiDb. These types provide the lowest overhead access to BonsaiDb as they will block the currently executing thread to perform the operations.

AsyncStorage and AsyncDatabase are simple types that "wrap" Storage and Database instances with an asynchronous API. BonsaiDb does this by spawning a blocking task in Tokio. Internally, Tokio uses a pool of threads to drive blocking operations. This may sound like a lot of overhead, but it is surprisingly lightweight.

Our recommendation is to pick the programming style that fits your needs the best. Do you need lightweight task concurrency, or is basic threading enough? If this application grew in scope, would it ever need to be a networked application?

If you anticipate needing to use BonsaiDb's networked server, you should review the next section to consider how Tokio benefits a networked server.

Networked Server

When building a networked server, a common strategy to handle inbound connections is to allow each connection to have a thread. This is expensive, however, as each thread needs its own stack allocated and is managed by the kernel. When designing a server with long-running connections, async allows handling more connections with fewer system resources. As such, BonsaiDb's server is built atop Tokio, and the traits used to extend the server are async_traits.

The networked server is built atop AsyncStorage, which means that you can convert a server instance into a blocking Storage instance, allowing local access to your server to remain blocking.

Networked Client

BonsaiDb's networked client uses Tokio for all networking on non-WASM targets, and uses the browser's WebSocket APIs for WASM targets.

On all non-WASM targets, the networked client can be used without a Tokio runtime present. When instantiated this way, a runtime will automatically be run powering the client's networking. In the future, it is possible that non-Tokio-based networking implementations could be provided instead for the blocking client implementation.

For WASM, the networked client does not provide blocking trait implementations. If you are building for WASM, you must use the async traits.

The differences between the APIs

The core traits are split into two types: blocking and async.

| Blocking             |   Async                   |
|----------------------|---------------------------|
| `Connection`         | `AsyncConnection`         |
| `StorageConnection`  | `AsyncStorageConnection`  |
| `PubSub`             | `AsyncPubSub`             |
| `Subscriber`         | `AsyncSubscriber`         |
| `KeyValue`           | `AsyncKeyValue`           |
| `LowLevelConnection` | `AsyncLowLevelConnection` |

By splitting these traits, BonsaiDb tries to make it harder to accidentally use a blocking API in an asynchronous context. In general, all other functions are exposed in pairs: a blocking version, and an async version with the suffix "_async". For example, SerializedCollection::get is the blocking API, and SerializedCollection::get_async is the async API.

When developing a project that uses both async and blocking modes of access, it is considered a good practice to separate modules based on whether they are blocking or not. This can help spot mistakes when the wrong type of trait is imported in the wrong type of module.

Integrating BonsaiDb Locally

BonsaiDb supports multiple databases and multiple schemas. However, for many applications, you only need a single database.

If you're only wanting a single database, the setup is straightforward: (from examples/basic-local/examples/basic-local.rs)

let db = Database::open::<Message>(
    StorageConfiguration::new("basic.bonsaidb")
)?;

Under the hood, BonsaiDb is creating a multi-database Storage with a local Database named default for you. If you need to switch to a multi-database model, you can open the storage and access the default database: (adapted from examples/basic-local/examples/basic-local.rs)

let storage = Storage::open(
    StorageConfiguration::new("basic.bonsaidb")
        .with_schema::<Message>()?
)?;
// Opens the database named "default", creating it only if it does not exist yet.
let db = storage.create_database::<Message>(
    "default",
    true
)?;

You can register multiple schemas so that databases can be purpose-built.

Common Traits

To help your code transition between different modes of accessing BonsaiDb, you can use these common traits to make your methods accept any style of BonsaiDb access.

For example, examples/basic-local/examples/basic-local.rs uses this helper method to insert a record:

fn insert_a_message<C: Connection>(
    connection: &C,
    value: &str,
) -> Result<(), bonsaidb::core::Error> {
    Message {
        contents: String::from(value),
        timestamp: SystemTime::now(),
    }
    .push_into(connection)?;
    Ok(())
}

Integrating the networked BonsaiDb Server

To access BonsaiDb over the network, you're going to be writing two pieces of code: the server code and the client code.

Your BonsaiDb Server

The first step is to create a Server, which uses local Storage under the hood. This means that if you're already using BonsaiDb in local mode, you can swap your usage of Storage with Server in your server code without running your database through any tools. Here's the setup code from basic-server/examples/basic-server.rs:

    let server = Server::open(
        ServerConfiguration::new("server-data.bonsaidb")
            .default_permissions(DefaultPermissions::AllowAll)
            .with_schema::<Shape>()?,
    )
    .await?;
    if server.certificate_chain().await.is_err() {
        server.install_self_signed_certificate(true).await?;
    }
    let certificate = server
        .certificate_chain()
        .await?
        .into_end_entity_certificate();
    server.create_database::<Shape>("my-database", true).await?;

Once you have a server initialized, calling listen_on will begin listening for connections on the port specified. This uses the preferred native protocol which uses UDP. If you find that UDP is not working for your setup or want to put BonsaiDb behind a load balancer that doesn't support UDP, you can enable WebSocket support and call listen_for_websockets_on.

You can call both, but since these functions don't return until the server is shut down, you should spawn them instead:

let task_server = server.clone();
tokio::spawn(async move {
    task_server.listen_on(5645).await
});
// Clone again: the first clone was moved into the task above.
let task_server = server.clone();
tokio::spawn(async move {
    task_server.listen_for_websockets_on("localhost:8080", false).await
});

If you're not running any of your own code on the server, and you're only using one listening method, you can just await the listen method of your choice in your server's main. This code example configures BonsaiDb on UDP port 5645, but this is not an officially registered port.

From the Client

BlockingClient and AsyncClient can support both the native protocol and WebSockets. They determine which protocol to use based on the scheme in the URL:

  • bonsaidb://* will connect using the native BonsaiDb protocol.
  • ws://* or wss://* will connect using WebSockets.
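The scheme-based selection described in the list above can be sketched in a few lines. This is only an illustration using the standard library: the `Protocol` enum and the `protocol_for` function are hypothetical names and are not part of BonsaiDb's API, and the real clients may parse URLs differently.

```rust
/// Which transport a client would use for a given server URL.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum Protocol {
    /// The native BonsaiDb protocol (`bonsaidb://`).
    Native,
    /// WebSockets (`ws://` or `wss://`).
    WebSocket,
}

/// Picks a protocol from the URL's scheme, or returns `None` when the
/// scheme is missing or not recognized.
fn protocol_for(url: &str) -> Option<Protocol> {
    // Everything before "://" is treated as the scheme.
    let (scheme, _rest) = url.split_once("://")?;
    match scheme {
        "bonsaidb" => Some(Protocol::Native),
        "ws" | "wss" => Some(Protocol::WebSocket),
        _ => None,
    }
}
```

For example, `protocol_for("bonsaidb://localhost:5645")` yields `Some(Protocol::Native)`, while an unrelated scheme such as `http://` yields `None`.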

Here's how to connect over BonsaiDb's native protocol, from examples/basic-server/examples/basic-server.rs:

AsyncClient::build(Url::parse("bonsaidb://localhost:5645")?)
    .with_certificate(certificate)
    .build()
    .await?

This is using a pinned certificate to connect. Other methods are supported, but better certificate management is coming soon.

Common Traits

The examples above use types that are powered by common traits, allowing code to be written with generic trait bounds that can operate the same regardless of whether the code is being called locally or remotely.

Integrating into a BonsaiDb Cluster

Coming Soon.

The goal of this feature is to make clustering simple. We hope to provide an experience that allows someone who is operating a networked server to choose between two types of clusters:

One-leader mode

When setting up a cluster initially, you will begin with one-leader mode. In this mode, you can add as many nodes to the cluster as you wish, but only one node will be processing all of the data updates. All nodes can handle requests, but requests that can't be served locally will be forwarded to the leader. This allows for the use of read-replicas to alleviate load in some read-heavy situations.

Another benefit of this mode is that it supports a two-node configuration. If you're scaling your app and need a reliable backup for quicker disaster recovery, you can operate a read replica and manually failover when the situation arises.

If you decide to allow automatic failover in this mode, there is a chance for data loss, as the leader does not wait for read-replicas to synchronize data. Any transactions that committed and were not synchronized before the outage occurred would not be on the other servers. Thus, this mode is not intended for high-availability configurations, although some users may elect to use it in such a configuration knowing these limitations.

Quorum mode

Once you have a cluster with at least 3 nodes, you can switch the cluster into quorum mode. For any given N nodes, all requests must reach an agreed response by N / 2 + 1 members. For example, in a cluster of 3 nodes, there must be 2 successful responses before a client can receive a response to its request.
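The quorum arithmetic can be checked with a short sketch. Clustering is not implemented yet, so the function names `quorum` and `tolerated_failures` are purely illustrative; the only thing taken from the text is the formula N / 2 + 1 with integer division.

```rust
/// Smallest number of members that must agree in a cluster of `nodes`
/// members. Integer division rounds down, so the result is always a
/// strict majority.
fn quorum(nodes: usize) -> usize {
    nodes / 2 + 1
}

/// How many nodes can be unavailable while a quorum is still reachable.
/// `saturating_sub` avoids underflow for the degenerate case of 0 nodes.
fn tolerated_failures(nodes: usize) -> usize {
    nodes.saturating_sub(quorum(nodes))
}
```

With 3 nodes, `quorum(3)` is 2 and one node may fail. Adding a fourth node raises the quorum to 3 without tolerating any additional failure, which is one reason odd cluster sizes are a common choice in quorum-based systems.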

In quorum mode, your data is divided into shards, and those shards are replicated throughout the cluster onto at least 3 nodes (configurable). Initially, with just 3 nodes available, the only benefit is having a highly-available cluster with no data loss when a single node goes down.

As you add more nodes to your cluster, however, you can re-balance your databases to move shards. The author of BonsaiDb did not enjoy this process in CouchDB when he had to do it and aims to make these tools easy and effortless to use. Ideally, there would be a low-maintenance mode that would allow the cluster to re-shard itself automatically during allowed maintenance periods, ensuring data is distributed more evenly amongst the cluster.

Additional long-term dreams of quorum mode include the ability to customize node selection criteria on a per-database basis. The practical use of node selection is to ensure that at least 3 unique nodes are picked for each shard. However, allowing custom logic to evaluate which nodes should be selected for any database would allow ultimate flexibility. For example, if you have a globally deployed application, and you have some data that is geographically specific, you could locate each region's database on nodes within those locations' data centers.

When?

Clustering is an important part of the design of Cosmic Verge. As such, it is a priority for us to work on. But, the overall game is a very large project, so we hesitate to make any promises on timelines.

Connection

The Connection/AsyncConnection traits contain functions for interacting with collections in a database. These traits are implemented by the Database types in each crate.

Using these traits, you can write generic code that works regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

The only difference between Connection and AsyncConnection is that AsyncConnection can be used in async code, while the Connection trait is designed to block the current thread. BonsaiDb is designed to make it hard to accidentally call a blocking function from async code, while still supporting both async and blocking access patterns.

StorageConnection

The StorageConnection/AsyncStorageConnection traits contain functions for interacting with BonsaiDb's multi-database storage. These traits are implemented by the Storage types.

Using these traits, you can write generic code that works with BonsaiDb's multi-database storage types regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

The only difference between StorageConnection and AsyncStorageConnection is that AsyncStorageConnection can be used in async code, while the StorageConnection trait is designed to block the current thread. BonsaiDb is designed to make it hard to accidentally call a blocking function from async code, while still supporting both async and blocking access patterns.

PubSub Trait

The PubSub/AsyncPubSub traits contain functions for using PubSub in BonsaiDb. The traits are implemented by the Database types in each crate.

Using these traits, you can write generic code that works regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

The only difference between PubSub and AsyncPubSub is that AsyncPubSub can be used in async code, while the PubSub trait is designed to block the current thread. BonsaiDb is designed to make it hard to accidentally call a blocking function from async code, while still supporting both async and blocking access patterns.

Key-Value Trait

The KeyValue/AsyncKeyValue traits contain functions for interacting with the atomic key-value store. The key-value store provides high-performance atomic operations without ACID compliance. Once the data is persisted to disk, it holds the same guarantees as all of BonsaiDb, but this feature is designed for high throughput and does not wait to persist to disk before reporting success to the client. These traits are implemented by the Database types in each crate.

Using these traits, you can write generic code that works regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

The only difference between KeyValue and AsyncKeyValue is that AsyncKeyValue can be used in async code, while the KeyValue trait is designed to block the current thread. BonsaiDb is designed to make it hard to accidentally call a blocking function from async code, while still supporting both async and blocking access patterns.

Key Trait

The Key trait enables types to define a serialization and deserialization format that preserves the order of the original type in serialized form. Comparing two values encoded with as_ord_bytes() using a byte-by-byte comparison should produce the same result as comparing the two original values using Ord. For integer formats, this generally means encoding the bytes in network byte order (big endian).

For example, let's consider two values:

| Value    | as_ord_bytes() |
|----------|----------------|
| `1u16`   | `[0, 1]`       |
| `300u16` | `[1, 44]`      |

1_u16.cmp(&300_u16) and 1_u16.as_ord_bytes()?.cmp(&300_u16.as_ord_bytes()?) both produce Ordering::Less.
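To see why byte order matters, the following sketch compares big-endian and little-endian encodings using only the standard library. It uses `u16::to_be_bytes` as a stand-in for what as_ord_bytes() produces for a u16, based on the values shown above; the helper names `big_endian`, `little_endian`, and `order_preserved` are made up for this illustration.

```rust
/// Encodes a `u16` in network byte order (big endian).
fn big_endian(value: u16) -> [u8; 2] {
    value.to_be_bytes()
}

/// Encodes a `u16` in little-endian order, for contrast.
fn little_endian(value: u16) -> [u8; 2] {
    value.to_le_bytes()
}

/// Returns true when comparing the encoded bytes gives the same answer as
/// comparing the original integers.
fn order_preserved(a: u16, b: u16, encode: fn(u16) -> [u8; 2]) -> bool {
    a.cmp(&b) == encode(a).cmp(&encode(b))
}
```

With these helpers, `order_preserved(1, 256, big_endian)` holds, while `order_preserved(1, 256, little_endian)` does not: in little-endian order 256 encodes to `[0, 1]`, which sorts before the `[1, 0]` produced for 1.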

Implementing the Key trait

Implementing Key involves two functions: from_ord_bytes() on the Key trait and as_ord_bytes() on the companion KeyEncoding trait. The intention is to convert the type to bytes using network byte order for numerical types; for non-numerical types, the bytes need to be stored in binary-sortable order.

Here is how BonsaiDb implements Key for EnumKey:

impl<'k, T> Key<'k> for EnumKey<T>
where
    T: ToPrimitive + FromPrimitive + Clone + Eq + Ord + std::fmt::Debug + Send + Sync,
{
    const CAN_OWN_BYTES: bool = false;

    fn from_ord_bytes<'b>(bytes: ByteSource<'k, 'b>) -> Result<Self, Self::Error> {
        let primitive = u64::decode_variable(bytes.as_ref())?;
        T::from_u64(primitive)
            .map(Self)
            .ok_or_else(|| io::Error::new(ErrorKind::InvalidData, UnknownEnumVariant))
    }
}

impl<T> KeyEncoding<Self> for EnumKey<T>
where
    T: ToPrimitive + FromPrimitive + Clone + Eq + Ord + std::fmt::Debug + Send + Sync,
{
    type Error = io::Error;

    const LENGTH: Option<usize> = None;

    fn describe<Visitor>(visitor: &mut Visitor)
    where
        Visitor: KeyVisitor,
    {
        visitor.visit_type(KeyKind::Unsigned);
    }

    fn as_ord_bytes(&self) -> Result<Cow<'_, [u8]>, Self::Error> {
        let integer = self
            .0
            .to_u64()
            .map(Unsigned::from)
            .ok_or_else(|| io::Error::new(ErrorKind::InvalidData, IncorrectByteLength))?;
        Ok(Cow::Owned(integer.to_variable_vec()?))
    }
}

By implementing Key you can take full control of converting your view keys.

Using an Enum as a Key

The easiest way to expose an enum is to derive num_traits::FromPrimitive and num_traits::ToPrimitive using num-derive, and add an impl EnumKey line:

#[derive(Serialize, Deserialize, Eq, PartialEq, Debug, Key, Clone)]
pub enum Category {
    Rust,
    Cooking,
}

The View code remains unchanged, although the associated Key type can now be set to Option<Category>. The queries can now use the enum instead of a String:

    let rust_post_count = db
        .view::<BlogPostsByCategory>()
        .with_key(&Some(Category::Rust))
        .reduce()?;

BonsaiDb will convert the enum to a u64 and use that value as the Key. A u64 was chosen to ensure fairly wide compatibility even with some extreme usages of bitmasks. If you wish to customize this behavior, you can implement Key directly.

Configuration

BonsaiDb attempts to have reasonable default configuration options, but it's important to browse the available options to see whether any of them better suit your particular needs.

Storage Configuration

The StorageConfiguration structure is used to open a local-only database. The ServerConfiguration struct contains an instance of StorageConfiguration, and all configuration options are available on it.

Vault Key Storage

By default, BonsaiDb sets vault_key_storage to a file stored within the database folder. This is incredibly insecure and should not be used outside of testing.

For secure encryption, it is important to store the vault keys in a location that is separate from the database. If the keys are on the same hardware as the database, anyone with access to the disk will be able to decrypt the stored data.

If you have more than one server, you can still use LocalVaultKeyStorage in conjunction with a mounted network share for reasonable security practices -- assuming the network share itself is properly secured.

If you have an S3-compatible storage service available, you can use bonsaidb::keystorage::s3 to store the vault keys with that service.

Note that when your keys are stored remotely, your BonsaiDb database cannot be opened unless the keys can be read.

Vault Key Storage can also be set using Builder::vault_key_storage.

Default Encryption Key

By setting default_encryption_key to a key, all data will be encrypted when written to the disk.

If default_encryption_key is None, encryption will still be performed for collections that return a key from Collection::encryption_key().

Can also be set using Builder::default_encryption_key.

Tasks: Worker Count

The tasks.worker_count setting controls the number of worker tasks that are spawned to process background tasks.

Can also be set using Builder::tasks_worker_count.

Views: Check Integrity on Open

When views.check_integrity_on_open is true, all views in all databases will be checked on startup for integrity. If this value is false, the integrity of the view will not be checked until it is accessed for the first time.

By default, BonsaiDb delays checking a view's integrity until it's accessed for the first time. It may, however, be preferable to accept a longer startup time to ensure consistent response times once the server is running after a restart.

Can also be set using Builder::check_view_integrity_on_open.

Key-Value Persistence

The Key-Value store is designed to be a lightweight, atomic data store that is suitable for caching data, tracking metrics, or other situations where a Collection might be overkill.

By default, BonsaiDb persists Key-Value store changes to disk immediately. For light usage, this will not be noticeable, and it ensures that no data will ever be lost.

If you're willing to accept potentially losing recent writes, key_value_persistence can be configured to lazily commit changes to disk. The documentation for KeyValuePersistence contains examples as well as an explanation of how the rules are evaluated.

Key-Value Persistence can also be set using Builder::key_value_persistence.

Server Configuration

The ServerConfiguration structure is used to open a BonsaiDb server. Being built atop the local storage engine, this structure exposes an instance of StorageConfiguration, allowing full customization.

Server Name

The server_name setting is for the primary DNS name of the server. The server's TLS certificate should be valid for the server's name.

When using ACME, this setting controls the primary certificate requested.

Can also be set using a builder-style method.

Client Simultaneous Request Limit

BonsaiDb's networking protocols support multiple requests to be sent before any responses have been received, sometimes called pipelining. Without a limit, a single malicious client could send a large number of load-inducing requests and cause reliability of service issues for other clients.

Limiting each connection to a reasonable number of simultaneous requests lets clients take advantage of pipelining without allowing any one client to saturate the server with requests.

This limit is set using the client_simultaneous_request_limit field or builder-style method.

Request Worker Count

The request_workers configuration controls the number of worker tasks that process incoming requests from connected clients. It can also be set via a builder-style method.

Default Permissions and Authenticated Permissions

When first connecting to a server, the client is unauthenticated and is granted the permissions defined by default_permissions. Once a connected client has authenticated, the client will be granted authenticated_permissions in addition to whatever permissions are already granted by the authenticated role.

By default, both default_permissions and authenticated_permissions contain no granted permissions. This means that by default, no connections are allowed to a server, as the connection hasn't been granted BonsaiAction::Server(ServerAction::Connect).

ACME Configuration (LetsEncrypt)

ACME has two configurable options, a contact email and the ACME directory.

ACME Contact Email

The contact email is submitted to the ACME directory as part of requesting a TLS certificate. It is optional for the LetsEncrypt directories.

A valid value for this field begins with mailto:.

The contact email can be set using acme.contact_email or the builder-style method.

ACME Directory

By default, BonsaiDb uses the production LetsEncrypt directory, but any ACME directory can be specified.

The directory can be set using acme.directory or the builder-style method.

Permissions

BonsaiDb uses role-based access control (RBAC). In short, permissions are granted through statements within permission groups. Users are able to log in and receive permissions that were granted via permission groups or roles.

This section has two subsections: Permission Statements and Users, Groups, and Roles.

While the most common use case will be granting permissions to act upon BonsaiDb itself, the permissions system is designed to be generic enough that it can be used as the application's permission system if desired.

By default, no actions are allowed.

Currently, permissions are only applied to connections over a network. In the future, permissions will be able to be applied even on local connections.

Permission Statements

A Statement grants permissions to execute Actions on ResourceNames.

Actions and Resources

ResourceNames are simply namespaced Identifiers. An example could be: "bonsaidb".*."khonsulabs-admin.users".1. Each segment can be a string, an integer, or a wildcard (*).

In BonsaiDb, nearly everything has a resource name. The example above refers to a document with ID 1 in the khonsulabs-admin.users collection in any database. The bonsaidb::core::permissions::bonsai module contains functions to create properly formatted ResourceNames.
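The idea of segment-by-segment matching with a wildcard can be sketched as a toy model. This is not BonsaiDb's ResourceName type or its actual matching algorithm: the `Segment` enum, the `text` helper, and `pattern_matches` are hypothetical, and the sketch assumes a wildcard matches exactly one segment and that both names have the same number of segments.

```rust
/// One segment of a resource name. A toy model for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Segment {
    /// A string segment, such as a database or collection name.
    Text(String),
    /// An integer segment, such as a document ID.
    Integer(u64),
    /// The wildcard `*`, which matches any single segment.
    Any,
}

/// Convenience constructor for a string segment.
fn text(value: &str) -> Segment {
    Segment::Text(value.to_string())
}

/// Returns true when `pattern` matches `name` segment by segment.
/// Simplifying assumption: both must have the same number of segments.
fn pattern_matches(pattern: &[Segment], name: &[Segment]) -> bool {
    pattern.len() == name.len()
        && pattern
            .iter()
            .zip(name)
            .all(|(p, n)| *p == Segment::Any || p == n)
}
```

Under these assumptions, the pattern "bonsaidb".*."khonsulabs-admin.users".1 matches document 1 of that collection in a database with any name, but not document 2.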

Also within the same module are the built-in Actions. The base enum for all actions used within BonsaiDb is BonsaiAction. Below is an overview of the resource names and actions by category.

Server

The ServerAction enum contains the actions that are related to StorageConnection. For APIs that accept a database name parameter, the resource name will be database_resource_name(database). For all other actions, the resource name is bonsaidb_resource_name().

For actions that operate upon users (e.g., creating a user), the resource name is user_resource_name(username).

At-rest Encryption

Access to encrypted information can be controlled by limiting access to the encryption key used. Currently, BonsaiDb only has support for a shared master key, but in the future additional keys will be able to be created. Because Encrypt and Decrypt are separate actions, access to read and write can be controlled independently.

The resource name for an encryption key is encryption_key_resource_name(key_id).

Database

The DatabaseAction enum contains the actions that are related to a specific database. Actions that act on the database directly will use the resource name database_resource_name(database).

For Collections, there are three resource names used. For actions that operate on the collection directly, the resource name is collection_resource_name(database, collection). For actions that operate on a document, the resource name is document_resource_name(database, collection, id). Finally, for actions that operate on a View, the resource name is view_resource_name(database, view).

For actions that operate upon the key-value entry, the resource name is keyvalue_key_resource_name(database, namespace, key).

For actions that operate on a PubSub topic, the resource name is pubsub_topic_resource_name(database, topic).

Statement Examples

Coming Soon.

Users, Groups, and Roles

The most common flow that a database administrator needs to support is granting a user the ability to take specific actions on specific resources. To accomplish this, a PermissionGroup must be created containing the permission statements, covered in the previous section, that you wish to apply.

PermissionGroups can be assigned directly to users by adding the group ID to their User document.

At first glance, Roles may appear somewhat redundant. One or more PermissionGroups can be assigned to a role, and roles can be assigned to a user. Why would you want to use roles at all?

The authors of BonsaiDb suggest using groups for limited amounts of functionality, keeping each group's list of statements concise and easy to understand. Then, create roles that combine groups of functionality in meaningful ways. One meaningful way could be creating roles based on job titles inside of a company. In theory, a person's job defines what they do within the company.

In practice, permissions are never as clean as one would hope, which is why BonsaiDb allows assigning groups and roles to users directly. Roles should be used as much as possible, but sometimes assigning a group directly is just needed. For example, imagine the CEO telling you, "I know Bob is just a sales guy, but he needs to be able to update this record. I trust him more than the other sales people. Just make it happen." As the database administrator, you can decide whether to introduce a new role or just temporarily assign an extra group to this one user.
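The relationship between users, groups, and roles can be modelled in a few lines. The sketch below is a simplified illustration, not BonsaiDb's schema: the `User` struct, the name-based lookups, and `effective_groups` are all hypothetical, and it assumes a user's effective groups are the union of directly assigned groups and the groups of each assigned role.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A toy user record holding the names of the groups and roles assigned
/// directly to the user. Hypothetical type for illustration only.
struct User {
    groups: Vec<&'static str>,
    roles: Vec<&'static str>,
}

/// Collects every permission group that applies to `user`: the groups
/// assigned directly plus the groups belonging to each assigned role.
fn effective_groups(
    user: &User,
    role_groups: &BTreeMap<&'static str, Vec<&'static str>>,
) -> BTreeSet<&'static str> {
    // Start with the groups assigned directly to the user.
    let mut groups: BTreeSet<&'static str> = user.groups.iter().copied().collect();
    // Add the groups of every role; unknown roles contribute nothing.
    for role in &user.roles {
        if let Some(members) = role_groups.get(role) {
            groups.extend(members.iter().copied());
        }
    }
    groups
}
```

In the scenario above, Bob would keep his sales role and additionally list one extra group directly; removing that single group later restores his original permissions without touching the role.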

At-Rest Encryption

BonsaiDb offers at-rest encryption. An overview of how it works is available in the bonsaidb::local::vault module.

Enabling at-rest encryption by default

When opening your BonsaiDb instance, there is a configuration option default_encryption_key. Once this is set, all new data written that supports being encrypted will be encrypted at-rest.

let storage = Storage::open(
    StorageConfiguration::new(&directory)
        .vault_key_storage(vault_key_storage)
        .default_encryption_key(KeyId::Master)
)?;

Enabling at-rest encryption on a per-collection basis

Collection::encryption_key() can be overridden on a per-Collection basis. If a collection requests encryption but the feature is disabled, an error will be generated.

To enable a collection to be encrypted when the feature is enabled, only return a key when ENCRYPTION_ENABLED is true.
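The shape of that check can be sketched with stand-in types. Nothing below is BonsaiDb's real API: `KeyId` is a local enum that only mirrors the KeyId::Master value used earlier, ENCRYPTION_ENABLED is a local constant standing in for the flag mentioned above, and `encryption_key` is a free function rather than a Collection trait method.

```rust
/// Stand-in for BonsaiDb's `KeyId`; only the `Master` variant is modelled.
#[derive(Debug, Clone, PartialEq, Eq)]
enum KeyId {
    Master,
}

/// Stand-in for the ENCRYPTION_ENABLED flag mentioned in the text. It is
/// hard-coded here so the sketch compiles on its own.
const ENCRYPTION_ENABLED: bool = true;

/// Mirrors the intent of overriding `Collection::encryption_key()`:
/// request a key only when encryption support is available.
fn encryption_key() -> Option<KeyId> {
    if ENCRYPTION_ENABLED {
        Some(KeyId::Master)
    } else {
        None
    }
}
```

Returning None when the flag is false means the collection never requests encryption while the feature is disabled, which avoids the error described above.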