BonsaiDb User's Guide

BonsaiDb is an ACID-compliant document database written in Rust. It aims to be a general-purpose database that simplifies development and deployment by providing reliable building blocks that are lightweight enough for hobby projects running with minimal resources, yet scalable for when your hobby project becomes a deployed product.

This user's guide aims to provide a guided walkthrough for users to understand how BonsaiDb works. This guide is meant to be supplemental to the documentation. If you learn best by exploring examples, many are available in /examples in the repository. If, however, you learn best by taking a guided tour of how something works, this guide is specifically for you.

If you have any feedback on this guide, please file an issue, and we will try to address any problems or shortcomings.

Thank you for exploring BonsaiDb.

Concepts

This is a list of common concepts that will be used throughout this book as well as the documentation.

Document

A Document is a single piece of stored data. Each document is stored within a Collection, and has a unique ID within that Collection. There are two document types: OwnedDocument and BorrowedDocument. The View::map() function takes a BorrowedDocument, but nearly every other API utilizes OwnedDocument.

When a document is updated, BonsaiDb will check that the revision information passed matches the currently stored information. If not, a conflict error will be returned. This simple check ensures that if two writers try to update the document simultaneously, one will succeed and the other will receive an error.
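The revision check described above is a form of optimistic concurrency control. The following is a minimal sketch of the idea, not BonsaiDb's implementation: `MemoryStore`, `StoredDocument`, and `UpdateError` are hypothetical names, and a simple counter stands in for BonsaiDb's revision information.

```rust
use std::collections::HashMap;

/// Errors the hypothetical store can return from `update`.
#[derive(Debug, PartialEq, Eq)]
pub enum UpdateError {
    NotFound,
    Conflict,
}

/// A stored document: a revision counter plus opaque contents.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StoredDocument {
    pub revision: u64,
    pub contents: Vec<u8>,
}

/// Hypothetical in-memory store, used only to illustrate the revision check.
#[derive(Default)]
pub struct MemoryStore {
    documents: HashMap<u64, StoredDocument>,
}

impl MemoryStore {
    /// Stores a document at revision 1 and returns that revision.
    pub fn insert(&mut self, id: u64, contents: Vec<u8>) -> u64 {
        self.documents
            .insert(id, StoredDocument { revision: 1, contents });
        1
    }

    /// Replaces the contents only if `expected_revision` matches the stored
    /// revision; otherwise the caller is working from stale data.
    pub fn update(
        &mut self,
        id: u64,
        expected_revision: u64,
        contents: Vec<u8>,
    ) -> Result<u64, UpdateError> {
        let document = self.documents.get_mut(&id).ok_or(UpdateError::NotFound)?;
        if document.revision != expected_revision {
            return Err(UpdateError::Conflict);
        }
        document.revision += 1;
        document.contents = contents;
        Ok(document.revision)
    }
}
```

If two writers both read revision 1 and then call `update`, the first call succeeds and bumps the revision, so the second call fails with `Conflict` and must re-read the document before retrying.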

Serializable Collections

BonsaiDb provides the SerializedCollection trait, which allows automatic serialization and deserialization in many situations. When using the SerializedCollection::document_contents() function, the document is serialized and deserialized by the format returned from SerializedCollection::format().

The CollectionDocument<T> type provides convenience methods of interacting with serializable documents.

Default serialization of Serde-compatible types

BonsaiDb provides a convenience trait for Serde-compatible data types: DefaultSerialization. This empty trait can be implemented on any collection to have BonsaiDb provide its preferred serialization format, Pot.

Raw Collections

If you would prefer to manually manage the data stored inside of a Document, you can directly manage the contents field. BonsaiDb will not interact with the contents of a Document. Only code that you write will parse or update the stored data.

Collection

A Collection is a group of Documents and associated functionality. Collections are stored on-disk using ACID-compliant, transactional storage, ensuring your data is protected in the event of a sudden power failure or other unfortunate event.

The goal of a Collection is to encapsulate the logic for a set of data in such a way that Collections could be designed to be shared and reused in multiple Schemas or applications.

Each Collection must have a unique CollectionName. To help prevent naming collisions, an authority can be specified which provides a level of namespacing.

A Collection can contain one or more Views.

Primary Keys

All documents stored in a collection have a unique id. Primary keys in BonsaiDb are immutable -- once a document has an id, it cannot be changed. If you wish for a unique key that can be updated, use a unique view, and use a separate value as a primary key.

The type is controlled by the Collection::PrimaryKey associated type. If you're using the derive macro, the type can be specified using the primary_key parameter as in this example:

#[derive(Debug, Serialize, Deserialize, Collection, Eq, PartialEq)]
#[collection(name = "multi-key", primary_key = (u32, u64))]
struct MultiKey {
    value: String,
}

If no primary_key is specified in the derive, u64 will be used.

Inserting and accessing the collection can be done using the newly defined primary key type:

    let inserted = MultiKey {
        value: String::from("hello"),
    }
    .insert_into((42, 64), &db)
    .await?;
    let retrieved = MultiKey::get((42, 64), &db)
        .await?
        .expect("document not found");
    assert_eq!(inserted, retrieved);

Natural Ids

It's not uncommon to need to store data in a database that has an "external" identifier. Some examples could be externally authenticated user profiles, social networking site posts, or for normalizing a single type's fields across multiple Collections. These types of values are often called "Natural Keys" or "Natural Identifiers".

SerializedCollection::natural_id() or DefaultSerialization::natural_id() can be implemented to return a value from the contents of a new document. When using the derive macro, the natural_id parameter can be specified with either a closure or a path to a function with the same signature.

In this example, the UserProfile type is used to represent a user that has a unique ID in an external database:

#[derive(Debug, Serialize, Deserialize, Collection, Eq, PartialEq)]
#[collection(name = "user-profiles", primary_key = u32, natural_id = |user: &UserProfile| Some(user.external_id))]
struct UserProfile {
    pub external_id: u32,
    pub name: String,
}

When pushing a UserProfile into the collection, the id will automatically be assigned by calling natural_id():

    let user = UserProfile {
        external_id: 42,
        name: String::from("ecton"),
    }
    .push_into(&db)
    .await?;
    let retrieved_from_database = UserProfile::get(42, &db)
        .await?
        .expect("document not found");
    assert_eq!(user, retrieved_from_database);

Custom Primary Keys

All primary keys must implement the Key trait. BonsaiDb provides implementations for many types, but any type that implements the trait can be used.

When using push/push_into, BonsaiDb needs to assign a unique ID to the incoming document. If natural_id() returns None, the storage backend will handle id assignment.

If the document being pushed is the first document in the collection, Key::first_value() is called and the resulting value is used as the document's id.

If the collection already has documents, the highest-ordered key is queried from the collection. Key::next_value() is then called and the resulting value is used as the document's id. Key implementors should not allow next_value() to return a value that is less than the current value. NextValueError::WouldWrap should be returned instead of wrapping.

Both first_value() and next_value() by default return NextValueError::Unimplemented. If any error occurs while trying to assign a unique id, the transaction will be aborted and rolled back.
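The id-assignment rules above can be sketched with a small stand-in trait. This is an illustration under assumptions, not BonsaiDb's code: `SketchKey` and `assign_id` are hypothetical, the `NextValueError` enum here is a local copy of the two variants named in the text, and starting at 0 for `u32` is an arbitrary choice for the sketch.

```rust
use std::collections::BTreeMap;

/// Local copy of the two error variants mentioned in the text.
#[derive(Debug, PartialEq, Eq)]
pub enum NextValueError {
    Unimplemented,
    WouldWrap,
}

/// Hypothetical stand-in for the id-assignment half of a key trait.
pub trait SketchKey: Ord + Copy {
    /// Both functions default to `Unimplemented`, as described above.
    fn first_value() -> Result<Self, NextValueError> {
        Err(NextValueError::Unimplemented)
    }
    fn next_value(&self) -> Result<Self, NextValueError> {
        Err(NextValueError::Unimplemented)
    }
}

impl SketchKey for u32 {
    fn first_value() -> Result<Self, NextValueError> {
        Ok(0) // arbitrary starting point for this sketch
    }
    fn next_value(&self) -> Result<Self, NextValueError> {
        // Refuse to wrap around instead of returning a smaller value.
        self.checked_add(1).ok_or(NextValueError::WouldWrap)
    }
}

/// Chooses the id for a pushed document: the natural id if there is one,
/// otherwise the first value (empty collection) or the value after the
/// highest-ordered existing key.
pub fn assign_id<K: SketchKey, V>(
    collection: &BTreeMap<K, V>,
    natural_id: Option<K>,
) -> Result<K, NextValueError> {
    if let Some(id) = natural_id {
        return Ok(id);
    }
    match collection.keys().next_back() {
        Some(highest) => highest.next_value(),
        None => K::first_value(),
    }
}
```

Any error from `assign_id` corresponds to the case where the transaction is aborted and rolled back.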

View

A View is a map/reduce-powered method of quickly accessing information inside of a Collection. A View can only belong to one Collection.

Views define two important associated types: a Key type and a Value type. You can think of these as the equivalent entries in a map/dictionary-like collection that supports more than one entry for each Key. The Key is used to filter the View's results, and the Value is used by your application or the reduce() function.

Views are a powerful, yet abstract concept. Let's look at a concrete example: blog posts with categories.

#[derive(Serialize, Deserialize, Debug, Collection)]
#[collection(name = "blog-post", views = [BlogPostsByCategory])]
pub struct BlogPost {
    pub title: String,
    pub body: String,
    pub category: Option<String>,
}

Let's insert this data for these examples:

    BlogPost {
        title: String::from("New version of BonsaiDb released"),
        body: String::from("..."),
        category: Some(String::from("Rust")),
    }
    .push_into(&db)
    .await?;

    BlogPost {
        title: String::from("New Rust version released"),
        body: String::from("..."),
        category: Some(String::from("Rust")),
    }
    .push_into(&db)
    .await?;

    BlogPost {
        title: String::from("Check out this great cinnamon roll recipe"),
        body: String::from("..."),
        category: Some(String::from("Cooking")),
    }
    .push_into(&db)
    .await?;

All examples on this page are available in their full form in the repository at book/book-examples/tests.

While category should be an enum, let's first explore using String and upgrade to an enum at the end (it requires one additional step). Let's implement a View that will allow users to find blog posts by their category as well as count the number of posts in each category.

#[derive(Debug, Clone, View)]
#[view(collection = BlogPost, key = Option<String>, value = u32, name = "by-category")]
pub struct BlogPostsByCategory;

impl ViewSchema for BlogPostsByCategory {
    type View = Self;

    fn map(&self, document: &BorrowedDocument<'_>) -> ViewMapResult<Self::View> {
        let post = BlogPost::document_contents(document)?;
        document.header.emit_key_and_value(post.category, 1)
    }

    fn reduce(
        &self,
        mappings: &[ViewMappedValue<Self::View>],
        _rereduce: bool,
    ) -> ReduceResult<Self::View> {
        Ok(mappings.iter().map(|mapping| mapping.value).sum())
    }
}

The two traits being implemented are View and ViewSchema. These traits are designed to allow keeping the View implementation in a shared code library that is used by both client-side and server-side code, while keeping the ViewSchema implementation in the server executable only.

Views for SerializedCollection

For users who are using SerializedCollection, CollectionViewSchema can be implemented instead of ViewSchema. The only difference between the two is that the map() function takes a CollectionDocument instead of a BorrowedDocument.

Value Serialization

For views to function, the Value type must be able to be serialized to and deserialized from storage. To accomplish this, all views must implement the SerializedView trait. For Serde-compatible data structures, DefaultSerializedView is an empty trait that can be implemented instead to provide the default serialization that BonsaiDb recommends.

Map

The first line of the map function calls SerializedCollection::document_contents() to deserialize the stored BlogPost. The second line returns an emitted Key and Value -- in our case a clone of the post's category and the value 1_u32. With the map function, we're able to use query() and query_with_docs():

    let rust_posts = db
        .view::<BlogPostsByCategory>()
        .with_key(Some(String::from("Rust")))
        .query_with_docs()
        .await?;
    for mapping in &rust_posts {
        let post = BlogPost::document_contents(mapping.document)?;
        println!(
            "Retrieved post #{} \"{}\"",
            mapping.document.header.id, post.title
        );
    }

The above snippet queries the Database for all documents in the BlogPost Collection that emitted a Key of Some("Rust").
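Conceptually, the index built by map() behaves like an ordered multimap from Key to the entries emitted under that Key. The sketch below models only that idea with the standard library; `SketchIndex` and its methods are hypothetical and say nothing about how BonsaiDb stores view entries.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a view index: each key maps to every
/// (document id, value) pair that was emitted under it.
#[derive(Default)]
pub struct SketchIndex {
    entries: BTreeMap<Option<String>, Vec<(u64, u32)>>,
}

impl SketchIndex {
    /// Records what a `map()` call emitted for one document.
    pub fn emit(&mut self, document_id: u64, key: Option<String>, value: u32) {
        self.entries.entry(key).or_default().push((document_id, value));
    }

    /// Returns the ids of all documents that emitted exactly `key`.
    pub fn documents_with_key(&self, key: &Option<String>) -> Vec<u64> {
        self.entries
            .get(key)
            .map(|pairs| pairs.iter().map(|(id, _)| *id).collect())
            .unwrap_or_default()
    }
}
```

Emitting the three example posts and asking for `Some("Rust")` returns the ids of the two Rust posts, which mirrors what with_key() does in the query above.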

If you're using a SerializedCollection, you can use query_with_collection_docs() to have the deserialization done automatically for you:

    let rust_posts = db
        .view::<BlogPostsByCategory>()
        .with_key(Some(String::from("Rust")))
        .query_with_collection_docs()
        .await?;
    for mapping in &rust_posts {
        println!(
            "Retrieved post #{} \"{}\"",
            mapping.document.header.id, mapping.document.contents.title
        );
    }

Reduce

The second function to learn about is the reduce() function. It is responsible for turning an array of Key/Value pairs into a single Value. In some cases, BonsaiDb might need to call reduce() with values that have already been reduced one time. If this is the case, rereduce is set to true.

In this example, we're using the built-in Iterator::sum() function to turn our Value of 1_u32 into a single u32 representing the total number of documents.

    let rust_post_count = db
        .view::<BlogPostsByCategory>()
        .with_key(Some(String::from("Rust")))
        .reduce()
        .await?;
    assert_eq!(rust_post_count, 2);

Changing an existing view

If you have data stored in a view, but want to update the view to store data differently, implement ViewSchema::version() and return a unique number. When BonsaiDb checks the view's integrity, it will notice that there is a version mis-match and automatically re-index the view.

There is no mechanism to access the data until this operation is complete.

Understanding Re-reduce

Let's examine this data set:

Document ID   BlogPost Category
1             Some("Rust")
2             Some("Rust")
3             Some("Cooking")
4             None

When updating views, each view entry is reduced and the value is cached. These are the view entries:

View Entry ID     Reduced Value
Some("Rust")      2
Some("Cooking")   1
None              1

When a reduce query is issued for a single key, the value can be returned without further processing. But, if the reduce query matches multiple keys, the View's reduce() function will be called with the already reduced values with rereduce set to true. For example, retrieving the total count of blog posts:

    let total_post_count = db.view::<BlogPostsByCategory>().reduce().await?;
    assert_eq!(total_post_count, 3);

Once BonsaiDb has gathered each of the key's reduced values, it needs to further reduce that list into a single value. To accomplish this, the View's reduce() function is invoked with rereduce set to true, and with mappings containing:

Key               Value
Some("Rust")      2
Some("Cooking")   1
None              1

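This produces a final value of 4 for the data set above. (The snippet asserts 3 because the example data inserted earlier on this page contains only the three categorized posts.)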
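To make the role of the rereduce flag concrete, here is a small standalone sketch of the two passes over the four-document data set. The `Mapping` struct and the `count` and `total` functions are hypothetical and use only the standard library. Unlike the sum-based reduce() shown earlier, this `count` deliberately uses the number of entries on the first pass, so it has to behave differently when re-reducing.

```rust
/// One entry passed to a reduce function: a key and either a mapped value
/// (first pass) or an already reduced value (re-reduce pass).
#[derive(Debug, Clone, Copy)]
pub struct Mapping {
    pub key: Option<&'static str>,
    pub value: u32,
}

/// Counts entries. On the first pass every entry is one document; on the
/// re-reduce pass each entry already holds a count, so the counts are summed.
pub fn count(mappings: &[Mapping], rereduce: bool) -> u32 {
    if rereduce {
        mappings.iter().map(|mapping| mapping.value).sum()
    } else {
        mappings.len() as u32
    }
}

/// Reduces each key's mappings separately, then re-reduces the per-key results.
pub fn total(groups: &[Vec<Mapping>]) -> u32 {
    let cached: Vec<Mapping> = groups
        .iter()
        .filter(|group| !group.is_empty())
        .map(|group| Mapping {
            key: group[0].key,
            value: count(group, false),
        })
        .collect();
    count(&cached, true)
}
```

With two Rust entries, one Cooking entry, and one None entry, `total` returns 4. Ignoring the flag and using `mappings.len()` on both passes would return 3, the number of keys rather than the number of documents.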

How does BonsaiDb make this efficient?

When saving Documents, BonsaiDb does not immediately update related views. It instead notes what documents have been updated since the last time the View was indexed.

When a View is accessed, the queries include an AccessPolicy. If you aren't overriding it, UpdateBefore is used. This means that when the query is evaluated, BonsaiDb will first check if the index is out of date due to any updated data. If it is, it will update the View before evaluating the query.

If you want results quickly and are willing to accept data that might be out of date, the access policies UpdateAfter and NoUpdate can be used depending on your needs.

If multiple simultaneous queries are being evaluated for the same View and the View is outdated, BonsaiDb ensures that only a single view indexer will execute while all of the queries wait for it to complete.
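The three access policies can be illustrated with a toy lazily updated index. This sketch makes several simplifying assumptions: `LazyCountIndex` is hypothetical, every pending id is treated as a newly inserted document, and the UpdateAfter refresh runs synchronously inside `query` rather than after the query returns.

```rust
use std::collections::BTreeSet;

/// Local mirror of the three policy names used in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessPolicy {
    UpdateBefore,
    UpdateAfter,
    NoUpdate,
}

/// Hypothetical lazily maintained index that counts documents.
#[derive(Default)]
pub struct LazyCountIndex {
    indexed_count: usize,
    /// Ids of documents saved since the index was last updated.
    pending: BTreeSet<u64>,
}

impl LazyCountIndex {
    /// Saving a document only records that the index is now stale.
    pub fn note_saved(&mut self, document_id: u64) {
        self.pending.insert(document_id);
    }

    /// Folds all pending documents into the index.
    fn update(&mut self) {
        self.indexed_count += self.pending.len();
        self.pending.clear();
    }

    /// Answers a count query according to the requested policy.
    pub fn query(&mut self, policy: AccessPolicy) -> usize {
        match policy {
            AccessPolicy::UpdateBefore => {
                self.update();
                self.indexed_count
            }
            AccessPolicy::UpdateAfter => {
                // Answer from the stale index, then refresh it.
                let stale = self.indexed_count;
                self.update();
                stale
            }
            AccessPolicy::NoUpdate => self.indexed_count,
        }
    }
}
```

After two saves, a NoUpdate or UpdateAfter query still sees 0, while the next query of any kind sees 2 because UpdateAfter refreshed the index once it had answered.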

Using arbitrary types as a View Key

In our previous example, we used String for the Key type. The reason is important: Keys must be sortable by our underlying storage engine, which means special care must be taken. Most serialization types do not guarantee binary sort order. Instead, BonsaiDb exposes the Key trait. On that documentation page, you can see that BonsaiDb implements Key for many built-in types.
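A short standard-library sketch shows why byte order matters for sortable keys. The helper names are made up for this illustration. Big-endian (network order) bytes compare the same way the numbers do, while little-endian bytes do not.

```rust
use std::cmp::Ordering;

/// Encodes a number in big-endian (network) byte order.
pub fn big_endian(value: u32) -> [u8; 4] {
    value.to_be_bytes()
}

/// Encodes a number in little-endian byte order.
pub fn little_endian(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

/// Returns true when comparing the encoded bytes gives the same answer as
/// comparing the numbers themselves.
pub fn preserves_order(encode: fn(u32) -> [u8; 4], a: u32, b: u32) -> bool {
    let numeric: Ordering = a.cmp(&b);
    let bytewise: Ordering = encode(a).cmp(&encode(b));
    numeric == bytewise
}
```

For 1 and 256, `preserves_order(big_endian, 1, 256)` is true and `preserves_order(little_endian, 1, 256)` is false, because the little-endian encoding of 1 starts with a larger byte than the encoding of 256.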

Using an enum as a View Key

The easiest way to expose an enum is to derive num_traits::FromPrimitive and num_traits::ToPrimitive using num-derive, and add an impl EnumKey line:

#[derive(
    Serialize, Deserialize, Debug, num_derive::FromPrimitive, num_derive::ToPrimitive, Clone,
)]
pub enum Category {
    Rust,
    Cooking,
}

impl EnumKey for Category {}

The View code remains unchanged, although the associated Key type can now be set to Option<Category>. The queries can now use the enum instead of a String:

    let rust_post_count = db
        .view::<BlogPostsByCategory>()
        .with_key(Some(Category::Rust))
        .reduce()
        .await?;

BonsaiDb will convert the enum to a u64 and use that value as the Key. A u64 was chosen to ensure fairly wide compatibility even with some extreme usages of bitmasks. If you wish to customize this behavior, you can implement Key directly.

Implementing the Key trait

The Key trait declares two functions: as_ord_bytes() and from_ord_bytes(). The intention is to convert the type to bytes using network byte order for numerical types; for non-numerical types, the bytes need to be stored in binary-sortable order.

Here is how BonsaiDb implements Key for EnumKey:

impl<'a, T> Key<'a> for T
where
    T: EnumKey,
{
    type Error = std::io::Error;
    const LENGTH: Option<usize> = None;

    fn as_ord_bytes(&'a self) -> Result<Cow<'a, [u8]>, Self::Error> {
        let integer = self
            .to_u64()
            .map(Unsigned::from)
            .ok_or_else(|| std::io::Error::new(ErrorKind::InvalidData, IncorrectByteLength))?;
        Ok(Cow::Owned(integer.to_variable_vec()?))
    }

    fn from_ord_bytes(bytes: &'a [u8]) -> Result<Self, Self::Error> {
        let primitive = u64::decode_variable(bytes)?;
        Self::from_u64(primitive)
            .ok_or_else(|| std::io::Error::new(ErrorKind::InvalidData, UnknownEnumVariant))
    }
}

By implementing Key you can take full control of converting your view keys.
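As a rough idea of what a hand-written encoding could look like, the sketch below encodes a composite value so that its bytes sort the same way the value does. It does not implement BonsaiDb's Key trait. `YearMonth` is a hypothetical type, and its two methods merely borrow the names as_ord_bytes and from_ord_bytes with simplified signatures.

```rust
/// Hypothetical composite key: ordered by year first, then month.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct YearMonth {
    pub year: u16,
    pub month: u8,
}

impl YearMonth {
    /// Encodes as three bytes: the big-endian year followed by the month,
    /// so byte-wise comparison matches the derived `Ord`.
    pub fn as_ord_bytes(&self) -> [u8; 3] {
        let [high, low] = self.year.to_be_bytes();
        [high, low, self.month]
    }

    /// Decodes the three-byte form, returning `None` for any other length.
    pub fn from_ord_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 3 {
            return None;
        }
        Some(Self {
            year: u16::from_be_bytes([bytes[0], bytes[1]]),
            month: bytes[2],
        })
    }
}
```

The field order in the encoding is the design choice that matters: placing the most significant field first, each in big-endian form, is what keeps December 2021 sorted before January 2022.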

Schema

A Schema is a group of one or more Collections. A Schema can be instantiated as a Database. The Schema describes how a set of data behaves, and a Database is a set of data on-disk.

Database

A Database is a set of stored data. Each Database is described by a Schema. Unlike the other concepts, this concept corresponds to multiple types:

All of these types implement the Connection trait.

Storage

The StorageConnection trait allows interacting with a BonsaiDb multi-database storage instance.

There are three implementations of the StorageConnection trait:

  • Storage: A local, file-based server implementation with no networking capabilities.
  • Server: A networked server implementation, written using Storage. This server supports QUIC- and WebSocket-based protocols. The QUIC protocol is preferred, but it uses UDP which many load balancers don't support. If you're exposing BonsaiDb behind a load balancer, WebSockets may be the only option depending on your host's capabilities.
  • Client: A network client implementation that connects to a Server.

PubSub

The Publish/Subscribe pattern enables developers to design systems that produce and receive messages. It is implemented for BonsaiDb through the PubSub and Subscriber traits.

A common example of what PubSub enables is implementing a simple chat system. Each chat participant can subscribe to messages on the chat topic, and when any participant publishes a chat message, all subscribers will receive a copy of that message.

A working example of PubSub is available at examples/basic-local/examples/pubsub.rs.
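For readers new to the pattern itself, here is a minimal in-process sketch built on the standard library's channels. It is not BonsaiDb's PubSub API: `Relay`, `subscribe`, and `publish` are hypothetical names chosen for this illustration.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical in-process message relay keyed by topic name.
#[derive(Default)]
pub struct Relay {
    subscribers: HashMap<String, Vec<Sender<String>>>,
}

impl Relay {
    /// Registers a new subscriber on `topic` and returns its receiving end.
    pub fn subscribe(&mut self, topic: &str) -> Receiver<String> {
        let (sender, receiver) = channel();
        self.subscribers
            .entry(topic.to_string())
            .or_default()
            .push(sender);
        receiver
    }

    /// Sends a copy of `message` to every live subscriber of `topic` and
    /// returns how many subscribers received it.
    pub fn publish(&mut self, topic: &str, message: &str) -> usize {
        match self.subscribers.get_mut(topic) {
            Some(senders) => {
                // A failed send means the receiver was dropped; forget it.
                senders.retain(|sender| sender.send(message.to_string()).is_ok());
                senders.len()
            }
            None => 0,
        }
    }
}
```

In the chat example, each participant would hold one `Receiver` for the chat topic, and every `publish` call delivers a separate copy of the message to each of them.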

Use cases of BonsaiDb

Single database model (No networking)

This use case is most similar to utilizing SQLite for your database. In this mode, BonsaiDb directly interacts with files on your disk to provide your database. Unlike other file-based databases, however, it's easy to migrate to any of these scenarios from this starting position:

graph LR
  code{{Rust Code}}
  local[(bonsaidb-local::Database)]
  code <--> local

A working example of how to use a local database can be found at examples/basic-local/examples/basic-local.rs.

Multi-database model (No networking)

This model is most similar to using multiple SQLite databases. In this mode, you interact with a Storage that you spawn within your code.

graph LR
  code{{Rust Code}}
  local[(bonsaidb-local::Storage)]
  code <--> server
  server <--> local

If you look at the source behind Database::open_local, you'll see that the single-database model is using Storage under the hood.

Server model (QUIC or WebSockets)

This model is most similar to using other document databases, like CouchDB or MongoDB. In this mode, you interact with a Client that connects via either QUIC or WebSockets with a server. From the server code's perspective, this model is the same as the multi-database model, except that the server is listening for and responding to network traffic.

graph LR
  client-code{{Rust Client Code}}
  server-code{{Rust Server Code}}
  client[[bonsaidb-client]]
  server[[bonsaidb-server]]
  local[(bonsaidb-local)]
  client-code <--> client
  client <-. network .-> server
  server <--> local
  server-code <--> server

A working example of this model can be found at examples/basic-server/examples/basic-server.rs. When writing client/server applications that utilize BonsaiDb, you can have the BonsaiDb server running within your server application. This means that your server still has the ability to interact with BonsaiDb without using networking. Regardless of whether you run any other server code, your BonsaiDb server will be accessible through a Client over the network.

API Platform model (QUIC or WebSockets)

If you're finding yourself developing an API for your application, and all of the consumers of this API are already connected to BonsaiDb, you may want to take advantage of the custom api functionality of the server:

graph LR
  client-code{{Rust Client Code}}
  server-code{{Rust Server Code}}
  client[[bonsaidb-client]]
  server[[bonsaidb-server]]
  backend[[Backend]]
  local[(bonsaidb-local)]
  client-code <--> client
  client <-. network .-> server
  server <--> local
  server-code <--> server
  server-code <--> backend
  backend <--> server

The BonsaiDb CustomServer type accepts one generic parameter that implements the Backend trait. This trait is used to customize the server in many ways, but one of the associated types is a CustomApi implementor.

See the Custom Api Server section for an overview of how to set up a custom api server.

Coming Later: Cluster model

When you're at the stage of scaling beyond a single server, you will be able to upgrade your server to a cluster using the hypothetical bonsaidb-cluster crate. The clustering model is still being designed, but the goal is something similar to:

graph LR
  client-code{{Rust Client Code}}
  server-code{{Rust Server Code}}
  client[[bonsaidb-client]]
  server1[[server 1]]
  server2[[server 2]]
  server3[[server 3]]
  cluster[[bonsaidb-cluster]]
  client-code <--> client
  client <-. network .-> cluster
  server-code <--> cluster
  cluster <--> server1
  cluster <--> server2
  cluster <--> server3
  server1 <--> server2
  server2 <--> server3
  server1 <--> server3

In this model, the local storage element is hidden; each server has its own storage. This model is very similar from the viewpoint of your server and client code -- the primary difference is that the server-side connection is being established using the cluster crate. From the client's perspective, the cluster behaves as a single entity -- sending a request to any server node will produce the same result within the cluster.

All features of BonsaiDb will be designed to work in cluster mode seamlessly. PubSub will ensure that subscribers will receive messages regardless of which server they're connected to.

Custom Api Server

The CustomApi trait defines three associated types, Request, Response, and Error. A backend "dispatches" Requests and expects a Result<Response, Error> in return.

All code on this page comes from this example: examples/basic-server/examples/custom-api.rs.

This example defines a Request and a Response type, but uses BonsaiDb's Infallible type for the error:

#[derive(Serialize, Deserialize, Actionable, Debug)]
#[actionable(actionable = bonsaidb::core::actionable)]
pub enum Request {
    #[actionable(protection = "none")]
    Ping,
    #[actionable(protection = "simple")]
    DoSomethingSimple { some_argument: u32 },
    #[actionable(protection = "custom")]
    DoSomethingCustom { some_argument: u32 },
}

#[derive(Serialize, Deserialize, Debug, Clone)]
pub enum Response {
    Pong,
    DidSomething,
}

impl CustomApi for ExampleApi {
    type Request = Request;
    type Response = Response;
    type Error = Infallible;
}

To implement the server, we must first implement a custom Backend that ties the server to the CustomApi. We also must define a CustomApiDispatcher, which gives an opportunity for the dispatcher to gain access to the ConnectedClient and/or CustomServer instances if they are needed to handle requests.

Finally, either Dispatcher must be implemented manually or actionable can be used to derive an implementation that uses individual traits to handle each request. The example uses actionable:

impl Backend for ExampleBackend {
    type CustomApi = ExampleApi;
    type CustomApiDispatcher = ExampleDispatcher;
    type ClientData = ();
}

/// Dispatches Requests and returns Responses.
#[derive(Debug, Dispatcher)]
#[dispatcher(input = Request, actionable = bonsaidb::core::actionable)]
pub struct ExampleDispatcher {
    // While this example doesn't use the server reference, this is how a custom
    // API can gain access to the running server to perform database operations
    // within the handlers. The `ConnectedClient` can also be cloned and stored
    // in the dispatcher if handlers need to interact with clients outside of a
    // simple Request/Response exchange.
    _server: CustomServer<ExampleBackend>,
}

impl CustomApiDispatcher<ExampleBackend> for ExampleDispatcher {
    fn new(
        server: &CustomServer<ExampleBackend>,
        _client: &ConnectedClient<ExampleBackend>,
    ) -> Self {
        Self {
            _server: server.clone(),
        }
    }
}

#[async_trait]
impl RequestDispatcher for ExampleDispatcher {
    type Output = Response;
    type Error = BackendError<Infallible>;
}

/// The Request::Ping variant has `#[actionable(protection = "none")]`, which
/// causes `PingHandler` to be generated with a single method and no implicit
/// permission handling.
#[async_trait]
impl PingHandler for ExampleDispatcher {
    async fn handle(
        &self,
        _permissions: &Permissions,
    ) -> Result<Response, BackendError<Infallible>> {
        Ok(Response::Pong)
    }
}

Finally, the client can issue the API call and receive the response, without needing any extra steps to serialize. This works regardless of whether the client is connected via QUIC or WebSockets.

async fn ping_the_server(
    client: &Client<ExampleApi>,
    client_name: &str,
) -> Result<(), bonsaidb::core::Error> {
    match client.send_api_request(Request::Ping).await {
        Ok(Response::Pong) => {
            println!("Received Pong from server on {}", client_name);
        }
        other => println!(
            "Unexpected response from API call on {}: {:?}",
            client_name, other
        ),
    }

    Ok(())
}

Permissions

One of the strengths of using BonsaiDb's custom api functionality is the ability to tap into the permissions handling that BonsaiDb uses. The Ping request was defined with protection = "none" which skips all permission validation. However, DoSomethingSimple uses the "simple" protection model, and DoSomethingCustom uses the "custom" protection model. The comments in the example below should help explain the rationale:

/// The permissible actions that can be granted for this example api.
#[derive(Debug, Action)]
#[action(actionable = bonsaidb::core::actionable)]
pub enum ExampleActions {
    DoSomethingSimple,
    DoSomethingCustom,
}

/// With `protection = "simple"`, `actionable` will generate a trait that allows
/// you to return a `ResourceName` and an `Action`, and the handler will
/// automatically confirm that the connected user has been granted the ability
/// to perform `Action` against `ResourceName`.
#[async_trait]
impl DoSomethingSimpleHandler for ExampleDispatcher {
    type Action = ExampleActions;

    async fn resource_name<'a>(
        &'a self,
        _some_argument: &'a u32,
    ) -> Result<ResourceName<'a>, BackendError<Infallible>> {
        Ok(ResourceName::named("example"))
    }

    fn action() -> Self::Action {
        ExampleActions::DoSomethingSimple
    }

    async fn handle_protected(
        &self,
        _permissions: &Permissions,
        _some_argument: u32,
    ) -> Result<Response, BackendError<Infallible>> {
        // The permissions have already been checked.
        Ok(Response::DidSomething)
    }
}

/// With `protection = "custom"`, `actionable` will generate a trait with two
/// functions: one to verify the permissions are valid, and one to do the
/// protected action. This is useful if there are multiple actions or resource
/// names that need to be checked, or if permissions change based on the
/// arguments passed.
#[async_trait]
impl DoSomethingCustomHandler for ExampleDispatcher {
    async fn verify_permissions(
        &self,
        permissions: &Permissions,
        some_argument: &u32,
    ) -> Result<(), BackendError<Infallible>> {
        if *some_argument == 42 {
            Ok(())
        } else {
            permissions.check(
                ResourceName::named("example"),
                &ExampleActions::DoSomethingCustom,
            )?;

            Ok(())
        }
    }

    async fn handle_protected(
        &self,
        _permissions: &Permissions,
        _some_argument: u32,
    ) -> Result<Response, BackendError<Infallible>> {
        // `verify_permissions` has already been executed, so no permissions
        // logic needs to live here.
        Ok(Response::DidSomething)
    }
}

This example uses authenticated_permissions to grant access to ExampleActions::DoSomethingSimple and ExampleActions::DoSomethingCustom to all users who have logged in:

    let server = CustomServer::<ExampleBackend>::open(
        ServerConfiguration::new("custom-api.bonsaidb")
            .default_permissions(Permissions::from(
                Statement::for_any()
                    .allowing(&BonsaiAction::Server(ServerAction::Connect))
                    .allowing(&BonsaiAction::Server(ServerAction::Authenticate(
                        AuthenticationMethod::PasswordHash,
                    ))),
            ))
            .authenticated_permissions(Permissions::from(
                Statement::for_any()
                    .allowing(&ExampleActions::DoSomethingSimple)
                    .allowing(&ExampleActions::DoSomethingCustom),
            )),
    )
    .await?;

For more information on managing permissions, see Administration/Permissions.
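The difference between the three protection levels can also be seen in a stripped-down model that uses no BonsaiDb types. Everything in this sketch is hypothetical (`Allowed`, `PermissionDenied`, and the three free functions). It only mirrors the control flow of the handlers shown earlier on this page.

```rust
use std::collections::HashSet;

/// Local copy of the two action names from the example.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Action {
    DoSomethingSimple,
    DoSomethingCustom,
}

#[derive(Debug, PartialEq, Eq)]
pub struct PermissionDenied;

/// Hypothetical permission set: the (resource name, action) pairs allowed.
#[derive(Default)]
pub struct Allowed(HashSet<(&'static str, Action)>);

impl Allowed {
    /// Builder-style helper that grants one action on one resource.
    pub fn allow(mut self, resource: &'static str, action: Action) -> Self {
        self.0.insert((resource, action));
        self
    }

    pub fn check(&self, resource: &'static str, action: Action) -> Result<(), PermissionDenied> {
        if self.0.contains(&(resource, action)) {
            Ok(())
        } else {
            Err(PermissionDenied)
        }
    }
}

/// "none": the handler runs without any permission check.
pub fn ping(_allowed: &Allowed) -> Result<&'static str, PermissionDenied> {
    Ok("Pong")
}

/// "simple": one fixed resource/action pair is checked before the handler.
pub fn do_something_simple(allowed: &Allowed, _argument: u32) -> Result<&'static str, PermissionDenied> {
    allowed.check("example", Action::DoSomethingSimple)?;
    Ok("DidSomething")
}

/// "custom": the check itself may depend on the arguments.
pub fn do_something_custom(allowed: &Allowed, argument: u32) -> Result<&'static str, PermissionDenied> {
    if argument != 42 {
        allowed.check("example", Action::DoSomethingCustom)?;
    }
    Ok("DidSomething")
}
```

A caller with no grants can still ping, and can call `do_something_custom` with 42, but is denied everywhere else.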

Overview

BonsaiDb aims to offer the majority of its functionality in local operation. The networked server adds some functionality on top of the local version, but its main function is to add the ability to use networking to talk to the database.

This model makes it easy to transition a local database to a networked database server. Start with whatever model fits your needs today, and when your needs change, BonsaiDb will adapt.

When to use the Local Integration

  • You're going to access databases from one process at a time. BonsaiDb is designed for concurrency and can scale with the capabilities of the hardware. However, the underlying storage layer that BonsaiDb is built upon, sled, does not support multiple processes writing its data simultaneously. If you need to access the database from multiple processes, the server integration is what you should use. While it doesn't offer IPC communication today, a pull request that adds that functionality (along with the corresponding unit tests) would be accepted.
  • You have no public API/PubSub/access needs or have implemented those with another stack.

When to use the Server Integration

  • You need to access databases from more than one process or machine.
  • You are OK with downtime due to loss of service when the single server is offline. If you need to have a highly-available database, you should use the Cluster Integration (Coming Soon).
  • Your database load can be met with a single machine. If you have enough load that you need to share the processing power of multiple servers, you should use the Cluster Integration (Coming Soon).

Coming Soon: When to use the Cluster Integration

  • You need to access databases from more than one machine.
  • You need a highly-available setup.
  • You need/want to split load between multiple machines.

Integrating BonsaiDb Locally

BonsaiDb supports multiple databases and multiple schemas. However, for many applications, you only need a single database.

If you're only wanting a single database, the setup is straightforward: (from examples/basic-local/examples/basic-local.rs)

let db = Database::<Message>::open(
    StorageConfiguration::new("basic.bonsaidb")
).await?;

Under the hood, BonsaiDb is creating a multi-database Storage with a local Database named default for you. If you need to switch to a multi-database model, you can open the storage and access the default database: (adapted from examples/basic-local/examples/basic-local.rs)

let storage = Storage::open(
    StorageConfiguration::new("basic.bonsaidb")
        .with_schema::<Message>()?
).await?;
let db = storage.database::<Message>("default").await?;

You can register multiple schemas so that databases can be purpose-built.

Common Traits

To help your code transition between different modes of accessing BonsaiDb, you can use these common traits to make your methods accept any style of BonsaiDb access.

For example, examples/basic-local/examples/basic-local.rs uses this helper method to insert a record:

async fn insert_a_message<C: Connection>(
    connection: &C,
    value: &str,
) -> Result<(), bonsaidb::core::Error> {
    Message {
        contents: String::from(value),
        timestamp: SystemTime::now(),
    }
    .push_into(connection)
    .await?;
    Ok(())
}
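
The idea of writing a helper once against a trait can be sketched without BonsaiDb at all. The example below is only an illustration: MessageStore, LocalStore, and RemoteStore are hypothetical stand-ins invented for this sketch, not BonsaiDb types, and the real Connection trait is async and far richer.

```rust
// Hypothetical stand-in for a connection trait; this is NOT BonsaiDb's `Connection`.
trait MessageStore {
    // Stores a message and returns how many messages the store has seen.
    fn push(&mut self, contents: &str) -> Result<u64, String>;
}

// Stand-in for a local, in-process database.
#[derive(Default)]
struct LocalStore {
    messages: Vec<String>,
}

// Stand-in for a networked client; it only records what it would send.
#[derive(Default)]
struct RemoteStore {
    sent: Vec<String>,
}

impl MessageStore for LocalStore {
    fn push(&mut self, contents: &str) -> Result<u64, String> {
        self.messages.push(contents.to_string());
        Ok(self.messages.len() as u64)
    }
}

impl MessageStore for RemoteStore {
    fn push(&mut self, contents: &str) -> Result<u64, String> {
        self.sent.push(format!("PUSH {}", contents));
        Ok(self.sent.len() as u64)
    }
}

// Written once against the trait, so it accepts either kind of store.
fn insert_a_message<S: MessageStore>(store: &mut S, value: &str) -> Result<u64, String> {
    store.push(value)
}
```

Calling insert_a_message with a LocalStore or a RemoteStore compiles to the same call site, which is the property the common traits aim to give your own helper methods.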

Integrating the networked BonsaiDb Server

To access BonsaiDb over the network, you're going to be writing two pieces of code: the server code and the client code.

Your BonsaiDb Server

The first step is to create a Server, which uses local Storage under the hood. This means that if you're already using BonsaiDb in local mode, you can swap your usage of Storage with Server in your server code without running your database through any tools. Here's the setup code from examples/basic-server/examples/basic-server.rs:

    let server = Server::open(
        ServerConfiguration::new("server-data.bonsaidb")
            .default_permissions(DefaultPermissions::AllowAll)
            .with_schema::<Shape>()?,
    )
    .await?;
    if server.certificate_chain().await.is_err() {
        server.install_self_signed_certificate(true).await?;
    }
    let certificate = server
        .certificate_chain()
        .await?
        .into_end_entity_certificate();
    server.create_database::<Shape>("my-database", true).await?;

Once you have a server initialized, calling listen_on will begin listening for connections on the port specified. This uses the preferred native protocol which uses UDP. If you find that UDP is not working for your setup or want to put BonsaiDb behind a load balancer that doesn't support UDP, you can enable WebSocket support and call listen_for_websockets_on.

You can call both, but since these functions don't return until the server is shut down, you should spawn them instead:

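// Each listener runs until the server shuts down, so each task gets its own handle.
let task_server = server.clone();
tokio::spawn(async move {
    // Native protocol (UDP) on port 5645.
    task_server.listen_on(5645).await
});
let task_server = server.clone();
tokio::spawn(async move {
    // WebSockets on localhost:8080.
    task_server.listen_for_websockets_on("localhost:8080", false).await
});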

If you're not running any of your own code on the server, and you're only using one listening method, you can just await the listen method of your choice in your server's main. This code example configures BonsaiDb on UDP port 5645, but this is not an officially registered port.

From the Client

The Client can support both the native protocol and WebSockets. It determines which protocol to use based on the scheme in the URL:

  • bonsaidb://host:port will connect using the native BonsaiDb protocol.
  • ws://host:port will connect using WebSockets.
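
As a rough illustration of scheme-based selection, the sketch below maps the two schemes listed above to a protocol. Protocol and protocol_for are hypothetical names invented for this example. The real Client performs this selection internally and may accept schemes that this sketch does not.

```rust
// Hypothetical enum for this sketch; not a BonsaiDb type.
#[derive(Debug, PartialEq)]
enum Protocol {
    Native,
    WebSocket,
}

// Returns the protocol implied by the URL's scheme, or None when the scheme
// is missing or is not one of the two handled here.
fn protocol_for(url: &str) -> Option<Protocol> {
    let (scheme, _rest) = url.split_once("://")?;
    match scheme {
        "bonsaidb" => Some(Protocol::Native),
        "ws" => Some(Protocol::WebSocket),
        _ => None,
    }
}
```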

Here's how to connect, from examples/basic-server/examples/basic-server.rs:

Client::new(
    Url::parse("bonsaidb://localhost:5645")?,
    Some(certificate),
)
.await?

This is using a pinned certificate to connect. Other methods are supported, but better certificate management is coming soon.

Common Traits

Integrating into a BonsaiDb Cluster

Coming Soon.

The goal of this feature is to make clustering simple. We hope to provide an experience that lets someone who is operating a networked server choose between two types of clusters:

One-leader mode

When setting up a cluster initially, you will begin with one-leader mode. In this mode, you can add as many nodes to the cluster as you wish, but only one node will be processing all of the data updates. All nodes can handle requests, but requests that can't be served locally will be forwarded to the leader. This allows for the use of read-replicas to alleviate load in some read-heavy situations.

Another benefit of this mode is that it supports a two-node configuration. If you're scaling your app and need a reliable backup for quicker disaster recovery, you can operate a read replica and manually fail over when the situation arises.

If you decide to allow automatic failover in this mode, there is a chance for data loss, as the leader does not wait for read-replicas to synchronize data. Any transactions that committed and were not synchronized before the outage occurred would not be on the other servers. Thus, this mode is not intended for high-availability configurations, although some users may elect to use it in such a configuration knowing these limitations.

Quorum mode

Once you have a cluster with at least 3 nodes, you can switch the cluster into quorum mode. For any given N nodes, all requests must reach an agreed response by N / 2 + 1 members. For example, in a cluster of 3 nodes, there must be 2 successful responses before a client can receive a response to its request.
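
The quorum size follows directly from the N / 2 + 1 formula when the division is integer division. The helper names below are made up for this sketch and are not part of any BonsaiDb API, since clustering is not yet implemented.

```rust
// Minimum number of agreeing members for a cluster of `nodes` members,
// using integer division as in N / 2 + 1.
fn quorum(nodes: usize) -> usize {
    nodes / 2 + 1
}

// Number of nodes that can be offline while a quorum is still reachable.
fn tolerated_failures(nodes: usize) -> usize {
    nodes.saturating_sub(quorum(nodes))
}
```

With these definitions, 3 nodes need 2 agreeing members and tolerate 1 failure, 4 nodes need 3 and still tolerate only 1, and 5 nodes need 3 and tolerate 2. This is why odd cluster sizes are usually the better value.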

In quorum mode, your data is divided into shards, and those shards are replicated throughout the cluster onto at least 3 nodes (configurable). Initially, with just 3 nodes available, the only benefit is having a highly-available cluster with no data loss when a single node goes down.

As you add more nodes to your cluster, however, you can re-balance your databases to move shards. The author of BonsaiDb did not enjoy this process in CouchDB when he had to do it and aims to make these tools easy and effortless to use. Ideally, there would be a low-maintenance mode that would allow the cluster to re-shard itself automatically during allowed maintenance periods, ensuring data is distributed more evenly amongst the cluster.

Additional long-term dreams of quorum mode include the ability to customize node selection criteria on a per-database basis. The practical use of node selection is to ensure that at least 3 unique nodes are picked for each shard. However, allowing custom logic to evaluate which nodes should be selected for any database would allow ultimate flexibility. For example, if you have a globally deployed application, and you have some data that is geographically specific, you could locate each region's database on nodes within those locations' data centers.

When?

Clustering is an important part of the design of Cosmic Verge. As such, it is a priority for us to work on. But, the overall game is a very large project, so we hesitate to make any promises on timelines.

Connection

The Connection trait contains functions for interacting with collections in a database. This trait is implemented by the Database types in each crate.

Using this trait, you can write code that works generically regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

This is an async trait, which unfortunately yields messy documentation due to the lifetimes.

StorageConnection

The StorageConnection trait contains functions for interacting with BonsaiDb's multi-database storage. This trait is implemented by these types:

Using this trait, you can write code that works generically regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

This is an async trait, which unfortunately yields messy documentation.

PubSub Trait

The PubSub trait contains functions for using PubSub in BonsaiDb. This trait is implemented by the Database types in each crate.

Using this trait, you can write code that works generically regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

This is an async trait, which unfortunately yields messy documentation.

Key-Value Trait

The KeyValue trait contains functions for interacting with the atomic key-value store. The key-value store provides high-performance atomic operations without ACID compliance. Once the data is persisted to disk, it holds the same guarantees as all of BonsaiDb, but this feature is designed for high throughput and does not wait to persist to disk before reporting success to the client. This trait is implemented by the Database types in each crate.

Using this trait, you can write code that works generically regardless of whether BonsaiDb is operating locally with no network connection or across the globe.

This is an async trait, which unfortunately yields messy documentation.

Configuration

BonsaiDb attempts to have reasonable default configuration options, but it's important to browse the available options to see whether any of them better suit your particular needs.

Storage Configuration

The StorageConfiguration structure is used to open a local-only database. The ServerConfiguration struct contains an instance of StorageConfiguration, and all configuration options are available on it.

Vault Key Storage

By default, BonsaiDb sets vault_key_storage to a file stored within the database folder. This is incredibly insecure and should not be used outside of testing.

For secure encryption, it is important to store the vault keys in a location that is separate from the database. If the keys are on the same hardware as the database, anyone with access to the disk will be able to decrypt the stored data.

If you have more than one server, you can still use LocalVaultKeyStorage in conjunction with a mounted network share for reasonable security practices -- assuming the network share itself is properly secured.

If you have an S3-compatible storage service available, you can use bonsaidb::keystorage::s3 to store the vault keys with that service.

Note that when your keys are stored remotely, your BonsaiDb database cannot be opened unless the keys can be read.

Vault Key Storage can also be set using Builder::vault_key_storage.

Default Encryption Key

By setting default_encryption_key to a key, all data will be encrypted when written to the disk.

If default_encryption_key is None, encryption will still be performed for collections that return a key from Collection::encryption_key().

Can also be set using Builder::default_encryption_key.

Tasks: Worker Count

The tasks.worker_count setting controls the number of worker tasks that are spawned to process background tasks.

Can also be set using Builder::tasks_worker_count.

Views: Check Integrity on Open

When views.check_integrity_on_open is true, all views in all databases will be checked on startup for integrity. If this value is false, the integrity of the view will not be checked until it is accessed for the first time.

By default, BonsaiDb delays checking a view's integrity until it's accessed for the first time. It may, however, be preferable to accept a longer startup time in exchange for consistent response times once the server is running after a restart.

Can also be set using Builder::check_view_integrity_on_open.

Key-Value Persistence

The Key-Value store is designed to be a lightweight, atomic data store that is suitable for caching data, tracking metrics, or other situations where a Collection might be overkill.

By default, BonsaiDb persists Key-Value store changes to disk immediately. For light usage, this will not be noticeable, and it ensures that no data will ever be lost.

If you're willing to accept potentially losing recent writes, key_value_persistence can be configured to lazily commit changes to disk. The documentation for KeyValuePersistence contains examples as well as an explanation of how the rules are evaluated.
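
To give a feel for what lazy persistence rules could look like, here is a simplified model. It assumes that each rule pairs a minimum number of unsaved changes with a minimum time since the last commit, and that any satisfied rule triggers a commit. Threshold and should_persist are hypothetical names for this sketch. The KeyValuePersistence documentation remains the authority on how the real rules are evaluated.

```rust
use std::time::Duration;

// Hypothetical rule for this sketch: persist once at least `changes` unsaved
// changes exist AND at least `elapsed` time has passed since the last commit.
struct Threshold {
    changes: usize,
    elapsed: Duration,
}

// Returns true if any rule is satisfied. With no rules, nothing is ever
// persisted by this check.
fn should_persist(
    rules: &[Threshold],
    unsaved_changes: usize,
    since_last_commit: Duration,
) -> bool {
    rules
        .iter()
        .any(|rule| unsaved_changes >= rule.changes && since_last_commit >= rule.elapsed)
}
```

Under this model, a rule set such as "1 change after 120 seconds, or 100 changes immediately" keeps disk writes rare during light usage while bounding how many writes could be lost under heavy usage.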

Key-Value Persistence can also be set using Builder::key_value_persistence.

Server Configuration

The ServerConfiguration structure is used to open a BonsaiDb server. Being built atop the local storage engine, this structure exposes an instance of StorageConfiguration, allowing full customization.

Server Name

The server_name setting is for the primary DNS name of the server. The server's TLS certificate should be valid for the server's name.

When using ACME, this setting controls the primary certificate requested.

Can also be set using a builder-style method.

Client Simultaneous Request Limit

BonsaiDb's networking protocols allow multiple requests to be sent before any responses have been received, sometimes called pipelining. Without a limit, a single malicious client could send a large number of load-inducing requests and cause reliability issues for other clients.

Limiting each connection to a reasonable number of simultaneous requests lets clients take advantage of pipelining without allowing any one client to saturate the server with requests.

This limit is set using the client_simultaneous_request_limit field or builder-style method.
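
A per-connection limit of this kind can be modeled with a simple counter. The sketch below is an assumption about the general technique, not a description of how BonsaiDb's server enforces the limit. RequestLimiter is a hypothetical type, and a real server might apply backpressure by pausing reads instead of rejecting requests.

```rust
// Tracks how many requests a single connection has outstanding.
struct RequestLimiter {
    limit: usize,
    in_flight: usize,
}

impl RequestLimiter {
    fn new(limit: usize) -> Self {
        Self { limit, in_flight: 0 }
    }

    // Admits a request if the connection is below its limit.
    fn try_begin(&mut self) -> bool {
        if self.in_flight < self.limit {
            self.in_flight += 1;
            true
        } else {
            false
        }
    }

    // Called once a response has been sent, freeing one slot.
    fn finish(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }
}
```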

Request Worker Count

The request_workers configuration controls the number of worker tasks that process incoming requests from connected clients. It can also be set via a builder-style method.

Default Permissions and Authenticated Permissions

When first connecting to a server, the client is unauthenticated and is granted the permissions defined by default_permissions. Once a connected client has authenticated, the client will be granted authenticated_permissions in addition to whatever permissions are already granted by the authenticated role.

By default, both default_permissions and authenticated_permissions contain no granted permissions. This means that by default, no connections are allowed to a server, as the connection hasn't been granted BonsaiAction::Server(ServerAction::Connect).

ACME Configuration (LetsEncrypt)

ACME has two configurable options: a contact email and the ACME directory.

ACME Contact Email

The contact email is submitted to the ACME directory as part of requesting a TLS certificate. It is optional for the LetsEncrypt directories.

A valid value for this field begins with mailto:.

The contact email can be set using acme.contact_email or the builder-style method.

ACME Directory

By default, BonsaiDb uses the production LetsEncrypt directory, but any ACME directory can be specified.

The directory can be set using acme.directory or the builder-style method.

Permissions

BonsaiDb uses role-based access control (RBAC). In short, permissions are granted through statements within permission groups. Users are able to log in and receive permissions that were granted via permission groups or roles.

This section has two subsections: Permission Statements and Users, Groups, and Roles.

While the most common use case will be granting permissions to act upon BonsaiDb itself, the permissions system is designed to be generic enough that it can be used as the application's permission system if desired.

By default, no actions are allowed.

Currently, permissions are only applied to connections over a network. In the future, permissions will be able to be applied even on local connections.

Permission Statements

A Statement grants permissions to execute Actions on ResourceNames.

Actions and Resources

ResourceNames are simply namespaced Identifiers. An example could be: "bonsaidb".*."khonsulabs-admin.users".1. Each segment can be a string, an integer, or a wildcard (*).

In BonsaiDb, nearly everything has a resource name. The example above refers to a document with ID 1 in the khonsulabs-admin.users collection in any database. The bonsaidb::core::permissions::bonsai module contains functions to create properly formatted ResourceNames.
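
To make the segment and wildcard idea concrete, here is a simplified matcher. Segment, text, and matches are hypothetical names for this sketch, and the rule it implements (equal length, segment-by-segment comparison) is an assumption. The permissions system in BonsaiDb may resolve resource names differently.

```rust
// One segment of a resource name: a string, an integer, or a wildcard (*).
#[derive(Debug, Clone, PartialEq)]
enum Segment {
    Text(String),
    Integer(u64),
    Any,
}

// Convenience constructor for string segments.
fn text(value: &str) -> Segment {
    Segment::Text(value.to_string())
}

// Returns true when `pattern` and `name` have the same number of segments and
// every pattern segment is either a wildcard or equal to the name's segment.
fn matches(pattern: &[Segment], name: &[Segment]) -> bool {
    pattern.len() == name.len()
        && pattern
            .iter()
            .zip(name)
            .all(|(p, n)| *p == Segment::Any || p == n)
}
```

Under these assumptions, the pattern "bonsaidb".*."khonsulabs-admin.users".1 matches document 1 of that collection in a database named my-database, but not document 2.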

Also within the same module are the built-in Actions. The base enum for all actions used within BonsaiDb is BonsaiAction. Below is an overview of the resource names and actions by category.

Server

The ServerAction enum contains the actions that are related to StorageConnection. For APIs that accept a database name parameter, the resource name will be database_resource_name(database). For all other actions, the resource name is bonsaidb_resource_name().

For actions that operate upon users (e.g., creating a user), the resource name is user_resource_name(username).

At-rest Encryption

Access to encrypted information can be controlled by limiting access to the encryption key used. Currently, BonsaiDb only has support for a shared master key, but in the future additional keys will be able to be created. Because Encrypt and Decrypt are separate actions, access to read and write can be controlled independently.

The resource name for an encryption key is encryption_key_resource_name(key_id).

Database

The DatabaseAction enum contains the actions that are related to a specific database. Actions that act on the database directly will use the resource name database_resource_name(database).

For Collections, there are three resource names used. For actions that operate on the collection directly, the resource name is collection_resource_name(database, collection). For actions that operate on a document, the resource name is document_resource_name(database, collection, id). Finally, for actions that operate on a View, the resource name is view_resource_name(database, view).

For actions that operate upon the key-value entry, the resource name is keyvalue_key_resource_name(database, namespace, key).

For actions that operate on a PubSub topic, the resource name is pubsub_topic_resource_name(database, topic).

Statement Examples

Coming Soon.

Users, Groups, and Roles

The most common flow that a database administrator needs to support is granting a user the ability to take specific actions on specific resources. To accomplish this, a PermissionGroup must be created containing the permission statements, covered in the previous section, that you wish to apply.

PermissionGroups can be assigned directly to users by adding the group ID to their User document.

At first glance, Roles may appear somewhat redundant. One or more PermissionGroups can be assigned to a role, and roles can be assigned to a user. Why would you want to use roles at all?

The general advice the authors of BonsaiDb suggest is to use groups for limited amounts of functionality, keeping each group's list of statements concise and easy to understand. Then, create roles that combine groups of functionality in meaningful ways. One meaningful way could be creating roles based on job titles inside of a company. In theory, a person's job defines what they do within the company.

In practice, permissions are never as clean as one would hope, which is why BonsaiDb allows assigning groups and roles to users directly. Roles should be used as much as possible, but sometimes assigning a group directly is just needed. For example, imagine the CEO telling you, "I know Bob is just a sales guy, but he needs to be able to update this record. I trust him more than the other sales people. Just make it happen." As the database administrator, you can decide whether to introduce a new role or just temporarily assign an extra group to this one user.
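
The relationship between users, roles, and groups can be sketched as a set union. The User struct and effective_groups function below are hypothetical and use plain integer IDs. They are not BonsaiDb's User document or its permission resolution code.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-in for a user record: role and group IDs assigned directly.
struct User {
    roles: Vec<u64>,
    groups: Vec<u64>,
}

// Collects every group that applies to `user`: the groups assigned directly
// plus the groups of each assigned role. Unknown role IDs are ignored.
fn effective_groups(user: &User, role_groups: &HashMap<u64, Vec<u64>>) -> BTreeSet<u64> {
    let mut groups: BTreeSet<u64> = user.groups.iter().copied().collect();
    for role in &user.roles {
        if let Some(ids) = role_groups.get(role) {
            groups.extend(ids.iter().copied());
        }
    }
    groups
}
```

In the Bob scenario, his sales role contributes its groups, and the one extra group assigned directly to him is simply added to the set.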

At-Rest Encryption

BonsaiDb offers at-rest encryption. An overview of how it works is available in the bonsaidb::local::vault module.

Enabling at-rest encryption by default

When opening your BonsaiDb instance, there is a configuration option default_encryption_key. Once this is set, all new data written that supports being encrypted will be encrypted at-rest.

let storage = Storage::open(
    StorageConfiguration::new(&directory)
        .vault_key_storage(vault_key_storage)
        .default_encryption_key(KeyId::Master)
)
.await?;

Enabling at-rest encryption on a per-collection basis

Collection::encryption_key() can be overridden on a per-Collection basis. If a collection requests encryption but the feature is disabled, an error will be generated.

To enable a collection to be encrypted when the feature is enabled, only return a key when ENCRYPTION_ENABLED is true.
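
The pattern can be sketched as follows. KeyId here is a local stand-in enum, ENCRYPTION_ENABLED is assumed to be a compile-time constant (for example, one derived from a Cargo feature), and key_for_write is a hypothetical helper that assumes a collection's key takes precedence over the default key. None of these are the real BonsaiDb definitions.

```rust
// Hypothetical stand-in for this sketch; not BonsaiDb's own `KeyId`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KeyId {
    Master,
}

// Stand-in for a compile-time switch, such as one derived from a Cargo feature.
const ENCRYPTION_ENABLED: bool = true;

// What a collection's `encryption_key()` override might return: a key only
// when encryption support is enabled.
fn collection_encryption_key() -> Option<KeyId> {
    if ENCRYPTION_ENABLED {
        Some(KeyId::Master)
    } else {
        None
    }
}

// Assumed resolution order: the collection's key if present, else the default.
fn key_for_write(collection_key: Option<KeyId>, default_key: Option<KeyId>) -> Option<KeyId> {
    collection_key.or(default_key)
}
```

When both values are None, the data is written unencrypted in this model. When either is set, a key is used, which mirrors the behavior described for default_encryption_key.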