BonsaiDb User's Guide
BonsaiDb is an ACID-compliant document database written in Rust. It aims to be a general-purpose database that simplifies development and deployment by providing reliable building blocks: lightweight enough for hobby projects running with minimal resources, yet scalable for when your hobby project becomes a deployed product.
This user's guide provides a guided walkthrough to help users understand how BonsaiDb works. It is meant to supplement the documentation. If you learn best by exploring examples, many are available in /examples
in the repository. If, however, you learn best by taking a guided tour of how something works, this guide is specifically for you.
If you have any feedback on this guide, please file an issue, and we will try to address any issues or shortcomings.
Thank you for exploring BonsaiDb.
Concepts
This is a list of common concepts that will be used throughout this book as well as the documentation.
Document
A Document is a single piece of stored data. Each document is stored within a Collection
, and has a unique ID within that Collection. There are two document types: OwnedDocument
and BorrowedDocument
. The View::map()
function takes a BorrowedDocument
, but nearly every other API utilizes OwnedDocument
.
When a document is updated, BonsaiDb will check that the revision information passed matches the currently stored information. If not, a conflict error will be returned. This simple check ensures that if two writers try to update the document simultaneously, one will succeed and the other will receive an error.
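The revision check described above can be sketched in plain Rust. This is a minimal, std-only illustration of optimistic concurrency; the type and field names are hypothetical, not BonsaiDb's actual API:

```rust
// Hypothetical sketch: an update succeeds only if the caller's revision
// matches the currently stored revision.
#[derive(Debug, PartialEq)]
enum UpdateError {
    Conflict,
}

struct StoredDocument {
    revision: u32,
    contents: Vec<u8>,
}

impl StoredDocument {
    fn update(&mut self, expected_revision: u32, new_contents: Vec<u8>) -> Result<u32, UpdateError> {
        if self.revision != expected_revision {
            // Another writer committed first; the caller must re-read and retry.
            return Err(UpdateError::Conflict);
        }
        self.contents = new_contents;
        self.revision += 1;
        Ok(self.revision)
    }
}

fn main() {
    let mut doc = StoredDocument { revision: 0, contents: b"v1".to_vec() };
    // Two writers both read revision 0. The first update succeeds...
    assert_eq!(doc.update(0, b"writer a".to_vec()), Ok(1));
    // ...and the second, still holding revision 0, receives a conflict.
    assert_eq!(doc.update(0, b"writer b".to_vec()), Err(UpdateError::Conflict));
    println!("conflict detected");
}
```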
Serializable Collections
BonsaiDb provides the SerializedCollection
trait, which allows automatic serialization and deserialization in many situations. When using the Document::contents()
function, the document is serialized and deserialized by the format returned from SerializedCollection::format()
.
The CollectionDocument<T>
type provides convenience methods for interacting with serializable documents.
Default serialization of Serde-compatible types
BonsaiDb provides a convenience trait for Serde-compatible data types: DefaultSerialization
. This empty trait can be implemented on any collection to have BonsaiDb provide its preferred serialization format, Pot.
Raw Collections
If you would prefer to manually manage the data stored inside of a Document, you can directly manage the contents
field. BonsaiDb will not interact with the contents
of a Document. Only code that you write will parse or update the stored data.
Collection
A Collection is a group of Documents and associated functionality. Collections are stored on-disk using ACID-compliant, transactional storage, ensuring your data is protected in the event of a sudden power failure or other unfortunate event.
The goal of a Collection is to encapsulate the logic for a set of data in such a way that Collections could be designed to be shared and reused in multiple Schemas or applications.
Each Collection must have a unique
CollectionName
.
To help prevent naming collisions, an authority
can be specified which
provides a level of namespacing.
A Collection can contain one or more Views.
View
A View is a map/reduce-powered method of quickly accessing information inside of a Collection. A View can only belong to one Collection.
Views define two important associated types: a Key type and a Value type. You can think of these as the equivalent entries in a map/dictionary-like collection that supports more than one entry for each Key. The Key is used to filter the View's results, and the Value is used by your application or the reduce()
function.
Views are a powerful, yet abstract concept. Let's look at a concrete example: blog posts with categories.
#[derive(Serialize, Deserialize, Debug, Collection)]
#[collection(name = "blog-post", views = [BlogPostsByCategory])]
pub struct BlogPost {
pub title: String,
pub body: String,
pub category: Option<String>,
}
Let's insert this data for these examples:
BlogPost {
title: String::from("New version of BonsaiDb released"),
body: String::from("..."),
category: Some(String::from("Rust")),
}
.push_into(&db)
.await?;
BlogPost {
title: String::from("New Rust version released"),
body: String::from("..."),
category: Some(String::from("Rust")),
}
.push_into(&db)
.await?;
BlogPost {
title: String::from("Check out this great cinnamon roll recipe"),
body: String::from("..."),
category: Some(String::from("Cooking")),
}
.push_into(&db)
.await?;
All examples on this page are available in their full form in the repository at book/book-examples/tests.
While category
should be an enum, let's first explore using String
and upgrade to an enum at the end (it requires one additional step). Let's implement a View that will allow users to find blog posts by their category as well as count the number of posts in each category.
#[derive(Debug, Clone, View)]
#[view(collection = BlogPost, key = Option<String>, value = u32, name = "by-category")]
pub struct BlogPostsByCategory;
impl ViewSchema for BlogPostsByCategory {
type View = Self;
fn map(&self, document: &BorrowedDocument<'_>) -> ViewMapResult<Self::View> {
let post = document.contents::<BlogPost>()?;
Ok(document.emit_key_and_value(post.category, 1))
}
fn reduce(
&self,
mappings: &[ViewMappedValue<Self::View>],
_rereduce: bool,
) -> ReduceResult<Self::View> {
Ok(mappings.iter().map(|mapping| mapping.value).sum())
}
}
The two traits being implemented are View and
ViewSchema. These traits are designed to allow keeping the
View
implementation in a shared code library that is used by both client-side
and server-side code, while keeping the ViewSchema
implementation in the
server executable only.
Views for SerializedCollection
For users who are using SerializedCollection
, CollectionViewSchema
can be implemented instead of ViewSchema
. The only difference between the two is that the map()
function takes a CollectionDocument
instead of a BorrowedDocument
.
Value Serialization
For views to function, the Value type must be able to be serialized to and deserialized from storage. To accomplish this, all views must implement the SerializedView
trait. For Serde-compatible data structures, DefaultSerializedView
is an empty trait that can be implemented instead to provide the default serialization that BonsaiDb recommends.
Map
The first line of the map
function calls Document::contents()
to deserialize the stored BlogPost
. The second line returns an emitted Key and Value -- in our case a clone of the post's category and the value 1_u32
. With the map function, we're able to use query()
and query_with_docs()
:
let rust_posts = db
.view::<BlogPostsByCategory>()
.with_key(Some(String::from("Rust")))
.query_with_docs()
.await?;
for mapping in &rust_posts {
let post = mapping.document.contents::<BlogPost>()?;
println!(
"Retrieved post #{} \"{}\"",
mapping.document.header.id, post.title
);
}
The above snippet queries the Database for all documents in the BlogPost
Collection that emitted a Key of Some("Rust")
.
If you're using a SerializedCollection
, you can use query_with_collection_docs()
to have the deserialization done automatically for you:
let rust_posts = db
.view::<BlogPostsByCategory>()
.with_key(Some(String::from("Rust")))
.query_with_collection_docs()
.await?;
for mapping in &rust_posts {
println!(
"Retrieved post #{} \"{}\"",
mapping.document.header.id, mapping.document.contents.title
);
}
Reduce
The second function to learn about is the reduce()
function. It is responsible for turning an array of Key/Value pairs into a single Value. In some cases, BonsaiDb might need to call reduce()
with values that have already been reduced one time. If this is the case, rereduce
is set to true.
In this example, we're using the built-in Iterator::sum()
function to turn our Value of 1_u32
into a single u32
representing the total number of documents.
let rust_post_count = db
.view::<BlogPostsByCategory>()
.with_key(Some(String::from("Rust")))
.reduce()
.await?;
assert_eq!(rust_post_count, 2);
Changing an existing view
If you have data stored in a view, but want to update the view to store data
differently, implement ViewSchema::version()
and return
a unique number. When BonsaiDb checks the view's integrity, it will notice that
there is a version mismatch and automatically re-index the view.
There is no mechanism to access the data until this operation is complete.
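The version check can be sketched as follows. This is a std-only illustration of the behavior described above; the names are hypothetical, not BonsaiDb's internals:

```rust
// Hypothetical sketch: a stored view whose version differs from the code's
// ViewSchema::version() is discarded and rebuilt from every document.
struct StoredView {
    version: u64,
    entries: usize,
}

fn check_integrity(stored: &mut StoredView, code_version: u64, documents: usize) {
    if stored.version != code_version {
        // Version mismatch: throw away the old index and re-index everything.
        stored.entries = documents;
        stored.version = code_version;
    }
}

fn main() {
    let mut stored = StoredView { version: 1, entries: 3 };
    // Same version: the existing index is kept.
    check_integrity(&mut stored, 1, 4);
    assert_eq!(stored.entries, 3);
    // Bumped version: the view is re-indexed against all 4 documents.
    check_integrity(&mut stored, 2, 4);
    assert_eq!((stored.version, stored.entries), (2, 4));
    println!("view re-indexed after version bump");
}
```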
Understanding Re-reduce
Let's examine this data set:
Document ID | BlogPost Category |
---|---|
1 | Some("Rust") |
2 | Some("Rust") |
3 | Some("Cooking") |
4 | None |
When updating views, each view entry is reduced and the value is cached. These are the view entries:
View Entry ID | Reduced Value |
---|---|
Some("Rust") | 2 |
Some("Cooking") | 1 |
None | 1 |
When a reduce query is issued for a single key, the value can be returned without further processing. But, if the reduce query matches multiple keys, the View's reduce()
function will be called with the already reduced values with rereduce
set to true
. For example, retrieving the total count of blog posts:
let total_post_count = db.view::<BlogPostsByCategory>().reduce().await?;
assert_eq!(total_post_count, 3);
Once BonsaiDb has gathered each of the key's reduced values, it needs to further reduce that list into a single value. To accomplish this, the View's reduce()
function will be invoked with rereduce
set to true
, and with mappings containing:
Key | Value |
---|---|
Some("Rust") | 2 |
Some("Cooking") | 1 |
None | 1 |
This produces a final value of 4.
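Because summing counts is associative, the same reduce logic works for both the initial pass and the rereduce pass. A std-only sketch of the arithmetic above (not BonsaiDb's actual code):

```rust
// Sketch: one reduce function handles both initial mappings and rereduce,
// because summing is associative.
fn reduce(values: &[u32], _rereduce: bool) -> u32 {
    values.iter().copied().sum()
}

fn main() {
    // First pass: each key's emitted values (one `1` per document).
    let rust = reduce(&[1, 1], false); // Some("Rust")
    let cooking = reduce(&[1], false); // Some("Cooking")
    let none = reduce(&[1], false); // None
    // Rereduce pass: combine the cached per-key results into one total.
    let total = reduce(&[rust, cooking, none], true);
    assert_eq!(total, 4);
    println!("total posts: {}", total);
}
```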
How does BonsaiDb make this efficient?
When saving Documents, BonsaiDb does not immediately update related views. It instead notes what documents have been updated since the last time the View was indexed.
When a View is accessed, the queries include an AccessPolicy
. If you aren't overriding it, UpdateBefore
is used. This means that when the query is evaluated, BonsaiDb will first check if the index is out of date due to any updated data. If it is, it will update the View before evaluating the query.
If you want results quickly and are willing to accept data that might not be up to date, the access policies UpdateAfter
and NoUpdate
can be used depending on your needs.
If multiple simultaneous queries are being evaluated for the same View and the View is outdated, BonsaiDb ensures that only a single view indexer will execute while the queries wait for it to complete.
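The single-indexer guarantee can be illustrated with a mutex-guarded rebuild. This is a std-only sketch of the pattern, not BonsaiDb's actual implementation:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Sketch: many queries notice the index is stale, but the lock ensures only
// one of them performs the rebuild; the rest wait and see the fresh index.
struct Index {
    up_to_date: bool,
    rebuilds: usize,
}

fn query(index: &Mutex<Index>) -> usize {
    let mut index = index.lock().unwrap();
    if !index.up_to_date {
        // Only the first thread to take the lock performs the rebuild.
        index.rebuilds += 1;
        index.up_to_date = true;
    }
    index.rebuilds
}

fn main() {
    let index = Arc::new(Mutex::new(Index { up_to_date: false, rebuilds: 0 }));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let index = Arc::clone(&index);
            thread::spawn(move || query(&index))
        })
        .collect();
    for handle in handles {
        // Every query observes exactly one rebuild, no matter how many raced.
        assert_eq!(handle.join().unwrap(), 1);
    }
    println!("single rebuild across concurrent queries");
}
```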
Using arbitrary types as a View Key
In our previous example, we used String
for the Key type. The reason is important: Keys must be sortable by our underlying storage engine, which means special care must be taken. Most serialization types do not guarantee binary sort order. Instead, BonsaiDb exposes the Key
trait. On that documentation page, you can see that BonsaiDb implements Key
for many built-in types.
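To see why binary sort order matters, compare big-endian and little-endian encodings of integers. This std-only demonstration shows that only the big-endian form sorts the same way the numbers do:

```rust
// Why keys are stored big-endian: lexicographic comparison of the encoded
// bytes must match the numeric ordering of the original values.
fn main() {
    let pairs = [(1_u32, 255_u32), (255, 256), (256, 65_536)];
    for (a, b) in pairs {
        // Big-endian bytes sort exactly like the numbers themselves...
        assert_eq!(a.to_be_bytes() < b.to_be_bytes(), a < b);
    }
    // ...while little-endian bytes do not: 255 encodes as [255, 0, 0, 0],
    // which sorts *after* 256's [0, 1, 0, 0].
    assert!(255_u32.to_le_bytes() > 256_u32.to_le_bytes());
    println!("big-endian preserves sort order");
}
```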
Using an enum as a View Key
The easiest way to expose an enum is to derive num_traits::FromPrimitive
and num_traits::ToPrimitive
using num-derive, and add an impl EnumKey
line:
#[derive(
Serialize, Deserialize, Debug, num_derive::FromPrimitive, num_derive::ToPrimitive, Clone,
)]
pub enum Category {
Rust,
Cooking,
}
impl EnumKey for Category {}
The View code remains unchanged, although the associated Key type can now be set to Option<Category>
. The queries can now use the enum instead of a String
:
let rust_post_count = db
.view::<BlogPostsByCategory>()
.with_key(Some(Category::Rust))
.reduce()
.await?;
BonsaiDb will convert the enum to a u64 and use that value as the Key. A u64 was chosen to ensure fairly wide compatibility even with some extreme usages of bitmasks. If you wish to customize this behavior, you can implement Key
directly.
Implementing the Key
trait
The Key
trait declares two functions: as_big_endian_bytes()
and from_big_endian_bytes()
. The intention is to convert the type to bytes using a network byte order for numerical types, and for non-numerical types, the bytes need to be stored in binary-sortable order.
Here is how BonsaiDb implements Key for EnumKey
:
impl<'a, T> Key<'a> for T
where
T: EnumKey,
{
type Error = std::io::Error;
const LENGTH: Option<usize> = None;
fn as_big_endian_bytes(&'a self) -> Result<Cow<'a, [u8]>, Self::Error> {
let integer = self
.to_u64()
.map(Unsigned::from)
.ok_or_else(|| std::io::Error::new(ErrorKind::InvalidData, IncorrectByteLength))?;
Ok(Cow::Owned(integer.to_variable_vec()?))
}
fn from_big_endian_bytes(bytes: &'a [u8]) -> Result<Self, Self::Error> {
let primitive = u64::decode_variable(bytes)?;
Self::from_u64(primitive)
.ok_or_else(|| std::io::Error::new(ErrorKind::InvalidData, UnknownEnumVariant))
}
}
By implementing Key
you can take full control of converting your view keys.
Schema
A Schema is a group of one or more Collections. A Schema can be instantiated as a Database. The Schema describes how a set of data behaves, and a Database is a set of data on-disk.
Database
A Database is a set of stored data. Each Database is described by a Schema. Unlike the other concepts, this concept corresponds to multiple types:
- For bonsaidb-local: Database
- For bonsaidb-server: ServerDatabase
- For bonsaidb-client: RemoteDatabase
All of these types implement the Connection
trait.
Storage
The StorageConnection trait allows interacting with a BonsaiDb multi-database storage instance.
There are three implementations of the StorageConnection
trait:
- Storage: A local, file-based server implementation with no networking capabilities.
- Server: A networked server implementation, written using Storage. This server supports QUIC- and WebSocket-based protocols. The QUIC protocol is preferred, but it uses UDP, which many load balancers don't support. If you're exposing BonsaiDb behind a load balancer, WebSockets may be the only option depending on your host's capabilities.
- Client: A network client implementation that connects to a Server.
PubSub
The Publish/Subscribe pattern enables developers to design systems that produce and receive messages. It is implemented for BonsaiDb through the PubSub
and Subscriber
traits.
A common example of what PubSub enables is implementing a simple chat system. Each chat participant can subscribe to messages on the chat
topic, and when any participant publishes a chat
message, all subscribers will receive a copy of that message.
A working example of PubSub is available at examples/basic-local/examples/pubsub.rs
.
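The pattern itself can be sketched with standard-library channels. This is purely illustrative; BonsaiDb's PubSub and Subscriber traits work differently under the hood:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Minimal topic-based publish/subscribe sketch: every subscriber to a topic
// receives its own copy of each message published to that topic.
#[derive(Default)]
struct PubSub {
    topics: HashMap<String, Vec<Sender<String>>>,
}

impl PubSub {
    fn subscribe(&mut self, topic: &str) -> Receiver<String> {
        let (sender, receiver) = channel();
        self.topics.entry(topic.to_string()).or_default().push(sender);
        receiver
    }

    fn publish(&self, topic: &str, message: &str) {
        if let Some(subscribers) = self.topics.get(topic) {
            for subscriber in subscribers {
                // Deliver a copy to each subscriber; ignore disconnected ones.
                let _ = subscriber.send(message.to_string());
            }
        }
    }
}

fn main() {
    let mut pubsub = PubSub::default();
    let alice = pubsub.subscribe("chat");
    let bob = pubsub.subscribe("chat");
    pubsub.publish("chat", "hello, everyone!");
    assert_eq!(alice.recv().unwrap(), "hello, everyone!");
    assert_eq!(bob.recv().unwrap(), "hello, everyone!");
    println!("both subscribers received the message");
}
```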
Use cases of BonsaiDb
Single database model (No networking)
This use case is most similar to utilizing SQLite for your database. In this mode, BonsaiDb directly interacts with files on your disk to provide your database. Unlike other file-based databases, however, it's easy to migrate to any of these scenarios from this starting position:
graph LR code{{Rust Code}} local[(bonsaidb-local::Database)] code <--> local
A working example of how to use a local database can be found at examples/basic-local/examples/basic-local.rs
.
Multi-database model (No networking)
This model is most similar to using multiple SQLite databases. In this mode, you interact with a Storage
that you spawn within your code.
graph LR code{{Rust Code}} local[(bonsaidb-local::Storage)] code <--> local
If you look at the source behind Database::open_local
, you'll see that the single-database model is using Storage
under the hood.
Server model (QUIC or WebSockets)
This model is most similar to using other document databases, like CouchDB or MongoDB. In this mode, you interact with a Client
that connects via either QUIC or WebSockets with a server. From the server code's perspective, this model is the same as the multi-database model, except that the server is listening for and responding to network traffic.
graph LR client-code{{Rust Client Code}} server-code{{Rust Server Code}} client[[bonsaidb-client]] server[[bonsaidb-server]] local[(bonsaidb-local)] client-code <--> client client <-. network .-> server server <--> local server-code <--> server
A working example of this model can be found at examples/basic-server/examples/basic-server.rs
. When writing client/server applications that utilize BonsaiDb, you can have the BonsaiDb server running within your server application. This means that your server code can still interact with BonsaiDb without using networking. Regardless of whether you run any other server code, your BonsaiDb server will be accessible through a Client
over the network.
API Platform model (QUIC or WebSockets)
If you're finding yourself developing an API for your application, and all of the consumers of this API are already connected to BonsaiDb, you may want to take advantage of the custom api functionality of the server:
graph LR client-code{{Rust Client Code}} server-code{{Rust Server Code}} client[[bonsaidb-client]] server[[bonsaidb-server]] backend[[Backend]] local[(bonsaidb-local)] client-code <--> client client <-. network .-> server server <--> local server-code <--> server server-code <--> backend backend <--> server
The BonsaiDb CustomServer
type accepts one generic parameter that implements the Backend
trait. This trait is used to customize the server in many ways, but one of the associated types is a CustomApi
implementor.
See this page for an overview of how to set up a custom api server.
Coming Later: Cluster model
When you're at the stage of scaling beyond a single server, you will be able to upgrade your server to a cluster using the hypothetical bonsaidb-cluster
crate. The clustering model is still being designed, but the goal is something similar to:
graph LR client-code{{Rust Client Code}} server-code{{Rust Server Code}} client[[bonsaidb-client]] server1[[server 1]] server2[[server 2]] server3[[server 3]] cluster[[bonsaidb-cluster]] client-code <--> client client <-. network .-> cluster server-code <--> cluster cluster <--> server1 cluster <--> server2 cluster <--> server3 server1 <--> server2 server2 <--> server3 server1 <--> server3
In this model, the local storage element is hidden; each server has its own storage. This model is very similar from the viewpoint of your server and client code -- the primary difference is that the server-side connection is established using the cluster crate. From the client's perspective, the cluster behaves as a single entity -- sending a request to any server node will result in the same result within the cluster.
All features of BonsaiDb will be designed to work in cluster mode seamlessly. PubSub
will ensure that subscribers will receive messages regardless of which server they're connected to.
Custom Api Server
The CustomApi
trait defines three associated types, Request, Response, and Error. A backend "dispatches" Request
s and expects a Result<Response, Error>
in return.
All code on this page comes from this example:
examples/basic-server/examples/custom-api.rs
.
This example defines a Request and a Response type, but uses BonsaiDb's Infallible
type for the error:
#[derive(Serialize, Deserialize, Actionable, Debug)]
#[actionable(actionable = bonsaidb::core::actionable)]
pub enum Request {
#[actionable(protection = "none")]
Ping,
#[actionable(protection = "simple")]
DoSomethingSimple { some_argument: u32 },
#[actionable(protection = "custom")]
DoSomethingCustom { some_argument: u32 },
}
#[derive(Serialize, Deserialize, Debug, Clone)]
pub enum Response {
Pong,
DidSomething,
}
impl CustomApi for ExampleApi {
type Request = Request;
type Response = Response;
type Error = Infallible;
}
To implement the server, we must first implement a custom Backend
that ties the server to the CustomApi
. We also must define a CustomApiDispatcher
, which gives an opportunity for the dispatcher to gain access to the ConnectedClient
and/or CustomServer instances if they are needed to handle requests.
Finally, either Dispatcher
must be implemented manually or actionable
can be used to derive an implementation that uses individual traits to handle each request. The example uses actionable:
impl Backend for ExampleBackend {
type CustomApi = ExampleApi;
type CustomApiDispatcher = ExampleDispatcher;
type ClientData = ();
}
/// Dispatches Requests and returns Responses.
#[derive(Debug, Dispatcher)]
#[dispatcher(input = Request, actionable = bonsaidb::core::actionable)]
pub struct ExampleDispatcher {
// While this example doesn't use the server reference, this is how a custom
// API can gain access to the running server to perform database operations
// within the handlers. The `ConnectedClient` can also be cloned and stored
// in the dispatcher if handlers need to interact with clients outside of a
// simple Request/Response exchange.
_server: CustomServer<ExampleBackend>,
}
impl CustomApiDispatcher<ExampleBackend> for ExampleDispatcher {
fn new(
server: &CustomServer<ExampleBackend>,
_client: &ConnectedClient<ExampleBackend>,
) -> Self {
Self {
_server: server.clone(),
}
}
}
#[async_trait]
impl RequestDispatcher for ExampleDispatcher {
type Output = Response;
type Error = BackendError<Infallible>;
}
/// The Request::Ping variant has `#[actionable(protection = "none")]`, which
/// causes `PingHandler` to be generated with a single method and no implicit
/// permission handling.
#[async_trait]
impl PingHandler for ExampleDispatcher {
async fn handle(
&self,
_permissions: &Permissions,
) -> Result<Response, BackendError<Infallible>> {
Ok(Response::Pong)
}
}
Finally, the client can issue the API call and receive the response, without needing any extra steps to serialize. This works regardless of whether the client is connected via QUIC or WebSockets.
async fn ping_the_server(
client: &Client<ExampleApi>,
client_name: &str,
) -> Result<(), bonsaidb::core::Error> {
match client.send_api_request(Request::Ping).await {
Ok(Response::Pong) => {
println!("Received Pong from server on {}", client_name);
}
other => println!(
"Unexpected response from API call on {}: {:?}",
client_name, other
),
}
Ok(())
}
Permissions
One of the strengths of using BonsaiDb's custom api functionality is the ability to tap into the permissions handling that BonsaiDb uses. The Ping request was defined with protection = "none"
which skips all permission validation. However, DoSomethingSimple
uses the "simple" protection model, and DoSomethingCustom
uses the "custom" protection model. The comments in the example below should help explain the rationale:
/// The permissible actions that can be granted for this example api.
#[derive(Debug, Action)]
#[action(actionable = bonsaidb::core::actionable)]
pub enum ExampleActions {
DoSomethingSimple,
DoSomethingCustom,
}
/// With `protection = "simple"`, `actionable` will generate a trait that allows
/// you to return a `ResourceName` and an `Action`, and the handler will
/// automatically confirm that the connected user has been granted the ability
/// to perform `Action` against `ResourceName`.
#[async_trait]
impl DoSomethingSimpleHandler for ExampleDispatcher {
type Action = ExampleActions;
async fn resource_name<'a>(
&'a self,
_some_argument: &'a u32,
) -> Result<ResourceName<'a>, BackendError<Infallible>> {
Ok(ResourceName::named("example"))
}
fn action() -> Self::Action {
ExampleActions::DoSomethingSimple
}
async fn handle_protected(
&self,
_permissions: &Permissions,
_some_argument: u32,
) -> Result<Response, BackendError<Infallible>> {
// The permissions have already been checked.
Ok(Response::DidSomething)
}
}
/// With `protection = "custom"`, `actionable` will generate a trait with two
/// functions: one to verify the permissions are valid, and one to do the
/// protected action. This is useful if there are multiple actions or resource
/// names that need to be checked, or if permissions change based on the
/// arguments passed.
#[async_trait]
impl DoSomethingCustomHandler for ExampleDispatcher {
async fn verify_permissions(
&self,
permissions: &Permissions,
some_argument: &u32,
) -> Result<(), BackendError<Infallible>> {
if *some_argument == 42 {
Ok(())
} else {
permissions.check(
ResourceName::named("example"),
&ExampleActions::DoSomethingCustom,
)?;
Ok(())
}
}
async fn handle_protected(
&self,
_permissions: &Permissions,
_some_argument: u32,
) -> Result<Response, BackendError<Infallible>> {
// `verify_permissions` has already been executed, so no permissions
// logic needs to live here.
Ok(Response::DidSomething)
}
}
This example uses authenticated_permissions
to grant access to ExampleActions::DoSomethingSimple
and ExampleActions::DoSomethingCustom
to all users who have logged in:
let server = CustomServer::<ExampleBackend>::open(
ServerConfiguration::new("custom-api.bonsaidb")
.default_permissions(Permissions::from(
Statement::for_any()
.allowing(&BonsaiAction::Server(ServerAction::Connect))
.allowing(&BonsaiAction::Server(ServerAction::Authenticate(
AuthenticationMethod::PasswordHash,
))),
))
.authenticated_permissions(Permissions::from(
Statement::for_any()
.allowing(&ExampleActions::DoSomethingSimple)
.allowing(&ExampleActions::DoSomethingCustom),
)),
)
.await?;
For more information on managing permissions, see Administration/Permissions
Overview
BonsaiDb aims to offer the majority of its functionality in local operation. The networked server adds some functionality on top of the local version, but its main function is to add the ability to use networking to talk to the database.
This model makes it easy to transition a local database to a networked database server. Start with whatever model fits your needs today, and when your needs change, BonsaiDb will adapt.
When to use the Local Integration
- You're going to access databases from one process at a time. BonsaiDb is designed for concurrency and can scale with the capabilities of the hardware. However, the underlying storage layer that BonsaiDb is built upon, sled, does not support multiple processes writing its data simultaneously. If you need to access the database from multiple processes, the server integration is what you should use. While it doesn't offer IPC communication today, a pull request adding that functionality (along with the corresponding unit tests) would be accepted.
- You have no public API/PubSub/access needs or have implemented those with another stack.
When to use the Server Integration
- You need to access databases from more than one process or machine.
- You are OK with downtime due to loss of service when the single server is offline. If you need to have a highly-available database, you should use the Cluster Integration (Coming Soon).
- Your database load can be met with a single machine. If you have enough load that you need to share the processing power of multiple servers, you should use the Cluster Integration (Coming Soon)
Coming Soon: When to use the Cluster Integration
- You need to access databases from more than one machine.
- You need a highly-available setup.
- You need/want to split load between multiple machines.
Integrating BonsaiDb Locally
BonsaiDb supports multiple databases and multiple schemas. However, for many applications, you only need a single database.
If you only want a single database, the setup is straightforward: (from examples/basic-local/examples/basic-local.rs
)
let db = Database::<Message>::open(
StorageConfiguration::new("basic.bonsaidb")
).await?;
Under the hood, BonsaiDb is creating a multi-database Storage
with a local Database
named default
for you. If you need to switch to a multi-database model, you can open the storage and access the default
database: (adapted from examples/basic-local/examples/basic-local.rs
)
let storage = Storage::open(
Configuration::new("basic.bonsaidb")
.with_schema::<Message>()?
).await?;
let db = storage.database::<Message>("default").await?;
You can register multiple schemas so that databases can be purpose-built.
Common Traits
To help your code transition between different modes of accessing BonsaiDb, you can use these common traits to make your methods accept any style of BonsaiDb access.
- Database implements Connection, KeyValue, and PubSub.
- Storage implements StorageConnection.
For example, examples/basic-local/examples/basic-local.rs
uses this helper method to insert a record:
async fn insert_a_message<C: Connection>(
connection: &C,
value: &str,
) -> Result<(), bonsaidb::core::Error> {
Message {
contents: String::from(value),
timestamp: SystemTime::now(),
}
.push_into(connection)
.await?;
Ok(())
}
Integrating the networked BonsaiDb Server
To access BonsaiDb over the network, you're going to be writing two pieces of code: the server code and the client code.
Your BonsaiDb Server
The first step is to create a Server
, which uses local Storage
under the hood. This means that if you're already using BonsaiDb in local mode, you can swap your usage of Storage
with Server
in your server code without running your database through any tools. Here's the setup code from basic-server/examples/basic-server.rs
let server = Server::open(
ServerConfiguration::new("server-data.bonsaidb")
.default_permissions(DefaultPermissions::AllowAll)
.with_schema::<Shape>()?,
)
.await?;
if server.certificate_chain().await.is_err() {
server.install_self_signed_certificate(true).await?;
}
let certificate = server
.certificate_chain()
.await?
.into_end_entity_certificate();
server.create_database::<Shape>("my-database", true).await?;
Once you have a server initialized, calling listen_on
will begin listening for connections on the port specified. This uses the preferred native protocol which uses UDP. If you find that UDP is not working for your setup or want to put BonsaiDb behind a load balancer that doesn't support UDP, you can enable WebSocket support and call listen_for_websockets_on
.
You can call both, but since these functions don't return until the server is shut down, you should spawn them instead:
let task_server = server.clone();
tokio::spawn(async move {
task_server.listen_on(5645).await
});
let task_server = server.clone();
tokio::spawn(async move {
task_server.listen_for_websockets_on("localhost:8080", false).await
});
If you're not running any of your own code on the server, and you're only using one listening method, you can just await the listen method of your choice in your server's main. This code example configures BonsaiDb on UDP port 5645, but this is not an officially registered port.
From the Client
The Client
can support both the native protocol and WebSockets. It determines which protocol to use based on the scheme in the URL:
- bonsaidb://host:port will connect using the native BonsaiDb protocol.
- ws://host:port will connect using WebSockets.
Here's how to connect, from examples/basic-server/examples/basic-server.rs
:
Client::new(
Url::parse("bonsaidb://localhost:5645")?,
Some(certificate),
)
.await?
This is using a pinned certificate to connect. Other methods are supported, but better certificate management is coming soon.
Common Traits
- Server implements StorageConnection.
- Server::database() returns a local Database, which implements Connection, KeyValue, and PubSub. Local access in the server executable doesn't go over the network.
- Client implements StorageConnection.
- Client::database() returns a RemoteDatabase, which implements Connection, KeyValue, and PubSub.
Integrating into a BonsaiDb Cluster
Coming Soon.
The goal of this feature is to make clustering simple. We hope to provide an experience that allows someone operating a networked server to easily upgrade to either of two types of clusters:
One-leader mode
When setting up a cluster initially, you will begin with one-leader mode. In this mode, you can add as many nodes to the cluster as you wish, but only one node will be processing all of the data updates. All nodes can handle requests, but requests that can't be served locally will be forwarded to the leader. This allows for the use of read-replicas to alleviate load in some read-heavy situations.
Another benefit of this mode is that it supports a two-node configuration. If you're scaling your app and need a reliable backup for quicker disaster recovery, you can operate a read replica and manually fail over when the situation arises.
If you decide to allow automatic failover in this mode, there is a chance for data loss, as the leader does not wait for read-replicas to synchronize data. Any transactions that committed and were not synchronized before the outage occurred would not be on the other servers. Thus, this mode is not intended for high-availability configurations, although some users may elect to use it in such a configuration knowing these limitations.
Quorum mode
Once you have a cluster with at least 3 nodes, you can switch the cluster into quorum mode. For any given `N` nodes, all requests must reach an agreed response from `N / 2 + 1` members. For example, in a cluster of 3 nodes, there must be 2 successful responses before a client can receive a response to its request.
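The required agreement count is a simple integer majority. A minimal sketch (the `quorum_size` helper is illustrative, not part of the BonsaiDb API):

```rust
/// Returns the number of members that must agree before a request
/// succeeds: an integer majority of `n` nodes (`n / 2 + 1`).
fn quorum_size(n: usize) -> usize {
    n / 2 + 1
}

fn main() {
    // A 3-node cluster requires 2 agreeing members.
    assert_eq!(quorum_size(3), 2);
    // A 5-node cluster requires 3 agreeing members.
    assert_eq!(quorum_size(5), 3);
    // An even-sized cluster gains no extra fault tolerance:
    // 4 nodes still require 3 agreeing members.
    assert_eq!(quorum_size(4), 3);
    println!("ok");
}
```

Note that growing from 4 to 5 nodes does not raise the quorum, which is why odd cluster sizes are the common recommendation.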
In quorum mode, your data is divided into shards, and those shards are replicated throughout the cluster onto at least 3 nodes (configurable). Initially, with just 3 nodes available, the only benefits are a highly-available cluster with no data loss when a single node goes down.
As you add more nodes to your cluster, however, you can re-balance your databases to move shards. The author of BonsaiDb did not enjoy this process in CouchDB when he had to do it, and aims to make these tools easy and effortless to use. Ideally, there would be a low-maintenance mode that would allow the cluster to re-shard itself automatically during allowed maintenance periods, ensuring data is distributed more evenly amongst the cluster.
Additional long-term dreams of quorum mode include the ability to customize node selection criteria on a per-database basis. The practical use of node selection is to ensure that at least 3 unique nodes are picked for each shard. However, allowing custom logic to evaluate which nodes should be selected for any database would allow ultimate flexibility. For example, if you have a globally deployed application, and you have some data that is geographically specific, you could locate each region's database on nodes within those locations' data centers.
When?
Clustering is an important part of the design of Cosmic Verge. As such, it is a priority for us to work on. But, the overall game is a very large project, so we hesitate to make any promises on timelines.
Connection
The Connection
trait contains functions for interacting with collections in a database. This trait is implemented by the Database
types in each crate:
- For bonsaidb-local: `Database`
- For bonsaidb-server: `ServerDatabase`
- For bonsaidb-client: `RemoteDatabase`
Using this trait, you can write generic code that works regardless of whether BonsaiDb is operating locally with no network connection or across the globe.
This is an async trait, which unfortunately yields messy documentation due to the lifetimes.
StorageConnection
The StorageConnection
trait contains functions for interacting with BonsaiDb's multi-database storage. This trait is implemented by these types:
- For bonsaidb-local: `Storage`
- For bonsaidb-server: `CustomServer<Backend>` / `Server`
- For bonsaidb-client: `Client`
Using this trait, you can write generic code that works regardless of whether BonsaiDb is operating locally with no network connection or across the globe.
This is an async trait, which unfortunately yields messy documentation.
PubSub Trait
The PubSub
trait contains functions for using PubSub in BonsaiDb. This trait is implemented by the Database
types in each crate:
- For bonsaidb-local: `Database`
- For bonsaidb-server: `ServerDatabase`
- For bonsaidb-client: `RemoteDatabase`
Using this trait, you can write generic code that works regardless of whether BonsaiDb is operating locally with no network connection or across the globe.
This is an async trait, which unfortunately yields messy documentation.
Key-Value Trait
The KeyValue
trait contains functions for interacting with the atomic key-value store. The key-value store provides high-performance atomic operations without ACID compliance. Once the data is persisted to disk, it holds the same guarantees as all of BonsaiDb, but this feature is designed for high throughput and does not wait to persist to disk before reporting success to the client. This trait is implemented by the Database
types in each crate:
- For bonsaidb-local: `Database`
- For bonsaidb-server: `ServerDatabase`
- For bonsaidb-client: `RemoteDatabase`
Using this trait, you can write generic code that works regardless of whether BonsaiDb is operating locally with no network connection or across the globe.
This is an async trait, which unfortunately yields messy documentation.
Configuration
BonsaiDb attempts to have reasonable default configuration options, but it's important to browse the available options to ensure you aren't missing settings that could help your particular needs.
Storage Configuration
The StorageConfiguration
structure is used to open a local-only database. The ServerConfiguration
struct contains an instance of StorageConfiguration
, and all configuration options are available on it.
Vault Key Storage
By default, BonsaiDb sets vault_key_storage
to a file stored within the database folder. This is incredibly insecure and should not be used outside of testing.
For secure encryption, it is important to store the vault keys in a location separate from the database. If the keys are on the same hardware as the database, anyone with access to the disk will be able to decrypt the stored data.
If you have more than one server, you can still use LocalVaultKeyStorage
in conjunction with a mounted network share for reasonable security practices -- assuming the network share itself is properly secured.
If you have an S3-compatible storage service available, you can use bonsaidb::keystorage::s3
to store the vault keys with that service.
Note that when storing your keys remotely, your BonsaiDb database cannot be opened unless the keys can be read.
Vault Key Storage can also be set using Builder::vault_key_storage
.
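As a sketch of the network-share approach, vault keys might be stored on a separately mounted share while the database lives on local disk. The paths here are hypothetical, and the exact `LocalVaultKeyStorage` constructor should be verified against the documentation:

```rust
// Sketch: vault keys on a mounted network share, kept apart from the
// database directory. Paths are hypothetical examples.
let key_storage = LocalVaultKeyStorage::new("/mnt/secure-share/vault-keys").await?;
let storage = Storage::open(
    StorageConfiguration::new("/var/lib/bonsaidb")
        .vault_key_storage(key_storage),
)
.await?;
```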
Default Encryption Key
By setting default_encryption_key
to a key, all data will be encrypted when written to the disk.
If default_encryption_key
is None
, encryption will still be performed for collections that return a key from Collection::encryption_key()
.
Can also be set using Builder::default_encryption_key
.
Tasks: Worker Count
The tasks.worker_count
setting controls the number of worker tasks that are spawned to process background tasks.
Can also be set using Builder::tasks_worker_count
.
Views: Check Integrity on Open
When views.check_integrity_on_open
is true, all views in all databases will be checked on startup for integrity. If this value is false, the integrity of the view will not be checked until it is accessed for the first time.
By default, BonsaiDb delays checking a view's integrity until it is accessed for the first time. It may, however, be preferable to accept a longer startup time in exchange for consistent response times once the server is running after a restart.
Can also be set using Builder::check_view_integrity_on_open
.
Key-Value Persistence
The Key-Value store is designed to be a lightweight, atomic data store that is suitable for caching data, tracking metrics, or other situations where a Collection might be overkill.
By default, BonsaiDb persists Key-Value store changes to disk immediately. For light usage, this will not be noticeable, and it ensures that no data will ever be lost.
If you're willing to accept potentially losing recent writes, key_value_persistence
can be configured to lazily commit changes to disk. The documentation for KeyValuePersistence
contains examples as well as an explanation of how the rules are evaluated.
Key-Value Persistence can also be set using Builder::key_value_persistence
.
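To illustrate the idea behind lazy persistence rules, here is a simplified model (not the actual `KeyValuePersistence` implementation): commit to disk once enough changes accumulate or enough time has elapsed since the first unpersisted change.

```rust
use std::time::Duration;

// Simplified model of a lazy-persistence rule. The real rules and
// their evaluation order are documented on `KeyValuePersistence`.
struct Threshold {
    after_changes: usize,
    after_elapsed: Duration,
}

impl Threshold {
    /// Returns true when the buffered changes should be committed.
    fn should_persist(&self, pending_changes: usize, elapsed: Duration) -> bool {
        pending_changes >= self.after_changes || elapsed >= self.after_elapsed
    }
}

fn main() {
    let rule = Threshold {
        after_changes: 100,
        after_elapsed: Duration::from_secs(60),
    };
    // Few changes, little time elapsed: keep buffering in memory.
    assert!(!rule.should_persist(10, Duration::from_secs(5)));
    // Many changes force an early commit.
    assert!(rule.should_persist(150, Duration::from_secs(5)));
    // Enough elapsed time forces a commit even with few changes.
    assert!(rule.should_persist(10, Duration::from_secs(61)));
    println!("ok");
}
```

The trade-off is explicit: any changes still buffered when the process dies are lost, which is why immediate persistence is the default.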
Server Configuration
The ServerConfiguration
structure is used to open a BonsaiDb server. Being built atop the local storage engine, this structure exposes an instance of StorageConfiguration
, allowing full customization.
Server Name
The server_name
setting is for the primary DNS name of the server. The server's TLS certificate should be valid for the server's name.
When using ACME, this setting controls the primary certificate requested.
Can also be set using a builder-style method.
Client Simultaneous Request Limit
BonsaiDb's networking protocols allow multiple requests to be sent before any responses have been received, sometimes called pipelining. Without a limit, a single malicious client could send a large number of load-inducing requests and cause service reliability issues for other clients.
By limiting each connection to a reasonable number of in-flight requests, clients can take advantage of pipelining without any one client saturating the server with requests.
This limit is set using the client_simultaneous_request_limit field or builder-style method.
Request Worker Count
The request_workers
configuration controls the number of worker tasks that process incoming requests from connected clients. It can also be set via a builder-style method.
Default Permissions and Authenticated Permissions
When first connecting to a server, the client is unauthenticated and is granted the permissions defined by default_permissions
. Once a connected client has authenticated, the client will be granted authenticated_permissions
in addition to whatever permissions were already granted.
By default, both `default_permissions` and `authenticated_permissions` contain no granted permissions. This means that, by default, no connections are allowed to a server, as the connection hasn't been granted `BonsaiAction::Server(ServerAction::Connect)`.
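For example, to allow all clients to connect before authenticating, the Connect action could be granted in the default permissions. This is a sketch; the exact `Statement` and `Permissions` builder calls should be verified against the `bonsaidb::core::permissions` documentation:

```rust
// Sketch: grant only the Connect action to unauthenticated clients.
// Builder method names are assumptions based on the examples.
let server = Server::open(
    ServerConfiguration::new("server-data.bonsaidb")
        .default_permissions(Permissions::from(
            Statement::for_any().allowing(&BonsaiAction::Server(ServerAction::Connect)),
        )),
)
.await?;
```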
ACME Configuration (LetsEncrypt)
ACME has two configurable options, a contact email and the ACME directory.
ACME Contact Email
The contact email is submitted to the ACME directory as part of requesting a TLS certificate. It is optional for the LetsEncrypt directories.
A valid value for this field begins with mailto:
.
The contact email can be set using acme.contact_email
or the builder-style method.
ACME Directory
By default, BonsaiDb uses the production LetsEncrypt directory, but any ACME directory can be specified.
The directory can be set using acme.directory
or the builder-style method.
Permissions
BonsaiDb uses role-based access control (RBAC). In short, permissions are granted through statements within permission groups. Users are able to log in and receive permissions that were granted via permission groups or roles.
This section has two subsections:
- Permission Statements: An overview of the resource names and actions used within BonsaiDb.
- Users, Groups, and Roles: A more thorough explanation of BonsaiDb's access control.
While the most common use case will be granting permissions to act upon BonsaiDb itself, the permissions system is designed to be generic enough that it can be used as the application's permission system if desired.
By default, no actions are allowed.
Currently, permissions are only applied to connections over a network. In the future, permissions will be able to be applied even on local connections.
Permission Statements
A `Statement` grants permissions to execute `Action`s on `ResourceName`s.
Actions and Resources
ResourceName
s are simply namespaced Identifier
s. An example could be: "bonsaidb".*."khonsulabs-admin.users".1
. Each segment can be a string, an integer, or a wildcard (*
).
In BonsaiDb, nearly everything has a resource name. The example above refers to a document with ID 1
in the khonsulabs-admin.users
collection in any database. The bonsaidb::core::permissions::bonsai
module contains functions to create properly formatted ResourceName
s.
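A simplified model of how wildcard segments match (illustrative only; BonsaiDb's real matching lives in its permissions engine):

```rust
// Simplified model of resource-name segments: a string, an integer,
// or a wildcard (`*`) that matches any single segment.
#[derive(PartialEq)]
enum Segment {
    Text(String),
    Integer(u64),
    Any, // `*`
}

/// Returns true when `name` matches `pattern` segment-by-segment,
/// treating `Any` as a wildcard for one segment.
fn matches(pattern: &[Segment], name: &[Segment]) -> bool {
    pattern.len() == name.len()
        && pattern
            .iter()
            .zip(name)
            .all(|(p, n)| *p == Segment::Any || p == n)
}

fn main() {
    use Segment::*;
    // Pattern: "bonsaidb".*."khonsulabs-admin.users".1
    let pattern = [
        Text("bonsaidb".into()),
        Any,
        Text("khonsulabs-admin.users".into()),
        Integer(1),
    ];
    // Document 1 of the collection in *any* database matches.
    let name = [
        Text("bonsaidb".into()),
        Text("my-database".into()),
        Text("khonsulabs-admin.users".into()),
        Integer(1),
    ];
    assert!(matches(&pattern, &name));
    // A different document ID does not match.
    let other = [
        Text("bonsaidb".into()),
        Text("my-database".into()),
        Text("khonsulabs-admin.users".into()),
        Integer(2),
    ];
    assert!(!matches(&pattern, &other));
    println!("ok");
}
```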
Also within the same module are the built-in `Action`s. The base enum for all actions used within BonsaiDb is `BonsaiAction`. Below is an overview of the resource names and actions by category.
Server
The ServerAction
enum contains the actions that are related to StorageConnection
. For APIs that accept a database name parameter, the resource name will be database_resource_name(database)
. For all other actions, the resource name is bonsaidb_resource_name()
.
For actions that operate upon users (e.g., creating a user), the resource name is user_resource_name(username).
At-rest Encryption
Access to encrypted information can be controlled by limiting access to the encryption key used. Currently, BonsaiDb only has support for a shared master key, but in the future additional keys will be able to be created. Because Encrypt
and Decrypt
are separate actions, access to read and write can be controlled independently.
The resource name for an encryption key is encryption_key_resource_name(key_id)
.
Database
The DatabaseAction
enum contains the actions that are related to a specific database. Actions that act on the database directly will use the resource name database_resource_name(database)
.
For Collection
s, there are three resource names used. For actions that operate on the collection directly, the resource name is collection_resource_name(database, collection)
. For actions that operate on a document, the resource name is document_resource_name(database, collection, id)
. Finally, for actions that operate on a View
, the resource name is view_resource_name(database, view)
.
For actions that operate upon the key-value entry, the resource name is keyvalue_key_resource_name(database, namespace, key)
.
For actions that operate on a PubSub
topic, the resource name is pubsub_topic_resource_name(database, topic)
.
Statement Examples
Coming Soon.
Users, Groups, and Roles
The most common flow that a database administrator needs to support is granting a user the ability to take specific actions on specific resources. To accomplish this, a PermissionGroup
must be created containing the permission statements, covered in the previous section, that you wish to apply.
PermissionGroup
s can be assigned directly to users by adding the group ID to their User
document.
At first glance, Role
s may appear somewhat redundant. One or more PermissionGroup
s can be assigned to a role, and roles can be assigned to a user. Why would you want to use roles at all?
The general advice the authors of BonsaiDb suggest is to use groups for limited amounts of functionality, keeping each group's list of statements concise and easy to understand. Then, create roles that combine groups of functionality in meaningful ways. One meaningful way could be creating roles based on job titles inside of a company. In theory, a person's job defines what they do within the company.
In practice, permissions are never as clean as one would hope, which is why BonsaiDb allows assigning groups and roles to users directly. Roles should be used as much as possible, but sometimes assigning a group directly is just needed. For example, imagine the CEO telling you, "I know Bob is just a sales guy, but he needs to be able to update this record. I trust him more than the other sales people. Just make it happen." As the database administrator, you can decide whether to introduce a new role or just temporarily assign an extra group to this one user.
At-Rest Encryption
BonsaiDb offers at-rest encryption. An overview of how it works is available in the bonsaidb::local::vault
module.
Enabling at-rest encryption by default
When opening your BonsaiDb instance, there is a configuration option default_encryption_key
. Once this is set, all new data written that supports being encrypted will be encrypted at-rest.
```rust
let storage = Storage::open(
    StorageConfiguration::new(&directory)
        .vault_key_storage(vault_key_storage)
        .default_encryption_key(KeyId::Master),
)
.await?;
```
Enabling at-rest encryption on a per-collection basis
`Collection::encryption_key()` can be overridden on a per-collection basis. If a collection requests encryption but the feature is disabled, an error will be generated.
To enable a collection to be encrypted only when the encryption feature is enabled, return a key only when `ENCRYPTION_ENABLED` is true.
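A sketch of such an override, assuming a hypothetical `MySensitiveCollection` type (other required trait items elided):

```rust
// Sketch: request encryption for this collection only when the
// encryption feature is compiled in. `MySensitiveCollection` is a
// hypothetical collection type for illustration.
impl Collection for MySensitiveCollection {
    // ...other required trait items elided...

    fn encryption_key() -> Option<KeyId> {
        if ENCRYPTION_ENABLED {
            Some(KeyId::Master)
        } else {
            None
        }
    }
}
```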