CQL-Starter

Author: Jeff Banks (DataStax)

Introduction

Welcome to the NoSQLBench Quick Byte, the first session in a “Getting Started” series for NoSQLBench. This session introduces a new Cassandra Query Language (CQL) starter workload now available in version 5 of NoSQLBench.

If you haven't heard of NoSQLBench, checkout our introduction material
If you already have a foundation with NoSQLBench and would like to understand what's included in the most recent version, checkout the release notes.

This session illustrates use of CQL, using NoSQLBench v5, along with a Docker deployment of Apache Cassandra 4.1 using its latest image.

Let’s get rolling …

Setup

This session was tested with:

Ubuntu (v20.4)
Docker (v20.10.18)
NoSQLBench (v5.17.2)
Apache Cassandra (v4.1)

Install Docker

Ensure Docker is installed on your operating system. You can download it from here

Get NB5

Obtain official NB5 release, if you don't already have it, from latest nb5 release, and then chmod +x nb5.

See get nosqlbench for other download options.

You should be able to see your version installed using:

./nb5 --version

Run Cassandra

Run the latest Cassandra 4.* docker.

docker run --name cass4 -p 9042:9042 -d cassandra

If you have issues, more details can be found at Apache Cassandra on docker hub.

Verify Cassandra is started from logs:

docker container logs cass4

Running the scenario

Now, we are ready to run the cql-starter NoSQLBench scenario.

Locate NB5

Navigate via your local command line to where the nb5 binary was previously downloaded.

Verify

Ensure that issuing the following command identifies the workload used for this session.

./nb5 --list-workloads | grep cql-starter

Example output:

/activities/baselines/cql-starter.yaml

Optional step

An alternative is to copy the workload configuration listed below to your own local file in a folder of your choosing. You can name it whatever you like, as you will specify the absolute file path directly when issuing the scenario command.

CQL workload template

This YAML file is designed as a basic foundation for continuing to learn NoSQLBench capabilities as well as a starting point for customizing for your own testing needs.

You will notice that the number of cycles are minimal to support local testing to ensure that your configuration is constructed properly. When customizing these for real-world tests, the values can be set to millions or more! That is where the full power of NoSQLBench shines to generate critical metrics for analysis to make a system more robust.


description: >2
 A cql-starter workload.
 * Cassandra: 3.x, 4.x.
 * DataStax Enterprise: 6.8.x.
 * DataStax Astra.

scenarios:
 default:
   schema: run driver=cql tags==block:schema threads==1 cycles==UNDEF
   rampup: run driver=cql tags==block:rampup cycles===TEMPLATE(rampup-cycles,1) threads=auto
   main: run driver=cql tags==block:"main.*" cycles===TEMPLATE(main-cycles,10) threads=auto
   # rampdown: run driver=cql tags==block:rampdown threads==1 cycles==UNDEF
 astra:
   schema: run driver=cql tags==block:schema-astra threads==1 cycles==UNDEF
   rampup: run driver=cql tags==block:rampup cycles===TEMPLATE(rampup-cycles,10) threads=auto
   main: run driver=cql tags==block:"main.*" cycles===TEMPLATE(main-cycles,10) threads=auto

params:
 a_param: "value"

bindings:
 machine_id: ElapsedNanoTime(); ToHashedUUID() -> java.util.UUID
 message: Discard(); TextOfFile('data/cql-starter-message.txt');
 rampup_message: ToString();
 time: ElapsedNanoTime(); Mul(1000); ToJavaInstant();
 ts: ElapsedNanoTime(); Mul(1000);

blocks: 
 schema:
   params:
     prepared: false
   ops:
     create-keyspace: |
       create keyspace if not exists <<keyspace:starter>>
       WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '<<rf:1>>'}
       AND durable_writes = true;
     create-table: |
       create table if not exists <<keyspace:starter>>.<<table:cqlstarter>> (
       machine_id UUID, 
       message text,   
       time timestamp,         
       PRIMARY KEY ((machine_id), time)
       ) WITH CLUSTERING ORDER BY (time DESC);

 schema-astra:
   params:
     prepared: false
   ops:
     create-table-astra: |
       create table if not exists <<keyspace:starter>>.<<table:cqlstarter>> (
       machine_id UUID,
       message text,
       time timestamp,
       PRIMARY KEY ((machine_id), time)
       ) WITH CLUSTERING ORDER BY (time DESC);

 rampup:
   params:
     cl: <<write_cl:LOCAL_QUORUM>>
     idempotent: true
   ops:
     insert-rampup: |
       insert into  <<keyspace:starter>>.<<table:cqlstarter>> (machine_id, message, time)
       values ({machine_id}, {rampup_message}, {time}) using timestamp {ts};

 rampdown:
   ops:
     truncate-table: |
       truncate table <<keyspace:starter>>.<<table:cqlstarter>>;

 main-read:
   params:
     ratio: <<read_ratio:1>>
     cl: <<read_cl:LOCAL_QUORUM>>
   ops:
     select-read: |
       select * from <<keyspace:starter>>.<<table:cqlstarter>>
       where machine_id={machine_id};
 main-write:
   params:
     ratio: <<write_ratio:9>>
     cl: <<write_cl:LOCAL_QUORUM>>
     idempotent: true
   ops:
     insert-main: |
       insert into <<keyspace:starter>>.<<table:cqlstarter>>
       (machine_id, message, time) values ({machine_id}, {message}, {time}) using timestamp {ts};

Before running NoSQLBench scenario, let’s take a look at the layout of the file. Most of this will be the same layout structure used in all NB5 workload files so this helps to reveal a large amount of the basics. This is called a workload template.

Starting from the top of the workload template, the primary sections include:

Description - A way to describe what the workload does.
Scenarios - A set of named scenarios for detailing the intent of the workload and defines that for various blocks (e.g. schema, rampup, main, etc.).
Params - Optional parameters of interest to reference for applying values.
Bindings - Named recipes for generated data. These are referenced in block operations.
Blocks - Where the labeled operations reside (e.g. schema, rampup, and main).
- Schema - A block section where the schema is actually defined and created.
- Rampup - A block section for data setup that becomes the backdrop for testing; it’s the density of data outside the metrics collected in the main block.
- Main - A block section that is the target of metrics collection activities.

This may look overwhelming at first glance, but the magic of what can be done for load testing target resources becomes more apparent as settings are tweaked for various test cases.

Basic Operations

The workload operations in the cql-starter are quite basic, and this is on purpose. The intent is to focus on a simple set of read and write operations to understand how to work with NoSQLBench and Cassandra using basic, direct CQL.

Table and Keyspace

For the default scenario workload, a simple table named ‘cqlstarter’ will be created with a keyspace named ‘starter’. There will be three fields for our table:

machine_id
message
time

The machine_id is a unique identifier type, the message field is a text type, and the time is a timestamp type.

Since the example is designed to be run locally, the Cassandra keyspace replication is defined using a SimpleStrategy with a replication factor of one.

WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '<<rf:1>>'}

Default scenario

For this session, the ‘default’ scenario is being used.

    
    scenarios:
      default:
        schema: run driver=cql tags==block:schema …
        rampup: run driver=cql tags==block:rampup …
        main: run driver=cql tags==block:"main.*" …

One may notice there is an ‘astra’ scenario included in the file with its own set of activities defined (e.g. -astra). References to astra are simply there to show how additional scenarios can be defined in a single workload file.

    
    astra:
      schema: run driver=cql tags==block:schema-astra threads==1 cycles==UNDEF

This illustrates how flexible and customizable the workload file can become. The words are customizable and can be tailored for understanding the test case for any business or technical domain.

Bindings

Values for our three fields during insert, will come from the bindings section of the file.
Basic examples are included in the cql-starter, but this illustrates how bindings supply values to be used by operations. Again, these are basic, just to illustrate how binding functions can be utilized.


    
    bindings:
      machine_id: ElapsedNanoTime(); ToHashedUUID() -> java.util.UUID
      message: Discard(); TextOfFile('data/cql-starter-message.txt');
      rampup_message: ToString();
      time: ElapsedNanoTime(); Mul(1000); ToJavaInstant();
      ts: ElapsedNanoTime(); Mul(1000);

Notice how we can reference text from a file to be used for our message value. Nothing fancy, but illustrates how tests can leverage external information from files for decoupling input from the workload file itself. Think of this for things like secret token references, etc. that need to be referenced.

Discard(); TextOfFile('data/cql-starter-message.txt')

Note: The Discard() function is used to indicate a no-op as the initial message value. This may change in the future, but for now it is a necessity due to the nature of bindings defaulting to Long values. This is why the rampup_message was included for illustration as it uses a ToString(); function assigning a string value. By default, the binding's value is 0L.

Hands on

Let’s run the cql-starter.

Running

Using the nb5 binary, issue the following command

./nb5 activities/baselines/cql-starter.yaml default hosts=localhost localdc=datacenter1

This command identifies that the default scenario workload is used with the key-value args passed along for use by the cqld4 adapter.

Examine the results

After the workload has been run, let’s take a look at the results from Cassandra itself using cqlsh.

docker container exec -it cass4 sh
cqlsh
select * from starter.cqlstarter;

You should see the single rampup entry along main operation entries in the Cassandra table.

Customize

Now, let’s customize the cql-starter to make it a bit more your own.

Save the .yaml file to your local environment.

One easy way, is to utilize the nb5 --copy command.

./nb5 --copy cql-starter

This provides a fresh workload file for you for cql-starter.

Edit the file and uncomment under the default scenario the following entry:

# rampdown: run driver=cql tags==block:rampdown threads==1 cycles==UNDEF

When you want to customize the cql-starter, you can simply target the file outside the NB5 distribution using:

./nb5 adapter-cqld4/<rel-path-to-customized-file>.yaml default hosts=localhost localdc=datacenter1

Also, if you would like to see more details in the output, add (-v, -vv, or -vvv) to the command.

./nb5 adapter-cqld4/<rel-path-to-customized-file>.yaml default hosts=localhost localdc=datacenter1 -v

When the workload is run after uncommenting the rampdown, selecting the content again using cqlsh, returns a table that has been truncated.

Next Steps

Checkout the NoSQLBench getting started section and details for its capabilities for your next testing initiative. This includes a number of built-in workloads that you can start from for more advanced scenarios.

Want to contribute?

It’s worth mentioning, NoSQLBench is open source and we are looking for contributions to expand its features! Head on over to the contributions page to find out more.

We will continue to have more Quick Bytes for NoSQLBench in the near future.

Stay tuned, and thank you for reading!

Back to top