general functions

These functions have no particular category, so they ended up here by default.

Add

Adds a value to the input.

AddCycleRange

Adds a cycle range to the input, producing an increasing sawtooth-like output.

AddHashRange

Adds a pseudo-random value within the specified range to the input.

AlphaNumericString

Create an alpha-numeric string of the specified length, character-by-character.

ByteBufferSizedHashed

Create a ByteBuffer from a long input based on a provided size function. As a 'Sized' function, the first argument is a function which determines the size of the resulting ByteBuffer. As a 'Hashed' function, the input value is hashed again before being used as a value.

CSVFrequencySampler

Takes a CSV with sample data and generates random values based on the relative frequencies of the values in the file. The CSV file must have headers which can be used to find the named columns. For example, take the following imaginary `animals.csv` file:

animal,count,country
puppy,1,usa
puppy,2,colombia
puppy,3,senegal
kitten,2,colombia

`CSVFrequencySampler('animals.csv', animal)` will return `puppy` or `kitten` randomly. `puppy` will be 3x as frequent as `kitten`.

`CSVFrequencySampler('animals.csv', country)` will return `usa`, `colombia`, or `senegal` randomly. `colombia` will be 2x as frequent as `usa` or `senegal`.

Use this function to infer frequencies of categorical values from CSVs.
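The frequency counting described here can be sketched in Python as follows. This is an illustrative sketch only, not the NoSQLBench implementation: `column_frequencies` is a hypothetical helper, and the CSV text is the imaginary `animals.csv` from the description.

```python
import csv
import io
from collections import Counter

# Contents of the imaginary animals.csv from the description.
ANIMALS_CSV = """animal,count,country
puppy,1,usa
puppy,2,colombia
puppy,3,senegal
kitten,2,colombia
"""

def column_frequencies(csv_text, column):
    """Count how often each value occurs in the named column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)

animal_counts = column_frequencies(ANIMALS_CSV, "animal")
country_counts = column_frequencies(ANIMALS_CSV, "country")
```

The counts are taken from how many rows carry each value, which is why `puppy` (three rows) outweighs `kitten` (one row) regardless of the `count` column.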

CSVSampler

This function is a toolkit version of the WeightedStringsFromCSV function. It is more capable and should be the preferred function for alias sampling over any CSV data. This sampler uses a named column in the CSV data as the value. This is also referred to as the labelColumn. The frequency of this label depends on the weight assigned to it in another named CSV column, known as the weightColumn.

Combining duplicate labels

When you have CSV data which is not organized around the specific identifier that you want to sample by, you can use some combining functions to tabulate these prior to sampling. In that case, you can use any of "sum", "avg", "count", "min", or "max" as the reducing function on the value in the weight column. If none are specified, then "sum" is used by default. All modes except "count" and "name" require a valid weight column to be specified.
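A minimal sketch of how duplicate labels might be combined before sampling, assuming each row has already been reduced to a (label, weight) pair. The helper name `combine_weights` and the sample rows are hypothetical, and this is not the NoSQLBench implementation.

```python
from collections import defaultdict

# Reducing functions named in the text; "sum" is the documented default.
REDUCERS = {
    "sum": sum,
    "avg": lambda ws: sum(ws) / len(ws),
    "count": len,
    "min": min,
    "max": max,
}

def combine_weights(rows, mode="sum"):
    """Collapse (label, weight) rows into one weight per label (hypothetical helper)."""
    grouped = defaultdict(list)
    for label, weight in rows:
        grouped[label].append(float(weight))
    return {label: float(REDUCERS[mode](ws)) for label, ws in grouped.items()}

rows = [("puppy", 1), ("puppy", 2), ("puppy", 3), ("kitten", 2)]
summed = combine_weights(rows)            # one combined weight per label
counted = combine_weights(rows, "count")  # ignores the weight values entirely
```

Note that "count" only tallies rows per label, which is consistent with it not needing a valid weight column.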

Map vs Hash mode

As with some of the other statistical functions, you can use this one to pick through the sample values by using the map mode. This is distinct from the default hash mode. When map mode is used, the values will appear monotonically as you scan through the unit interval of all long values. Specifically, 0L represents 0.0d in the unit interval on input, and Long.MAX_VALUE represents 1.0d on the unit interval. This mode is only recommended for advanced scenarios and should otherwise be avoided. You will know if you need this mode.

@param labelColumn The CSV column name containing the value
@param weightColumn The CSV column name containing a double weight
@param data Sampling modes or file names. Any of map, hash, sum, avg, count are taken as configuration modes, and all others are taken as CSV filenames.

CharBufImage

Builds a shared text image in memory and samples from it pseudo-randomly with hashing. The characters provided can be listed like a string (abc123), or can include range specifiers like a hyphen (a-zA-Z0-9). These characters are used to build an image of the specified size in memory that is sampled from according to the size function. The extracted value is sized according to either a provided function, a size range, or otherwise the whole image. The image can be varied between tests if you want by specifying a seed value. If no seed value is specified, then the image length is used also as a seed.

CharBufferExtract

Create a CharBuffer from the first function, and then sample data from that buffer according to the size function. The initFunction can be given as simply a size, in which case ByteBufferSizedHashed is used with Hex String conversion. If the size function yields a size larger than the available buffer size, then it is lowered to that size automatically. If it is lower, then a random offset is used within the buffer image. This function behaves slightly differently than most in that it creates and caches a source byte buffer during initialization.

CircleVectors

Clamp

Clamp the output values to be at least the minimum value and at most the maximum value.

Combinations

Convert a numeric value into a code according to ASCII printable characters. This is useful for creating various encodings using different character ranges, etc. This mapper can map over the sequences of character ranges providing every unique combination and then wrapping around to the beginning again. It can convert between character bases with independent radix in each position. Each position in the final string takes its values from a position-specific character set, described by the shorthand in the examples below. The constructor will throw an error if the number of combinations exceeds that which can be represented in a long value. (This is a very high number).

Concat

This is the core implementation of the Concat style of String binding. It is the newer and recommended version of Template.

Users should use one of these wrappers: ConcatChained, ConcatCycle, ConcatFixed, ConcatHashed, or ConcatStepped.

This implementation is available for specialized use when needed, but the above versions are much more self-explanatory and easy to use.

As with previous implementations, the basic input which is fed to the functions is the sum of the input cycle and the step, where the step is simply the index of the insertion point within the template string. These start at 0, so a template string which contains "{}-{}" will have two steps, 0, and 1. For cycle 35, the first function will take input 35, and the second 36. This can create some local neighborhood similarity in test data, so other forms are provided which can hash the values for an added degree of (effective) randomness and one that chains these so that each set of values from a Concat binding are quite distinct from each other.
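The cycle-plus-step input scheme can be sketched as follows. This is a simplified illustration rather than the NoSQLBench code; `concat_stepped` is a hypothetical helper that assumes exactly one function per `{}` insertion point.

```python
def stepped_inputs(template, cycle):
    """Return the input each insertion point receives: cycle + step index."""
    steps = template.count("{}")
    return [cycle + step for step in range(steps)]

def concat_stepped(template, funcs, cycle):
    """Fill each "{}" with funcs[step](cycle + step).

    Assumes one function per insertion point (an assumption of this sketch).
    """
    parts = template.split("{}")
    out = [parts[0]]
    for step, func in enumerate(funcs):
        out.append(str(func(cycle + step)))
        out.append(parts[step + 1])
    return "".join(out)

inputs = stepped_inputs("{}-{}", 35)
result = concat_stepped("{}-{}", [lambda v: v, lambda v: v], 35)
```

With identity functions at both steps, cycle 35 feeds 35 to the first insertion point and 36 to the second, which is the neighborhood similarity the text warns about.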


Binding functions used to populate each step of the template may have their own bounds of output values like Combinations. These are easy to use internally since they work well with the hashing. However, some other functions may operate over the whole space of long values, and come with no built-in cardinality constraints. It is recommended to use those with built-in constraints when you want to render a discrete population of values.

ConcatChained

This is a variant of Concat which chains the hash values from step to step so that each of the provided functions will yield unrelated values. The first input value to a function is a hash of the cycle input value, the next is a hash of the first input value, and so on.
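The chaining of hash values from step to step might look like the sketch below. The hash here is a stand-in built on SHA-256 purely for illustration (an assumption of this sketch, not the hash used by NoSQLBench), and `chained_inputs` is a hypothetical helper.

```python
import hashlib

MASK_63 = (1 << 63) - 1  # keep results in the non-negative long range

def stand_in_hash(value):
    """Stand-in hash for illustration only (assumption: not the real NB hash)."""
    digest = hashlib.sha256(value.to_bytes(8, "big", signed=True)).digest()
    return int.from_bytes(digest[:8], "big") & MASK_63

def chained_inputs(cycle, steps):
    """Step 0 receives hash(cycle); each later step hashes the previous input."""
    inputs = []
    value = cycle
    for _ in range(steps):
        value = stand_in_hash(value)
        inputs.append(value)
    return inputs

inputs = chained_inputs(35, 3)
```

Because every input is derived from the previous one, neighboring cycles no longer produce neighboring inputs at any step.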

ConcatCycle

This is a variant of Concat which always uses the input cycle value as the input for all the functions provided.

ConcatFixed

This is a variant of Concat which always uses the same value as input for the functions provided.

ConcatHashed

This is a variant of Concat which always hashes the cycle+step value for each function provided.

ConcatStepped

This is a variant of Concat which uses the cycle+step sum for each of the functions provided.

CycleRange

Yields a value within a specified range, which rolls over continuously.

DelimFrequencySampler

Takes a CSV with sample data and generates random values based on the relative frequencies of the values in the file. The CSV file must have headers which can be used to find the named columns. For example, take the following imaginary `animals.csv` file:

animal,count,country
puppy,1,usa
puppy,2,colombia
puppy,3,senegal
kitten,2,colombia

`CSVFrequencySampler('animals.csv', animal)` will return `puppy` or `kitten` randomly. `puppy` will be 3x as frequent as `kitten`.

`CSVFrequencySampler('animals.csv', country)` will return `usa`, `colombia`, or `senegal` randomly. `colombia` will be 2x as frequent as `usa` or `senegal`.

Use this function to infer frequencies of categorical values from CSVs.

DirectoryLines

Read each line in each matching file in a directory structure, providing one line for each time this function is called. The files are sorted at the time the function is initialized, and each line is read in order. This function does not produce the same result per cycle value. A given cycle input may return a different line if the cycles are not applied in strict order. Still, this function is useful for consuming input from a set of files as input to a test or simulation.

DirectoryLinesStable

Read each line in each matching file in a directory structure, providing one line for each time this function is called. The files are sorted at the time the function is initialized, and each line is read in order.

This function accepts long input values, but they are used as ints, modulo the total number of lines known. This is due to historic limitations in the Java file APIs and file size support.

This is a variant of DirectoryLines. This version keeps a map of files and their respective cardinality, computed at initialization time. The content is assumed to be static during the lifetime of this function.

The value returned for a given cycle is stable, so long as the underlying data is stable.


This caches all data at initialization time. If you need to buffer the data in stream mode, use DirectoryLines instead, which is not order-stable.
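A sketch of how a stable cycle-to-line lookup could work, given per-file line counts computed up front. The helper `locate_line` and the file names are hypothetical, and this is not the NoSQLBench implementation.

```python
def locate_line(cycle, line_counts):
    """Map a cycle to (file_name, line_index) using per-file line counts.

    line_counts is a list of (file_name, count) pairs in sorted file order,
    standing in for the map computed at initialization time.
    """
    total = sum(count for _, count in line_counts)
    index = cycle % total  # wrap around the total number of known lines
    for name, count in line_counts:
        if index < count:
            return name, index
        index -= count
    raise AssertionError("unreachable when total > 0")

counts = [("a.txt", 3), ("b.txt", 2)]
first = locate_line(0, counts)
wrapped = locate_line(5, counts)
```

Since the lookup depends only on the cycle and the fixed counts, the same cycle always resolves to the same line as long as the files do not change.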

Discard

This function takes a long input and ignores it. It returns a generic object which is meant to be used as input to other functions which don't need a specific input.

Div

Divide the operand by a fixed value and return the result.

DivideToLongToString

This is equivalent to Div(...), but returns the result after String.valueOf(...). This function is also deprecated, as it is easily replaced by other functions.

ElapsedNanoTime

Provide the elapsed nano time since the process started. CAUTION: This does not produce deterministic test data.

EscapeJSON

Escape all special characters which are required to be escaped when found within JSON content according to the JSON spec:

\b  Backspace (ascii code 08)
\f  Form feed (ascii code 0C)
\n  New line
\r  Carriage return
\t  Tab
\"  Double quote
\\  Backslash character
\/  Forward slash
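The escape list for EscapeJSON can be expressed as a small lookup table. The sketch below covers only the characters listed here and is not the NoSQLBench implementation; `escape_json` is a hypothetical helper.

```python
# Escape table built from the listed characters. The text is processed one
# character at a time so that backslashes are never escaped twice.
JSON_ESCAPES = {
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    '"': '\\"',
    "\\": "\\\\",
    "/": "\\/",
}

def escape_json(text):
    """Replace each listed special character with its escaped form (sketch only)."""
    return "".join(JSON_ESCAPES.get(ch, ch) for ch in text)

escaped = escape_json('say "hi"\n')
```

Other control characters would need `\u` escapes under the JSON spec; this sketch leaves them untouched.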

FieldExtractor

Extracts out a set of fields from a delimited string, returning a string with the same delimiter containing only the specified fields.

FirstLines

FixedValue

Yield a fixed value.

FixedValues

Yield one of the specified values, rotating through them as the input value increases.
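The rotation can be pictured as a modulo lookup, as in this sketch. Indexing by the input modulo the number of values is an assumption of the sketch; `fixed_values` is a hypothetical helper, and the values are arbitrary.

```python
def fixed_values(values):
    """Return a function that rotates through values as the input increases."""
    def apply(cycle):
        # Assumption: rotation is by input modulo the number of values.
        return values[cycle % len(values)]
    return apply

pick = fixed_values([3, 53, 73])
outputs = [pick(cycle) for cycle in range(5)]
```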

FullHash

This uses the Murmur3F (64-bit optimized) version of Murmur3, not as a checksum, but as a simple hash. It doesn't bother pushing the high-64 bits of input, since it only uses the lower 64 bits of output. This version returns the value regardless of the sign bit. It does not return the absolute value, as Hash does.

Hash

This uses the Murmur3F (64-bit optimized) version of Murmur3, not as a checksum, but as a simple hash. It doesn't bother pushing the high-64 bits of input, since it only uses the lower 64 bits of output. It does, however, return the absolute value. This is to make it play nice with users and other libraries.

HashInterval

Return a value within a range, pseudo-randomly, using interval semantics, where the range of values returned does not include the last value. This function behaves exactly like HashRange except for the exclusion of the last value. This allows you to stack intervals using known reference points without duplicating or skipping any given value. You can specify hash intervals as small as a single-element range, like (5,6), or as wide as the relevant data type allows.
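The difference between interval and range semantics can be sketched as follows. The hash is a SHA-256 stand-in and the modulo reduction is a simplification; both are assumptions of this sketch, not the NoSQLBench implementation.

```python
import hashlib

MASK_63 = (1 << 63) - 1  # keep results in the non-negative long range

def stand_in_hash(value):
    """Stand-in hash for illustration only (assumption: not the real NB hash)."""
    digest = hashlib.sha256(value.to_bytes(8, "big", signed=True)).digest()
    return int.from_bytes(digest[:8], "big") & MASK_63

def hash_interval(value, low, high):
    """Pick from [low, high): the last value is excluded."""
    return low + stand_in_hash(value) % (high - low)

def hash_range(value, low, high):
    """Pick from [low, high]: the last value is included."""
    return low + stand_in_hash(value) % (high - low + 1)

lower = {hash_interval(c, 0, 5) for c in range(200)}
upper = {hash_interval(c, 5, 10) for c in range(200)}
```

Stacking (0,5) and (5,10) this way never yields 5 from the lower interval, so the two intervals share no values and leave no gap between them.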

HashRange

Return a double value within the specified range. This function uses an intermediate long to arrive at the sampled value before conversion to double, thus providing a more linear sample at the expense of some precision at extremely large values. The various HashRange functions take an input long, hash it to a random long value, and then use it to interpolate a fractional value between the minimum and maximum values. To select a specific type of HashRange function, simply use the same datatype in the min and max values you wish to have on output. You can specify hash ranges as small as a single-element range, like (5,5), or as wide as the relevant data type allows.

HashRangeScaled

Return a pseudo-random value which can only be as large as the input times a scale factor, with a default scale factor of 1.0d.

HashedByteBufferExtract

Create a ByteBuffer from the first function, and then sample data from that ByteBuffer according to the size function. The initFunction can be given as simply a size, in which case ByteBufferSizedHashed is used. If the size function yields a size larger than the available buffer size, then it is lowered to that size automatically. If it is lower, then a random offset is used within the buffer image. This function behaves slightly differently than most in that it creates and caches a source byte buffer during initialization.

HashedFileExtractToString

Pseudo-randomly extract a section of a text file and return it according to some minimum and maximum extract size. The file is loaded into memory as a shared text image. It is then indexed into as a character buffer to find a pseudo-randomly sized fragment.

@param filename The file name to be loaded
@param sizefunc A function which determines the size of the data to be loaded.

HashedLineToInt

Return a pseudo-randomly selected integer value from a file of numeric values. Each line in the file must contain one parsable integer value.

HashedLineToString

Return a pseudo-randomly selected String value from a single line of the specified file.

HashedLinesToKeyValueString

Generate a string in the format key1:value1;key2:value2;... from the words in the specified file, ranging in size between zero and the specified maximum.

HashedLoremExtractToString

Provide a text extract from the full lorem ipsum text, between the specified minimum and maximum size.

HashedRangedToNonuniformDouble

This provides a random sample of a double in a range, without accounting for the non-uniform distribution of IEEE double representation. This means that values closer to high-precision areas of the IEEE spec will be weighted higher in the output. However, NaN and positive and negative infinity are filtered out via oversampling. Results are still stable for a given input value.

HashedRangedToNonuniformFloat

This provides a random sample of a float in a range, without accounting for the non-uniform distribution of IEEE float representation. This means that values closer to high-precision areas of the IEEE spec will be weighted higher in the output. However, NaN and positive and negative infinity are filtered out via oversampling. Results are still stable for a given input value.

HashedToByteBuffer

Hash a long input value into a byte buffer, at least length bytes long, but aligned on an 8-byte boundary.

Identity

Simply returns the input value. This function intentionally does nothing.

Interpolate

Return a value along an interpolation curve. This allows you to sketch a basic density curve and describe it simply with just a few values. The number of values provided determines the resolution of the internal lookup table that is used for interpolation. The first value is always the 0.0 anchoring point on the unit interval. The last value is always the 1.0 anchoring point on the unit interval. This means that in order to subdivide the density curve in an interesting way, you need to provide a few more values in between them. Providing two values simply provides a uniform sample between a minimum and maximum value.

The input range of this function is, as with many of the other functions in this library, based on the valid range of non-negative long values, between 0L and Long.MAX_VALUE inclusive. This means that if you want to combine interpolation on this curve with the effect of pseudo-random sampling, you need to put a hash function ahead of it in the flow.

Developer Note: This is the canonical implementation of LERPing in NoSQLBench, so it is heavily documented. Any other LERP implementations should borrow directly from this, embedding by default.
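The lookup-table interpolation can be sketched as plain linear interpolation over evenly spaced anchor points. Even spacing of the provided values across the unit interval is an assumption of this sketch, `interpolate` is a hypothetical helper requiring at least two points, and this is not the canonical NoSQLBench code.

```python
LONG_MAX = (1 << 63) - 1

def interpolate(value, points):
    """Linearly interpolate over evenly spaced points for value in [0, LONG_MAX]."""
    unit = value / LONG_MAX                   # position on the unit interval
    scaled = unit * (len(points) - 1)         # position across the segments
    left = min(int(scaled), len(points) - 2)  # clamp so left + 1 stays in range
    fraction = scaled - left
    return points[left] + fraction * (points[left + 1] - points[left])

low_end = interpolate(0, [0.0, 100.0])
high_end = interpolate(LONG_MAX, [0.0, 10.0, 100.0])
```

An input of 0 lands on the first value and an input of Long.MAX_VALUE lands on the last, matching the 0.0 and 1.0 anchoring points. To sample the curve pseudo-randomly, the input would first pass through a hash function, as the text notes.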

Join

This takes any collection and concatenates the String representation with a specified delimiter.

JoinTemplate

Combine the result of the specified functions together with the specified delimiter and optional prefix and suffix.

ListTemplate

Create a List based on two functions, the first to determine the list size, and the second to populate the list with string values. The input fed to the second function is incremented between elements.

LoadElement

Load a value from a map, based on the injected configuration. The map which is used must be named by the mapname. If the injected configuration contains a variable of this name which is also a Map, then this map is referenced and read by the provided variable name.

LongToString

Return the string representation of the provided long. Deprecated: use ToString() instead.

MatchFunc

Match any input with a regular expression, and apply the associated function to it, yielding the value. If no matches occur, then the original value is passed through unchanged. Patterns and functions are passed as even,odd pairs indexed from the 0th position. Instead of a function, a String value may be provided as the associated output value.
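The even/odd pairing of patterns and outputs might be modeled like this. Whole-string matching and first-match-wins ordering are assumptions of this sketch; `match_func` is a hypothetical helper, not the NoSQLBench implementation.

```python
import re

def match_func(*pairs):
    """Build a matcher from pattern/output pairs given at even/odd positions."""
    rules = [(re.compile(pairs[i]), pairs[i + 1]) for i in range(0, len(pairs), 2)]

    def apply(value):
        text = str(value)
        for pattern, output in rules:
            # Assumption: the pattern must match the whole input text.
            if pattern.fullmatch(text):
                # A plain string stands in directly as the output value.
                return output(value) if callable(output) else output
        return value  # no pattern matched: pass the input through unchanged

    return apply

mapper = match_func(r"\d", lambda v: v * 10, r"1\d", "teens")
```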

MatchRegex

Match any input with a regular expression, and apply the associated regex replacement to it, yielding the value. If no matches occur, then the original value is passed through unchanged. Patterns and replacements are passed as even,odd pairs indexed from the 0th position. Back-references to matching groups are supported.

Max

Return the maximum of either the input value or the specified max.

Min

Return the minimum of either the input value or the specified minimum.

Mod

Return the result of modulo division by the specified divisor.

ModuloCSVLineToString

Select a value from a CSV file line by modulo division against the number of lines in the file. The second parameter is the field name, and this must be provided in the CSV header line as written.

ModuloLineToString

Select a value from a text file line by modulo division against the number of lines in the file.

ModuloToInteger

Return an integer value as the result of modulo division with the specified divisor.

ModuloToLong

Return a long value as the result of modulo division with the specified divisor.

Mul

Return the result of multiplying the specified value with the input.

Murmur3DivToLong

Yield a long value which is the result of hashing and modulo division with the specified divisor.

Murmur3DivToString

Yield a String value which is the result of hashing and modulo division with the specified divisor to long and then converting the value to String.

NumberNameToString

Provides the spelled-out name of a number. For example, an input of 7 would yield "seven". An input of 4234 yields the value "four thousand two hundred thirty four". The maximum value is limited at 999,999,999.

PartitionLongs

Split the value range of Java longs into a number of offsets, starting with Long.MIN_VALUE. This method makes it easy to construct a set of offsets for testing, or to limit the values used to a subset. The outputs will range from Long.MIN_VALUE (-2^63) up. This is not an exact emulation of token range splits in Apache Cassandra.

Prefix

Add the specified prefix String to the input value and return the result.

ReplaceAll

Replace all occurrences of the extant string with the replacement string.

ReplaceRegex

Replace all occurrences of the regular expression with the replacement string. Note, this is much less efficient than using the simple ReplaceAll for most cases.

Shuffle

This function provides a low-overhead shuffling effect without loading elements into memory. It uses a bundled dataset of pre-computed Galois LFSR shift register configurations, along with a down-sampling method to provide amortized virtual shuffling with minimal memory usage. Essentially, this guarantees that every value in the specified range will be seen at least once before the cycle repeats. However, since the order of traversal of these values is dependent on the LFSR configuration, some orders will appear much more random than others depending on where you are in the traversal cycle. This function *does* yield values that are deterministic.
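The combination of a Galois LFSR with down-sampling can be illustrated at a very small scale. The sketch below uses a single hard-coded 4-bit register (tap mask 0xC, a maximal-length configuration with 15 states) and walks it sequentially; the real function uses a bundled dataset of register configurations, so treat this only as a picture of the idea. `lfsr4_states` and `shuffled` are hypothetical helpers.

```python
def lfsr4_states(seed=1):
    """Yield one full period of a 4-bit Galois LFSR with tap mask 0xC."""
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xC
        if state == seed:
            return

def shuffled(size):
    """Visit 0..size-1 once each, skipping LFSR states outside the range."""
    if not 1 <= size <= 15:
        raise ValueError("this 4-bit sketch covers sizes 1 to 15 only")
    # States run over 1..15, so state - 1 covers 0..14. Values at or above
    # size are discarded, which is the down-sampling step.
    return [state - 1 for state in lfsr4_states() if state - 1 < size]

order = shuffled(10)
```

No list of elements is stored to drive the traversal: the register state alone determines the next value, which is why memory use stays minimal and the output stays deterministic.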

SignalPID

This function provides the current NB process identifier. Primarily used when NB is used as a signal agent.

SignedHash

This uses the Murmur3F (64-bit optimized) version of Murmur3, not as a checksum, but as a simple hash. It doesn't bother pushing the high-64 bits of input, since it only uses the lower 64 bits of output. Unlike the other hash functions, this one may return positive as well as negative values.

StaticStringMapper

Return a static String value.

Suffix

Add the specified suffix String to the input value and return the result.

SumFunctions

Compute the sum of a set of functions.

Template

Creates a template function which will yield a string which fits the template provided, with all occurrences of {} substituted pair-wise with the result of the provided functions. The number of {} entries in the template must strictly match the number of functions or an error will be thrown. If you need to include single quotes or other special characters, you may use a backslash "\" in your template. The objects passed must be functions of any of the following types:

The result of applying the input value to any of these functions is converted to a String and then stitched together according to the template provided.

@param iterOp A pre-generation value mapping function
@param template A string template containing {} anchors
@param funcs A varargs length of LongFunctions of any output type

ThreadNumToInteger

Matches a digit sequence in the current thread name and caches it in a thread local. This allows you to use any intentionally indexed thread factories to provide an analogue for concurrency. Note that once the thread number is cached, it will not be refreshed. This means you can't change the thread name and get an updated value.

ThreadNumToLong

Matches a digit sequence in the current thread name and caches it in a thread local. This allows you to use any intentionally indexed thread factories to provide an analogue for concurrency. Note that once the thread number is cached, it will not be refreshed. This means you can't change the thread name and get an updated value.

ToBase64

Takes a ByteBuffer and turns it into a base64 string.

ToHashedUUID

This function provides a stable hashing of the input value to a version 4 (Random) UUID.

Token

Trim

Trim the input value and return the result.

WeightedLongs

Provides a long value from a list of weighted values. The total likelihood of any value to be produced is proportional to its relative weight in the total weight of all elements. This function automatically hashes the input, so the result is already pseudo-random.

WeightedStrings

Allows for weighted elements to be used, such as `a:0.25;b:0.25;c:0.5` or `a:1;b:1.0;c:2.0`. The unit weights are normalized to the cumulative sum internally, so it is not necessary for them to add up to any particular value.
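Normalizing the weights and selecting by cumulative weight can be sketched as below. The sketch maps the input long straight onto the unit interval without hashing, which is an assumption made to keep the example short; `parse_weights` and `pick` are hypothetical helpers, not the NoSQLBench implementation.

```python
from bisect import bisect_right
from itertools import accumulate

LONG_MAX = (1 << 63) - 1

def parse_weights(spec):
    """Parse "a:0.25;b:0.25;c:0.5" into labels and normalized weights."""
    pairs = [item.split(":") for item in spec.split(";") if item]
    labels = [label for label, _ in pairs]
    raw = [float(weight) for _, weight in pairs]
    total = sum(raw)
    return labels, [weight / total for weight in raw]

def pick(spec, value):
    """Select a label for a long in [0, LONG_MAX] via cumulative weights."""
    labels, weights = parse_weights(spec)
    cumulative = list(accumulate(weights))
    unit = value / LONG_MAX
    index = min(bisect_right(cumulative, unit), len(labels) - 1)
    return labels[index]

labels, weights = parse_weights("a:1;b:1.0;c:2.0")
```

Both example specifications normalize to the same weights, which is why the raw values do not need to add up to any particular total.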

WeightedStringsFromCSV

Provides sampling of a given field in a CSV file according to discrete probabilities. The CSV file must have headers which can be used to find the named columns for value and weight. The value column contains the string result to be returned by the function. The weight column contains the floating-point weight or mass associated with the value on the same line. All the weights are normalized automatically.

If there are multiple file names containing the same format, then they will all be read in the same way.

If the first word in the filenames list is 'map', then the values will not be pseudo-randomly selected. Instead, they will be mapped over in some other unsorted and stable order as input values vary from 0L to Long.MAX_VALUE.

Generally, you want to leave out the 'map' directive to get "random sampling" of these values.

This function works the same as the three-parameter form of WeightedStrings, which is deprecated in favor of this one. Use this one instead.
