Categorisation Rules

Categorisation rules tell SpeyBooks how to automatically assign bank transactions to accounts in your chart of accounts. When a bank statement is imported, the engine evaluates every rule against each transaction and suggests the best matching account.

This guide covers creating rules, understanding the scoring engine, working with global rules, and managing priority order. For the full endpoint reference, see Categorisation Rules API.


How it works

Each rule consists of three parts: a field to inspect (the transaction description, contact name, amount, reference, or metadata), an operator for matching (contains, equals, starts_with, regex, etc.), and a target account where matching transactions should be categorised.

When a bank transaction arrives, the engine tests it against all active rules. If multiple rules match, the lattice scoring engine ranks them by provenance, specificity, and priority to select the best one. The result includes an evidence trail explaining why a particular rule won.

Rules are evaluated in this order of precedence:

  1. Local rules - your organisation's rules, highest priority first
  2. Verified rules - rules that have been confirmed by repeated use
  3. Global rules - consensus rules aggregated across all SpeyBooks tenants

A local rule always takes precedence over a global rule for the same keyword, regardless of priority values.


Creating a rule

A rule requires a name, the field to match against, an operator, a value to match, and the targetAccountId for the destination account. Optionally set a priority (0-999, default 0) to control evaluation order within the same provenance level.

If you create a rule with the same field, operator, and value as an existing rule, the API performs an upsert: it updates the existing rule's target account instead of creating a duplicate.
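The upsert behaviour can be pictured as a store keyed on (field, operator, value). The following is a minimal in-memory sketch, not the SpeyBooks implementation: the Rule and RuleStore names are hypothetical, and it assumes only the target account changes on an upsert (the text does not say whether name or priority are also updated).

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    field: str
    operator: str
    value: str
    target_account_id: str
    priority: int = 0  # 0-999, default 0


class RuleStore:
    """Hypothetical in-memory stand-in that mimics the documented upsert key."""

    def __init__(self) -> None:
        self._rules: dict[tuple[str, str, str], Rule] = {}

    def create(self, rule: Rule) -> Rule:
        key = (rule.field, rule.operator, rule.value)
        existing = self._rules.get(key)
        if existing is not None:
            # Same field/operator/value: update the target instead of adding a duplicate.
            existing.target_account_id = rule.target_account_id
            return existing
        self._rules[key] = rule
        return rule

    def __len__(self) -> int:
        return len(self._rules)


store = RuleStore()
store.create(Rule("Tesco", "description", "contains", "TESCO", "acc-groceries"))
updated = store.create(Rule("Tesco v2", "description", "contains", "TESCO", "acc-office"))
print(len(store), updated.target_account_id)  # 1 acc-office
```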

Fields and operators

| Field | Description | Compatible operators |
| --- | --- | --- |
| description | Transaction description from bank statement | contains, equals, starts_with, ends_with, regex |
| contact_name | Contact name | contains, equals, starts_with, ends_with, regex |
| amount | Transaction amount in minor units | greater_than, less_than, equals |
| reference | Payment reference | contains, equals, starts_with, ends_with, regex |
| metadata.category | Metadata key | contains, equals |
| metadata.project | Metadata key | contains, equals |
| metadata.department | Metadata key | contains, equals |
| metadata.tag | Metadata key | contains, equals |
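Because some operators only make sense for certain fields, a client can check compatibility before calling the API. A minimal sketch, assuming you want to validate locally: the validate_rule helper is hypothetical and simply encodes the field/operator table above plus the documented 0-999 priority range.

```python
TEXT_OPERATORS = frozenset({"contains", "equals", "starts_with", "ends_with", "regex"})
METADATA_OPERATORS = frozenset({"contains", "equals"})

# Field -> operators allowed for it, transcribed from the table above.
COMPATIBLE_OPERATORS = {
    "description": TEXT_OPERATORS,
    "contact_name": TEXT_OPERATORS,
    "amount": frozenset({"greater_than", "less_than", "equals"}),
    "reference": TEXT_OPERATORS,
    "metadata.category": METADATA_OPERATORS,
    "metadata.project": METADATA_OPERATORS,
    "metadata.department": METADATA_OPERATORS,
    "metadata.tag": METADATA_OPERATORS,
}


def validate_rule(field: str, operator: str, priority: int = 0) -> list:
    """Return a list of problems; an empty list means the combination looks valid."""
    problems = []
    allowed = COMPATIBLE_OPERATORS.get(field)
    if allowed is None:
        problems.append(f"unknown field {field!r}")
    elif operator not in allowed:
        problems.append(f"operator {operator!r} is not supported for field {field!r}")
    if not 0 <= priority <= 999:
        problems.append(f"priority {priority} is outside 0-999")
    return problems


print(validate_rule("description", "regex"))  # []
print(validate_rule("amount", "contains"))
```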

Description normalisation

Description values are normalised before storage: they are converted to uppercase, common bank prefixes (CARD PAYMENT, DIRECT DEBIT, etc.) are stripped, and trailing digits are removed. This means a contains rule with value BRITISH GAS will match CARD PAYMENT TO BRITISH GAS ENERGY REF 123456. You do not need to account for bank formatting variations in your rule values.
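The three normalisation steps can be sketched as below. This is an approximation for illustration only: the prefix list is an assumption (the real list is longer and not documented here), and the engine's exact rules may differ.

```python
import re

# Assumed, partial list of bank prefixes. Longer prefixes come first so that
# "CARD PAYMENT TO " is stripped in full before "CARD PAYMENT " is tried.
BANK_PREFIXES = ("CARD PAYMENT TO ", "CARD PAYMENT ", "DIRECT DEBIT ")


def normalise_description(text: str) -> str:
    value = text.upper().strip()            # 1. uppercase
    for prefix in BANK_PREFIXES:            # 2. strip one known bank prefix
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    return re.sub(r"[\s\d]+$", "", value)   # 3. drop trailing digits (and spaces)


description = normalise_description("Card payment to British Gas Energy ref 123456")
print(description)                   # BRITISH GAS ENERGY REF
print("BRITISH GAS" in description)  # True: a contains rule would match
```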


Testing rules

The test endpoint (POST /categorisation-rules/test) lets you preview which account a transaction would be categorised to without modifying any data. Pass a description (and optionally a contact name, amount, reference, or metadata) and the engine returns the best match with full evidence.

The evidence object in the response explains the match:

matchedField and operator identify which rule component matched.

matchedValue shows the normalised value that triggered the match.

provenance indicates the rule source: LOCAL for your organisation's rules, VERIFIED for confirmed rules, or GLOBAL for consensus rules.

scoreVector is the internal lattice score used for ranking when multiple rules match.

When the engine cannot return a single best match, the response contains "matched": false and a typed unknownReason explaining why. NoMatch means no rule's pattern matched the input; AmbiguousTopScore means multiple rules matched with identical scores and the engine could not pick a winner. Use this to identify gaps in your rule set or conflicting rules that need priority adjustment.
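A client can turn the test response into a readable diagnosis. The sketch below assumes the response is a JSON object with matched, unknownReason, and an evidence object carrying the fields described above; that exact layout, and the explain_test_result helper, are assumptions rather than the documented schema.

```python
def explain_test_result(result: dict) -> str:
    """Summarise a (hypothetically shaped) test-endpoint response."""
    if not result.get("matched"):
        reason = result.get("unknownReason")
        if reason == "NoMatch":
            return "No rule matched: consider adding a rule for this description."
        if reason == "AmbiguousTopScore":
            return "Several rules tied on score: adjust priorities to break the tie."
        return f"Not matched (unrecognised reason: {reason})"

    evidence = result["evidence"]
    return (
        f"{evidence['provenance']} rule matched {evidence['matchedField']} "
        f"{evidence['operator']} {evidence['matchedValue']!r}"
    )


sample = {
    "matched": True,
    "evidence": {
        "matchedField": "description",
        "operator": "contains",
        "matchedValue": "BRITISH GAS",
        "provenance": "LOCAL",
    },
}
print(explain_test_result(sample))  # LOCAL rule matched description contains 'BRITISH GAS'
print(explain_test_result({"matched": False, "unknownReason": "AmbiguousTopScore"}))
```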


Learning from imports

The categorise-row endpoint (POST /categorisation-rules/categorise-row) creates rules automatically from manual categorisation during a bank import. When you categorise a transaction in an import preview, this endpoint extracts a keyword from the description, normalises it, and creates (or upserts) a contains rule.

The key feature is the batchDescriptions parameter. Pass all the descriptions in the current import batch, and the response includes similarRowIds: the row IDs of other transactions in the batch whose extracted keyword matches the one just categorised. This enables bulk categorisation: categorise one Tesco transaction and the engine identifies all other Tesco rows in the batch.

For example, categorising row 101 ("TESCO STORES 4532") extracts the keyword "TESCO" and identifies row 105 ("TESCO STORES 6789") as a match. Row 108 ("GREGGS KILMARNOCK") does not match and is excluded from similarRowIds.
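The batch matching can be approximated as extracting a keyword per row and then grouping rows that share it. The keyword heuristic here (first word after uppercasing and dropping trailing digits) is a hypothetical stand-in: the extraction algorithm is not documented in this guide, and the real engine also strips bank prefixes first.

```python
import re


def extract_keyword(description: str) -> str:
    """Hypothetical heuristic: first word of the uppercased, digit-trimmed description."""
    cleaned = re.sub(r"[\s\d]+$", "", description.upper().strip())
    words = cleaned.split()
    return words[0] if words else ""


def similar_row_ids(row_id: int, batch: dict) -> list:
    """Return the other rows in the batch whose keyword matches row_id's keyword."""
    keyword = extract_keyword(batch[row_id])
    if not keyword:
        return []
    return sorted(
        other_id
        for other_id, description in batch.items()
        if other_id != row_id and extract_keyword(description) == keyword
    )


batch = {
    101: "TESCO STORES 4532",
    105: "TESCO STORES 6789",
    108: "GREGGS KILMARNOCK",
}
print(similar_row_ids(101, batch))  # [105]
```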

Each manual categorisation also emits a signal to the global rules aggregation pipeline. Over time, this builds the cross-tenant consensus that powers global rules.


Global rules

Global rules are consensus patterns aggregated across all SpeyBooks tenants. When many organisations categorise "TESCO" as Office Expenses, that pattern becomes a global rule available to everyone. This means new organisations get sensible suggestions from their first import, without needing to build a rule set from scratch.

The global rules endpoint returns each rule with social proof metadata:

confidence is a score from 0 to 1 reflecting how consistently tenants categorise this keyword to the same account.

distinctTenants is the number of unique organisations that contributed to the rule.

usageCount is the total number of times this categorisation has been applied.

Each global rule's stdCategory (a standard category like office_expenses) is automatically resolved against your chart of accounts via the M⁻¹ mapping function, returned as the resolvedAccount object. If the mapping cannot be resolved for your account structure, the rule is excluded from results. This means you never see global suggestions pointing to accounts that do not exist in your ledger.
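Conceptually, the resolution step is a lookup from stdCategory to an account in your ledger, with unresolvable rules dropped. The sketch below models that with a plain dictionary; it is not the M⁻¹ function itself, and the GlobalRule class, the account names, the sample numbers, and the optional min_confidence filter are hypothetical client-side illustrations.

```python
from dataclasses import dataclass


@dataclass
class GlobalRule:
    keyword: str
    std_category: str
    confidence: float  # 0 to 1
    distinct_tenants: int
    usage_count: int


def resolve_global_rules(rules, chart_mapping: dict, min_confidence: float = 0.0) -> list:
    """Pair each rule with a local account; skip rules whose category has no mapping."""
    resolved = []
    for rule in rules:
        account = chart_mapping.get(rule.std_category)
        if account is None:
            continue  # mirrors the documented exclusion of unresolvable rules
        if rule.confidence < min_confidence:
            continue  # optional client-side threshold, not an API feature
        resolved.append((rule, account))
    return resolved


# Illustrative values only.
rules = [
    GlobalRule("TESCO", "office_expenses", 0.92, 140, 5200),
    GlobalRule("SHELL", "motor_expenses", 0.88, 90, 3100),
]
# Hypothetical mapping from standard categories to this organisation's accounts.
chart_mapping = {"office_expenses": "7500 Office Expenses"}
for rule, account in resolve_global_rules(rules, chart_mapping):
    print(rule.keyword, "->", account)  # TESCO -> 7500 Office Expenses
```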

Global rules only contribute suggestions when no local rule matches for a given keyword. You can always override a global suggestion by creating a local rule with the same keyword.


Priority and ordering

Rules are evaluated within each provenance level in priority order, highest first. Priority values range from 0 to 999.

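When two local rules match the same transaction, the higher-priority rule wins, provided the higher-order scoring dimensions (operator strength, field strength, pattern specificity, and value fit) do not already separate them. For example, a rule matching "AMAZON WEB SERVICES" at priority 90 will take precedence over a broader "AMAZON" rule at priority 50, even though both technically match. Here the longer pattern also wins on pattern specificity, which the scoring engine compares before priority.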

The reorder endpoint (POST /categorisation-rules/reorder) lets you update multiple rules' priorities atomically. All updates are applied within a database savepoint, so either all priorities change or none do.

A common approach is to set specific rules (exact company names, account numbers) at higher priorities and general rules (broad keyword matches) at lower priorities. This way, a specific match always wins over a generic one.
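One way to apply this convention is to list rules from most to least specific and derive descending priorities from that order before calling the reorder endpoint. The payload shape below (a rules list of id/priority pairs), the rule IDs, and the default starting value and step are assumptions; check the Categorisation Rules API reference for the real request schema.

```python
def build_reorder_payload(ordered_rule_ids: list, top: int = 900, step: int = 10) -> dict:
    """Assign descending priorities: the first (most specific) rule gets `top`."""
    if not ordered_rule_ids:
        return {"rules": []}
    lowest = top - step * (len(ordered_rule_ids) - 1)
    if top > 999 or lowest < 0:
        raise ValueError("priorities must stay within 0-999; reduce top or step")
    return {
        "rules": [
            {"id": rule_id, "priority": top - step * index}
            for index, rule_id in enumerate(ordered_rule_ids)
        ]
    }


# Specific rule first, broad rule last (rule IDs are hypothetical).
payload = build_reorder_payload(["rule-aws", "rule-amazon"])
print(payload)
# {'rules': [{'id': 'rule-aws', 'priority': 900}, {'id': 'rule-amazon', 'priority': 890}]}
```

Leaving gaps of 10 between priorities makes it possible to slot a new rule between two existing ones later without renumbering the whole set.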


Listing and managing rules

GET /categorisation-rules lists all rules, with optional filtering by active status (for example, ?active=true). Rules are returned ordered by priority descending, then creation date ascending.
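If you cache the rule list client-side, the same ordering can be reproduced with a two-part sort key. The rule dictionaries here use assumed field names (id, priority, active, createdAt) for illustration.

```python
from datetime import datetime

# Illustrative records; field names are assumptions, not the documented schema.
rules = [
    {"id": "a", "priority": 50, "active": True, "createdAt": "2024-03-01T10:00:00"},
    {"id": "b", "priority": 90, "active": True, "createdAt": "2024-05-01T09:00:00"},
    {"id": "c", "priority": 50, "active": True, "createdAt": "2024-01-15T08:30:00"},
    {"id": "d", "priority": 70, "active": False, "createdAt": "2024-02-01T12:00:00"},
]


def list_rules(rules, active=None):
    """Filter by active status (None = all), then sort priority desc, creation date asc."""
    selected = [r for r in rules if active is None or r["active"] == active]
    return sorted(
        selected,
        key=lambda r: (-r["priority"], datetime.fromisoformat(r["createdAt"])),
    )


print([r["id"] for r in list_rules(rules, active=True)])  # ['b', 'c', 'a']
```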

To deactivate a rule without deleting it, use PATCH /categorisation-rules/{id} with {"active": false}. Inactive rules are skipped during evaluation but remain in your rule set for reactivation later.

To delete a rule permanently, use DELETE /categorisation-rules/{id}. Transactions that were already categorised by a deleted rule are not affected.

Each organisation is limited to 100 rules. If you hit the limit, review your rule set for redundant or overlapping rules. The test endpoint can help identify which rules are actually matching.


The lattice scoring engine

When multiple rules match a transaction, the engine ranks them using a multi-dimensional score vector rather than a simple priority number. The vector components include:

| Dimension | Component | What it measures |
| --- | --- | --- |
| 1 | Provenance | Local > Verified > Global |
| 2 | Operator strength | Specificity of the operator: equals > starts_with > contains |
| 3 | Field strength | Reliability of the matched field: reference > metadata > description |
| 4 | Pattern specificity | Length of the matched value. For regex, wildcards are penalised (lambda = 2) to prevent lazy catch-all patterns like .* from swallowing transactions |
| 5 | Value fit | Ratio of match length to total input string length |
| 6 | Priority | Explicit priority (0-999) set by the user |
| 7 | Tie break | Deterministic fallback based on creation order |

The comparison is lexicographic - evaluated left to right. A higher-order dimension always overrides a lower-order one. This means provenance always wins over priority: a local rule at priority 0 still beats a global rule at priority 999.
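Python's tuple comparison is itself lexicographic, so the ranking can be sketched by building one tuple per candidate and taking the maximum. Several details are assumptions: the numeric ranks, treating operators and fields not listed in the table as weakest, the wildcard-counting rule for the lambda = 2 penalty, using the pattern length as the match length, and letting the earlier-created rule win the tie break. Candidates are assumed to have already matched the input.

```python
from dataclasses import dataclass

# Higher number = stronger. Operators/fields not listed in the table default to 0 (assumption).
PROVENANCE_RANK = {"GLOBAL": 0, "VERIFIED": 1, "LOCAL": 2}
OPERATOR_RANK = {"contains": 0, "starts_with": 1, "equals": 2}
FIELD_RANK = {"description": 0, "metadata": 1, "reference": 2}
WILDCARD_PENALTY = 2  # lambda from the table


@dataclass
class Candidate:
    name: str
    provenance: str
    field: str
    operator: str
    value: str
    priority: int
    created_seq: int  # hypothetical creation counter: lower = created earlier


def pattern_specificity(c: Candidate) -> int:
    if c.operator != "regex":
        return len(c.value)
    wildcards = c.value.count(".*") + c.value.count(".+")
    return len(c.value) - WILDCARD_PENALTY * wildcards


def score_vector(c: Candidate, input_text: str) -> tuple:
    field_key = "metadata" if c.field.startswith("metadata.") else c.field
    return (
        PROVENANCE_RANK[c.provenance],           # 1 provenance
        OPERATOR_RANK.get(c.operator, 0),        # 2 operator strength
        FIELD_RANK.get(field_key, 0),            # 3 field strength
        pattern_specificity(c),                  # 4 pattern specificity
        len(c.value) / max(len(input_text), 1),  # 5 value fit (pattern length as proxy)
        c.priority,                              # 6 priority
        -c.created_seq,                          # 7 tie break: earlier rule wins
    )


def best_match(candidates, input_text: str) -> Candidate:
    return max(candidates, key=lambda c: score_vector(c, input_text))


text = "AMAZON WEB SERVICES EU"
# Provenance beats priority: a local rule at priority 0 beats a global rule at 999.
local_broad = Candidate("local amazon", "LOCAL", "description", "contains", "AMAZON", 0, 2)
global_aws = Candidate("global aws", "GLOBAL", "description", "contains", "AMAZON WEB SERVICES", 999, 1)
print(best_match([local_broad, global_aws], text).name)  # local amazon

# Pattern specificity is compared before priority.
aws = Candidate("local aws", "LOCAL", "description", "contains", "AMAZON WEB SERVICES", 10, 3)
amazon = Candidate("local amazon 90", "LOCAL", "description", "contains", "AMAZON", 90, 4)
print(best_match([amazon, aws], text).name)  # local aws
```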


Worked example: setting up rules for a new import

A typical workflow for a first bank import:

  1. Upload the statement - POST /bank-imports/upload with your CSV file

  2. Check auto-suggestions - the engine runs all existing rules (local and global) against the imported rows. Rows with a match show a suggested category; rows without a match need manual attention

  3. Categorise manually - for uncategorised rows, use POST /categorisation-rules/categorise-row with the row's description and target account. The call creates a rule, and the response identifies similar rows in the batch

  4. Review the rule set - GET /categorisation-rules?active=true to see all rules, check priorities, and verify the rules make sense

  5. Test edge cases - POST /categorisation-rules/test with descriptions you expect to see in future imports. Confirm the right rule matches and inspect the evidence

  6. Reorder if needed - POST /categorisation-rules/reorder to adjust priorities if specific rules are being overshadowed by broader ones
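Steps 3 to 6 map onto plain JSON requests. The sketch below only builds the requests with Python's standard urllib.request module so you can inspect them. BASE_URL, the absence of authentication, the account and rule IDs, and every body field other than targetAccountId and batchDescriptions (including the per-row shape of batchDescriptions and the reorder body) are assumptions. Step 1's multipart CSV upload is omitted.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # hypothetical; substitute your SpeyBooks API base URL


def json_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request; authentication headers are omitted."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Step 3: categorise one row and ask for similar rows in the batch.
categorise = json_request("POST", "/categorisation-rules/categorise-row", {
    "description": "TESCO STORES 4532",
    "targetAccountId": "acc-office",  # hypothetical account ID
    "batchDescriptions": [
        {"rowId": 101, "description": "TESCO STORES 4532"},
        {"rowId": 105, "description": "TESCO STORES 6789"},
    ],
})
# Step 4: review active rules.
review = json_request("GET", "/categorisation-rules?active=true")
# Step 5: test a description you expect in future imports.
test_req = json_request("POST", "/categorisation-rules/test", {"description": "TESCO STORES 9999"})
# Step 6: reorder (hypothetical body shape).
reorder = json_request("POST", "/categorisation-rules/reorder",
                       {"rules": [{"id": "rule-aws", "priority": 900}]})

for req in (categorise, review, test_req, reorder):
    print(req.get_method(), req.full_url)
# To send one for real: urllib.request.urlopen(req), after adding your auth headers.
```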

After the first import, subsequent imports benefit from the rules created during step 3. Each import needs less manual categorisation as the rule set grows.


Related endpoints