Skip to main content

Router Benchmarking

This measures load by increasing LoRaWAN traffic in steps on a Helium Router to find its threshold of capacity for a given server instance.

Measuring capacity of a particular server instance may then be compared with another server instance with different hardware specs. The procedure used is load & capacity measurement but some people also refer to this as stress testing or performance testing.

This document also gives insight to interactions of routers, gateways, organizations, devices and wallets in production.

For these instructions, each dependency is installed beneath ~/helium/, and Bash shell syntax on Linux is used below.

Order of operations is significant.

Challenges can arise due to the nature of simulation, much like watching the shadows of Plato's Allegory of the Cave rather than seeing the true cause. Therefore, any defects observed or suspected should be flagged by the community-- please! Then we all benefit from the corrections.

Dependencies

These must be isolated instances dedicated to load & capacity testing, as other users of the system will be negatively impacted as well as yield meaningless benchmarks:

  • router
    • Must be an isolated instance because abuse-prevention measures get triggered
    • Set environment variables: ROUTER_HOTSPOT_REPUTATION_ENABLED=false and/or ROUTER_HOTSPOT_REPUTATION_THRESHOLD to a large integer
    • Set environment variable: ROUTER_SC_OPEN_DC_AMOUNT=1000000 or higher integer value for budget per State Channel
    • By using an ed25519 key for router's wallet, it can be also used for hotspot add and location assertion
  • console
    • Set environment variable: IMPOSE_HARD_CAP=false
    • Ensure MAX_DEVICES_IN_ORG exceeds number of virtual devices

Install wallet on same host as your router if sharing its private key:

  • helium-wallet
    • Use a wallet with minimal funding as assurance to avoid overspending

These may be run locally on your laptop/workstation within their respective build subdirectories:

Additional utilities, possibly already installed on your system:

Aliases & Vars

Our working directory is ~/helium/benchmark/.

WORKDIR=~/helium/benchmark

Convenience aliases:

alias helium-console-cli="~/helium/helium-console-cli/target/release/helium-console-cli"
alias helium_gateway="~/helium/gateway-rs/target/release/helium_gateway"
alias helium-wallet="~/helium/helium-wallet-rs/target/release/helium-wallet"
alias gwmp-mux="~/helium/gwmp-mux/target/release/gwmp-mux"
alias virtual-lorawan-device="~/helium/virtual-lorawan-device/target/release/virtual-lorawan-device"

Define base URLs as shell variables:

# development versus production, respectively:
API_URL=https://api.helium.wtf
API_URL=https://api.helium.io

# Console corresponding to router-dev:
CONSOLE_URL="https://helium-console-dev.herokuapp.com"

# router-dev:
ROUTER_URL="http://54.176.88.149:8080"
# Optionally defined via script below:
ROUTER_OUI=2

Convenience variables:
(After a few preliminary runs, significantly increase the following values)

# number of organizations:
N_ORGS=2

# number of gateways:
N_GATEWAYS=2

# number of devices per org:
N_DEVICES=4

# For priming each loop so subsequent runs can add
# more orgs/devices/gws without clobbering:

FIRST_ORG=1
FIRST_GW=1
FIRST_DEV=1

(If going above double digits for gateways or triple digits for devices, the scripts will need to be revised; see Bash printf and seq --format expressions below.)

Additional variables will be created along the way.

Logs & Graphs

Benchmarking load is meaningless without the ability to measure, and in order to optimize server capacity requires comparing meaningful, consistent measurements.

Graphs described below help on both points.

Subtle nuances to comprehend first:

  • Normal has multiple meanings:
    1. colloquially speaking, it's that which is healthy
    2. in statistics it's the set of values centered within a particular distribution such as a bell curve
  • Nominal also has multiple meanings:
    • Within acceptable limits, synonymous with healthy operation
    • Say nominal instead of normal in context of Load & Capacity for clarity because inbound traffic fluctuates, constantly altering the baseline

The router repo includes the Grafana dashboard configuration used in production.

For router under significant load, the most critical graphs to watch:

  • Ensure Last Block nominally remains within single digit minutes
    • Spikes of 10-20 minutes are generally recoverable
    • Sustaining 15 minutes behind should be cause for alarm to take action
  • Active State Channels on a server routing traffic nominally has at least 1 and often 5-20
    • A "state channel" facilitates off-block transactions which get persisted to the Blockchain as a single transaction
    • The term "state channel" is the generic term for a Layer-2 Ledger (like the "Lightning Network") and critical to router's use of the Helium Blockchain
    • An "actor" is a co-routine in Erlang (or "Green Threads", or similar in purpose to "async/await" with differing mechanisms)
  • Offer -> Packet -> Downlink travel times:
    • Ideally would be within milliseconds (ms)
    • Acceptable when below 2 seconds
    • Alarm and take action when above 5 seconds
  • Offer Duration should be within milliseconds (ms)
    • This translates directly to latency of server response
    • Excludes any Internet traffic latency
  • Others have values and trends that vary based upon what is being measured and are described below:
    • Join Offer Rate
    • Packet Offer Rate
    • Offer Rejected Reasons

For designing your own graphs, it helps bucketing by percentiles. Common buckets are: 1%, 5%, 15%, 50%, 85%, 90%, 95%, 97%, 98%, 99% 100%, which give the average/mean, an approximation of standard deviation (15%, 85%) and most importantly the outliers. Outliers are skewed here towards the troublesome high end for fine-grained tracking. The 99th and 100th may be safely ignored when those values are indistinguishable from latency across the Internet, and such outliers are generally due to periodic garbage-collection of Erlang's underlying BEAM virtual machine.

Ensure those graphs exist before load & capacity tests begin.

Manual Setup

Generate a Console API Key, which requires logging onto Console user interface (UI). Everything else will be done by command-line interface (CLI).

Each organization will have its own API Key, which may be obtained via the Console UI under Account Settings.

Store your new Console API Key generated from the UI to the TOML file below, and specify base URL of the console. The default of https://console.helium.com is probably not what you want for purposes here. This URL must match the Console corresponding to your Router instance.

If using multiple organizations, create individual configuration files within each org's subdirectory; e.g., org-01, org-02, respectively.

for o in $(seq --format="%02g" $FIRST_ORG $N_ORGS); do
  mkdir $WORKDIR/org-$o
done

Persist each org's API key and Console base URL in its respective TOML file:
(Note the dot preceding the file name)

cd $WORKDIR/org-01/

out=.helium-console-config.toml
# Be sure to use *your* assigned key:
echo 'key = "AlphaNumericString..."' > $out
echo 'base_url = "'${CONSOLE_URL}'"' >> $out
echo 'request_timeout = 120'         >> $out

Repeat for each of the multiple organizations to be used.

Confirm getting valid JSON as a result:

for o in $(seq --format="%02g" $FIRST_ORG $N_ORGS); do
  cd $WORKDIR/org-$o/
  helium-console-cli device list
done

With valid JSON as the result, preliminary installation is complete.

Everything else may be scripted; however, creating your first device via Console UI may be easier than remembering how to create a unique EUI or app key.

Create & Fund Wallet

Use or create a new wallet with minimal funding as assurance to avoid overspending.

mkdir $WORKDIR/wallet
cd $WORKDIR/wallet/

helium-wallet create basic

WALLET=$(helium-wallet info | grep Address | awk '{print $4}')
echo $WALLET

Ensure enough Data Credits (DC) to cover cost of adding data-only hotspots and asserting location of each.

e.g., For 20 instances of gateway-rs: 20 x (1000000 + 500000) at HNT prices in 2022Q2, this will require 15 HNT. Plus add for uplinks from devices if on a non developer instance of router.

Provide the equivalent of the following command to whomever has the main wallet from which the limited wallet gets funded:

Please add funds for Load & Capacity testing:
helium-wallet burn --amount 25 --payee $WALLET --commit

Again, burning HNT is only for asserting hotspots and their locations.

When benchmarking within an isolated environment such as router-dev, reset DC balance on the router host which is described in Fund Organizations below.

Optional: Sharing Keys With Router

Only applies when router's wallet uses ed25519 keys:

Since the Load & Capacity activities will likely be using a development instance of a router, you may use same private key for the wallet: /var/data/blockchain/swarm_key

Router can generate a file containing both public and private keys.

Evaluate the following on router host via router remote_console command:

SwarmKey = filename:join([
    application:get_env(blockchain, base_dir, "data"),
    "blockchain",
    "swarm_key"
  ]),
  {ok, RouterKeys} = libp2p_crypto:load_keys(SwarmKey),
#{public := RouterPubKey, secret := RouterPrivKey} = RouterKeys.

%% View octets of private key as Erlang Binary:
io:format("~P", [libp2p_crypto:keys_to_bin(RouterKeys), 999]).

%% Using any other file path creates INSIDE docker container:
libp2p_crypto:save_keys(RouterKeys, "/var/data/wallet.key").

If output from above indicates that it's an ECC Compact key, that key file is incompatible with helium-wallet due to using a different elliptic curve.

The private key to be used by helium-wallet command will be in /var/data/wallet.key file.

WALLET=$(helium-wallet -f /var/data/wallet.key info | \
         grep Address | \
         awk '{print $4}')
echo $WALLET

That shared wallet is ready for use below.

Router Address

Determine address of the router that you are using by running this one command on router host:

ssh router-dev
cd router/
router peer addr

Extract the sequence following /p2p/ from the output.

That long alphanumeric value gets used in multiple places below as the $P2P variable.

Define this on your laptop/workstation:

P2P=longAlphanumeric...
echo $P2P

Hexagons

Choose a hexagon for each gateway instance.

Because this is a test router yet each gateway's location must be asserted onto mainnet blockchain, pick a bogus location such as middle of a lake or river. (While there could be legitimate devices in a particular body of water, there is unlikely to be a hotspot there as well.)

The importance of asserting a location for each hotspot:

Without any location asserted for a particular gateway, the distance calculated to router defaults to 1000 as defined within distance_between() in router_device_devaddr.erl which may be preferable to triangulating based upon physical location.

However, relying upon this default will bypass much program logic which involves a blockchain ledger lookup, thereby yielding artificially higher throughput.

A dedicated and isolated router and console pair may disregard concerns about Proof-of-Coverage (PoC) or an oversaturated Hex.

(Router uses h3 hexagonal hierarchical geospatial indexing system for computing distance, so dig there for any unique criteria or constraints. For previewing the hex for each pair of coordinates, use the live JavaScript feature of the underlying API's documentation with resolution of 8, and append the resulting value to: https://explorer.helium.com/hotspots/hex/.)

Select latitude and longitude:

LAT="43.5482302"
LON="-116.6575001"
echo $LAT $LON

This pair of latitude and longitude values will be used below.

Create Devices via Console API

Each device across all organizations must have unique Dev EUI, App EUI and App Key.

For reference, see LoRaWAN Link Layer Specification sections as of v1.0.4:

  • 6.2.1 End-device identifier (DevEUI)
  • 6.2.3 Application key (AppKey)
    • and its footnote which implies the uniqueness constraint

By using an isolated instance of router, we may generate identifiers:

cd $WORKDIR

for org in $(seq --format="%02g" $FIRST_ORG $N_ORGS); do
  for dev in $(seq --format="%03g" $FIRST_DEV $N_DEVICES); do
    (cd org-$org && \
     helium-console-cli device create \
       a0000000000$org$dev \
       600000000000000000000000000$org$dev \
       d0000000000$org$dev \
       benchmark-$dev || echo "org=$org device=$dev" >> errors.log)
  done
done

Confirm:

for o in $(seq --format="%02g" $FIRST_ORG $N_ORGS); do
  cd $WORKDIR/org-$o/
  helium-console-cli device list > devices.json
done

Returns JSON, which when formatted via jq . resembles the following.

{
  "devices": [
    {
      "app_eui": "a000000000001001",
      "app_key": "60000000000000000000000000001001",
      "dev_eui": "d000000000001001",
      "id": "7cbc7f9d-0205-4f85-ab0a-34de8c4d8f52",
      "name": "benchmark-001",
      "organization_id": "57d5b743-fc08-4119-b4ff-0c60ddf976ff",
      "oui": 2
    },
    {
      "app_eui": "a000000000001002",
      "app_key": "60000000000000000000000000001002",
      "dev_eui": "d000000000001002",
      "id": "81d39917-b772-4937-b78b-618f4461340f",
      "name": "benchmark-002",
      "organization_id": "57d5b743-fc08-4119-b4ff-0c60ddf976ff",
      "oui": 2
    },
    {
      "app_eui": "a000000000001003",
      "app_key": "60000000000000000000000000001003",
      "dev_eui": "d000000000001003",
      "id": "cb82bc53-49ec-4401-bed0-2c4010f26c1d",
      "name": "benchmark-003",
      "organization_id": "57d5b743-fc08-4119-b4ff-0c60ddf976ff",
      "oui": 2
    },
    {
      "app_eui": "a000000000001004",
      "app_key": "60000000000000000000000000001004",
      "dev_eui": "d000000000001004",
      "id": "e7cca34d-5234-457f-ae92-8037c2649569",
      "name": "benchmark-004",
      "organization_id": "57d5b743-fc08-4119-b4ff-0c60ddf976ff",
      "oui": 2
    }
  ]
}

Confirm consistent number of devices generated across organizations:

wc -l org-*/devices.json

The value of "oui" in output above should correspond to your value for $ROUTER_OUI.

ROUTER_OUI=$(jq '.devices[0].oui' org-01/devices.json)
echo $ROUTER_OUI

Virtual Devices

This requires devices already having been created via Console API.

The API provides the single source of truth about devices. (Device ID in Console might be mismatched with respect to identifiers generated below due to sequence of output from API, which is unimportant for benchmarking.)

Despite lack of plural in the name, virtual LoRaWAN device accommodates multiple devices, as indicated by its README.

Generate a single settings.toml file for all devices per organization. (This segments by org for managing benchmarking runs.)

Connections of devices from same org will distribute across multiple gateways, simulating real world scenarios with geographically dispersed deployments.

Sending more frequently than 2 seconds per device would be unrealistic. The value for secs_between_transmits may be tuned below, and rx_delay which defaults to 1s, may be increased via Console up to 15s.

cd $WORKDIR
# Use printf, not `seq --format` to avoid octal when assigning $i below
for o in $(seq $N_ORGS); do
  org=$(printf "%02g" $o)
  [ -d org-$org/settings ] || mkdir -p org-$org/settings
  [ -e org-$org/settings/default.toml ] || \
    ln -s ~/helium/virtual-lorawan-device/settings/default.toml \
      org-$org/settings/

  out="org-$org/settings/settings.toml"
  echo 'default_server = "dev"'                > $out
  # Must be unique across all instances of virtual-lorawan-device:
  metrics_port=$((9898 + o))
  echo 'metrics_server = "127.0.0.1"'          >> $out
  echo 'metrics_port = '$metrics_port          >> $out
  echo ''                                      >> $out

  # Multiplexer sprays across all gateways:
  echo "[packet_forwarder.mux]"                >> $out
  echo 'host = "localhost:1680"'               >> $out
  echo 'mac = "0807060504030200"'              >> $out
  echo ''                                      >> $out

  # Optional, if you want to bypass multiplexer:
  # uncomment this block and use packet_forwarder=gw below.
  #for gw in $(seq $N_GATEWAYS); do
  #  gateway=$(printf "%02g" $gw)
  #  echo "[packet_forwarder.gw${gateway}]"     >> $out
  #  port=$((1680 + gw))
  #  echo 'host = "localhost:'${port}'"'        >> $out
  #  echo 'mac = "08070605040302'${gateway}'"'  >> $out
  #done

  json="org-$org/devices.json"
  for d in $(seq $N_DEVICES); do
    device=$(printf "%03g" $d)
    echo ''                                    >> $out
    echo "[device.${device}]"                  >> $out
    seconds=$((20 + d % 20))
    echo 'secs_between_transmits = '$seconds   >> $out
    frames=$((120 - d % 40))
    echo 'rejoin_frames = '$frames             >> $out

    echo 'packet_forwarder = "mux"'            >> $out
    # Optional round-robin allocation:
    #i=$(( (d % N_GATEWAYS) + FIRST_GW ))
    #gateway=$(printf "%02g" $i)
    #echo '#packet_forwarder = "gw'${gateway}'"' >> $out
    echo "[device.${device}.credentials]"      >> $out
    j=$((d - 1))
    echo 'dev_eui = '$(jq .devices[$j].dev_eui < $json) >> $out
    echo 'app_eui = '$(jq .devices[$j].app_eui < $json) >> $out
    echo 'app_key = '$(jq .devices[$j].app_key < $json) >> $out
  done
done

As seconds between transmits increases, number of frames until re-joins decreases for leveling overall traffic (reducing spikes and eliminating artificial surges).

Some packet_forwarder lines are intentionally commented-out above, because there might be occasions for connecting to a specific gateway directly rather than using the multiplexer.

Sanity check:

grep -q 'dev_eui = null' org*/settings/settings.toml && \
  echo Something went wrong.

Devices may require waiting upwards of 20 minutes (concurrently) for the router's XOR filter to run. The device will be marked in Console with "Pending" status until then, and there is an API call for this below.

Later when running the actual test, check logs of virtual-lorawan-device for entries containing the following excerpt:

WARN ... UDP packet received after tx time ...

Significantly more than 1:100 ratio indicates the value for default_secs_between_transmits should be increased. That ratio accommodates interference patterns from the scheduler in Linux and the Tokio async/await framework used within the Rust app, or it can be from the hypervisor when on a virtualized server instance.

The metrics API will be used below to track successful Joins.

Configure Gateways

This uses the $P2P value defined above in "Router Address" section.

Each gateway must open a listening socket on a unique port number, so this simply increments from the default base port.

Install configuration files:

cd $WORKDIR

i=${FIRST_GW:-1}
for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
  mkdir gw-$gw
  ln -s ~/helium/gateway-rs/config/default.toml gw-$gw/

  out=gw-$gw/settings.toml
  keyfile="$(pwd)/gw-$gw/gateway_key.bin"
  echo 'keypair = "'${keyfile}'"'       > $out
  # Must be unique per gw and consistent with device configs:
  port=$((1680 + i))
  echo 'listen = "127.0.0.1:'${port}'"' >> $out
  # Only needs to be unique:
  port=$((4467 + i))
  echo "api = $port"                    >> $out
  echo 'region = "US915"'               >> $out
  echo ''                               >> $out
  echo '[update]'                       >> $out
  echo 'enabled = false'                >> $out
  echo ''                               >> $out
  echo '[router.release]'               >> $out
  echo 'pubkey = "'${P2P}'"'            >> $out
  echo 'uri = "'${ROUTER_URL}'"'        >> $out
  echo 'oui = '${ROUTER_OUI}            >> $out
  echo ''                               >> $out
  # Uncomment for high traffic runs:
  echo '# [log]'                        >> $out
  echo '# level = "warn"'               >> $out
  i=$((i + 1))
done

Note: If using "beta" or other version of gateway-rs instead of "release", change [router.release] to match.)

For benchmark runs with very high traffic, uncomment the [log] section above, because there's a measurable cost of I/O due to logging that can negatively impact the host sending packets.

Add Gateways To Blockchain

Generate transactions using WALLET variable defined earlier. Each transaction gets committed later.

When in server-mode for the first time, gateway-rs needs time to generate key material and connect to upstream services, so start all concurrently:

for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
  helium_gateway -c gw-$gw server > gw-$gw/server.log &
done

Wait until all log files contain at least one entry for height, which implies that each has connected to its upstream validator:

# Wait:
grep -c height gw-$gw/server.log

Generate transactions, which will be signed and committed later:

for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
  helium_gateway -c gw-$gw info > gw-$gw/gw-info.json

  helium_gateway -c gw-$gw \
    add --mode dataonly \
      --owner $WALLET \
      --payer $WALLET > gw-$gw/add-dataonly.json
done

Extract each txn value from the previous output and strip quotes from value:

jq .txn gw-*/add-dataonly.json | \
   sed 's/"//g' > txn-ids.log

Prepare command to add each new gateway based upon previous transactions:
(if wallet is on router host, run it there)

for txn in $(cat txn-ids.log); do
   # Prompts for wallet password
   helium-wallet \
     -f $WORKDIR/wallet/wallet.key \
     hotspots add $txn
done

Ensure there were no errors in the preceding output, and repeat same command adding --commit flag to make it reality.

This burns HNT from your wallet:

for txn in $(cat txn-ids.log); do
   # Burn HNT from your wallet:
   helium-wallet \
     -f $WORKDIR/wallet/wallet.key \
     hotspots add $txn \
     --commit
done

Assert Location For Each Gateway

Assert location using $LAT and $LON defined earlier for each gateway.

for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
  key=$(jq .key gw-$gw/gw-info.json | sed 's/"//g')

  helium-wallet \
     -f $WORKDIR/wallet/wallet.key \
     hotspots assert --gateway $key \
     --mode dataonly \
     --lat=$LAT$gw --lon=$LON$gw --elevation 6
done

Repeat with --commit flag, which burns HNT from your wallet:

for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
  key=$(jq .key gw-$gw/gw-info.json | sed 's/"//g')

  helium-wallet \
     -f $WORKDIR/wallet/wallet.key \
     hotspots assert --gateway $key \
     --mode dataonly \
     --lat=$LAT$gw --lon=$LON$gw --elevation 6 \
     --commit
done

Confirm Locations

Allow time-- approximately 5-20 minutes-- for those transactions to be recorded onto the blockchain, and then confirm asserted locations via blockchain API:

curl --user-agent "$UA" \
  "$API_URL/v1/hotspots/location/distance?lat=$LAT&lon=$LON&distance=1000"

The distance parameter above is in meters, so adjust its value if necessary.

Ensure that your gateways appear within those results. If not found, wait several minutes and check again later.

Other gateways appearing within the results is benign because a virtual LoRaWAN device will only reach those for which it was explicitly configured since its radio is simulated.

With that, your gateways are ready for Load & Capacity runs.

Fund Organizations

Benchmarking on an isolated server instance of router (e.g., router-dev) affords the option of resetting DC balance of an organization. Otherwise, benchmarking will make a measurable impact to your wallet; see Data Credits within Console documentation.

First, find your organization_id which may be obtained via Console UI. From the Organizations page, there is an export widget associated with each organization listed.

Alternatively, extract from devices.json for each organization. Those files were generated above.

Run:

jq .devices[0].organization_id org-*/devices.json

Use each of those organization IDs and provide the target DC balance, which gets specified as 100k in this example:

ssh router-dev
cd router/
router organization update bb5a2fec-1234-5678-89e4-a0e3c4807628 -b 100000

Confirm output of the above command, and then repeat with --commit flag:

!! --commit

Again, the above command only works on non-production environments.

Wait for XOR Filter

The Console may indicate "Pending" from having added new devices above and may take upwards of 20 minutes.

Wait until that message clears before proceeding.

There is an API call for that status:

for o in $(seq --format="%02g" $FIRST_ORG $N_ORGS); do
  cd $WORKDIR/org-$o/
  helium-console-cli devices all > status.json
done

If any report as false, wait one minute at minimum before checking again to avoid being throttled or blocked:

cd $WORKDIR
jq '.[].in_xor_filter' org-*/status.json \
  | grep false && sleep 60

Start Gateways

Launch each gateway "server" as a separate process, and let the OS handle concurrency:

cd $WORKDIR

for gw in $(seq --format="%02g" $N_GATEWAYS); do
  helium_gateway -c gw-$gw server >> gw-$gw/run.log 2>&1 &
done

Allow several minutes for each gateway concurrently connecting to validators and the router specified in settings.toml file.

Wait until each hotspot reports height of the blockchain:

for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
  helium_gateway -c gw-$gw info > gw-$gw/gw-info.json
done
jq .gateway.height gw-*/gw-info.json || echo "Wait!"

Only after each gateway reports a large integer for height, continue.

Due to backoff and retry, each delay takes an increasing amount of time with random variation:

 INFO selecting new gateway in 36s, module: dispatcher
 ...
 INFO selecting new gateway in 69s, module: dispatcher
 ...
 INFO selecting new gateway in 152s, module: dispatcher
 ...

 INFO selecting new gateway in 1826s, module: dispatcher

Warnings about deadlines logged between retries can be safely ignored:

 WARN gateway stream setup error: ...
   "error trying to connect: deadline has elapsed" ...

Note:

A gateway may disconnect from its upstream validators or router and automatically reconnect. When that occurs, the warnings and retries described above may appear at that time.

One or two retries should be just a blip on the graphs, but extended retries might be cause for invalidating the affected run for purposes of benchmarking.

Start Multiplexers

This sends every Join request and packet to all gateways.

Device configurations performed earlier per organization already account for using the multiplexer.

Run in its own Terminal window:
(running in background to be accommodated in a future release)

port=1681
GW_LIST=''
for gw in $(seq 2 $N_GATEWAYS); do
  port=$((port + 1))
  GW_LIST="${GW_LIST} 127.0.0.1:$port"
done

gwmp-mux --host 1680 --client $GW_LIST > mux.log 2>&1

All gateways will receive all Join and Uplink requests.

Spawn Virtual Devices

Launch fleet of virtual devices as a single OS process:

cd $WORKDIR

#export RUST_LOG=warn
for o in $(seq --format="%02g" $N_ORGS); do
  virtual-lorawan-device \
    --settings org-$o/settings/ >> org-$o/run.log 2>&1 &

  # Slight mitigation against hammer effect:
  sleep 5
done

While waiting for the outer loop to finish, watch your graphs; e.g, Grafana.

Optionally set RUST_LOG=debug environment variable for verbose logs.

Note: virtual-lorawan-device causes the device to Join (or re-join) upon each start of the utility. (For continuing a previous session, pull-requests there are welcomed!)

Watch Devices

The metrics_port associated with each running instance of virtual-lorawan-device may be queried via HTTP GET. For bulk requests:

metrics() {
  time=$(date +%Y-%m-%dt%H:%M:%S)
  for o in $(seq --format="%02g" $N_ORGS); do
    port=$(grep metrics_port $WORKDIR/org-$o/settings/settings.toml | \
           awk '{print $3}')
    curl http://localhost:$port/ > org-$o-stats-$time.txt
  done
}

Periodically run:

metrics

Useful statistics to observe while everything is running may be determined by counting the following log entries for determining success ratios sampled over duration of each benchmarking run.

Logs of virtual-lorawan-device:

  • "No Join Accept Received"
  • "sending packet"
  • "RxWindow expired, expected ACK to confirmed uplink not received"

Logs from gateway-rs for each packet indicate that the packet has been queued but not necessarily delivered to router.

Watch A Single Device

This is an optional debugging step.

While the fleet of virtual-lorawan-device instances runs, watch a single device.

First, run the loop from Wait for XOR Filter above again to re-populate status.json for each organization.

Find devices that indicate a potential problem.

Device IDs that never connected:

jq '.[]|select(.last_connected|.==null)|.id' org-*/status.json

Devices that connected longest ago:

jq '.|sort_by(.last_connected)|.[0]'  org-*/status.json

Device IDs with fewest packets:

jq '.|sort_by(.total_packets)|.[0].id' org-*/status.json

Use those Device IDs for tracing via router CLI:

ssh router
cd router/
router device trace --id=8db30307-1234-5678-9abc-511376ac301b

Each trace expires after 240 minutes.

Stop

Generally allow this fleet of virtual devices to run for 15-20 minutes, which gives ample traffic that the Erlang BEAM virtual machine's garbage-collector will have run, etc.

When you have enough data:

killall virtual-lorawan-device
killall helium_gateway

(Those must be names of each actual command-- not Bash aliases.)

Confirm no longer running:

ps aux | grep virtual-lorawan-device
ps aux | grep helium_gateway

Iterate

To ensure sufficient load, you may need to run again with increased values for:

  • number of organizations (N_ORGS)
  • number of devices (N_DEVICES)
  • number of gateways (N_GATEWAYS)

When doing so, also increase corresponding values for FIRST_* to be the former value of N_* to avoid clobbering or creating duplicate devices.

Check for new releases of virtual-lorawan-device that may facilitate driving higher traffic. Future work will occur there.

Comprehending The Results

For measuring number of devices within a devaddr slab:

Ensure "Join Offer Rate" graph shows step-wise increases, and "Packet Offer Rate" graph resembles a plateau. Then, watch "Offer Rejected Reasons" graph and stop when seeing an increase in devaddr_no_device.

With disambiguation due in part to the Message Integrity Check (MIC), the usable number will be higher than apparent capacity; thus, an 8 slab accommodates more than eight devices. Actual limits may vary with Router and/or Console configurations used.

For measuring traffic load:

There should be a slow ramp-up of traffic such that we "find the knee" of the graph before hitting the plateau (ceiling). Understand this plateau being an approximation due to the nature of generated traffic.

A good rule-of-thumb:

Plan for 70% CPU utilization for nominal load, which provides a cushion for traffic spikes without being excessively oversized.

Repeatable Results

When confirming any such measurements:

Run the same scenario a minimum of three consecutive times with consistent results.

That excludes a preliminary run to invalidate the OS file-system cache.

This rule of three requires consistency. If results vary by more than rounding error, keep running until getting 3 with similar results.

If the only consistency is that results vary widely, look to other artifacts on the host OS that might be causing interference. Consider migrating to different host server hardware (via hard stop/start) if running on a virtualized/hypervisor environment such as AWS, GCP, Linode, etc.

Items that we addressed:

  • On Ubuntu Server, prevent Snap from updating during a benchmark run:
    • sudo snap list and sudo apt-get purge snapd
  • If graphs are inconsistent, confirm each gateway is still connected to its upstream Validator:
    • Confirm "Join Offer Rate" graph shows step-wise increases
    • Confirm "Packet Offer Rate" graph resembles a plateau