Router Benchmarking
This measures load by increasing LoRaWAN traffic in steps on a Helium Router to find its threshold of capacity for a given server instance.
Measuring capacity of a particular server instance may then be compared with another server instance with different hardware specs. The procedure used is load & capacity measurement but some people also refer to this as stress testing or performance testing.
This document also gives insight to interactions of routers, gateways, organizations, devices and wallets in production.
For these instructions, each dependency is installed beneath ~/helium/
,
and Bash shell syntax on Linux is used below.
Order of operations is significant.
Challenges can arise due to the nature of simulation, much like watching the shadows of Plato's Allegory of the Cave rather than seeing the true cause. Therefore, any defects observed or suspected should be flagged by the community-- please! Then we all benefit from the corrections.
Dependencies
These must be isolated instances dedicated to load & capacity testing, as other users of the system will be negatively impacted as well as yield meaningless benchmarks:
- router
- Must be an isolated instance because abuse-prevention measures get triggered
- Set environment variables:
ROUTER_HOTSPOT_REPUTATION_ENABLED=false
and/orROUTER_HOTSPOT_REPUTATION_THRESHOLD
to a large integer - Set environment variable:
ROUTER_SC_OPEN_DC_AMOUNT=1000000
or higher integer value for budget per State Channel - By using an ed25519 key for router's wallet, it can be also used for hotspot add and location assertion
- console
- Set environment variable:
IMPOSE_HARD_CAP=false
- Ensure
MAX_DEVICES_IN_ORG
exceeds number of virtual devices
- Set environment variable:
Install wallet on same host as your router if sharing its private key:
- helium-wallet
- Use a wallet with minimal funding as assurance to avoid overspending
These may be run locally on your laptop/workstation within their respective build subdirectories:
- gateway-rs
- helium-console-cli
- gwmp-mux LoRaWAN protocol multiplexer
- virtual-lorawan-device
Additional utilities, possibly already installed on your system:
Aliases & Vars
Our working directory is ~/helium/benchmark/
.
WORKDIR=~/helium/benchmark
Convenience aliases:
alias helium-console-cli="~/helium/helium-console-cli/target/release/helium-console-cli"
alias helium_gateway="~/helium/gateway-rs/target/release/helium_gateway"
alias helium-wallet="~/helium/helium-wallet-rs/target/release/helium-wallet"
alias gwmp-mux="~/helium/gwmp-mux/target/release/gwmp-mux"
alias virtual-lorawan-device="~/helium/virtual-lorawan-device/target/release/virtual-lorawan-device"
Define base URLs as shell variables:
# development versus production, respectively:
API_URL=https://api.helium.wtf
API_URL=https://api.helium.io
# Console corresponding to router-dev:
CONSOLE_URL="https://helium-console-dev.herokuapp.com"
# router-dev:
ROUTER_URL="http://54.176.88.149:8080"
# Optionally defined via script below:
ROUTER_OUI=2
Convenience variables:
(After a few preliminary runs, significantly increase the following values)
# number of organizations:
N_ORGS=2
# number of gateways:
N_GATEWAYS=2
# number of devices per org:
N_DEVICES=4
# For priming each loop so subsequent runs can add
# more orgs/devices/gws without clobbering:
FIRST_ORG=1
FIRST_GW=1
FIRST_DEV=1
(If going above double digits for gateways or triple digits for devices, the
scripts will need to be revised; see Bash printf
and seq --format
expressions below.)
Additional variables will be created along the way.
Logs & Graphs
Benchmarking load is meaningless without the ability to measure, and in order to optimize server capacity requires comparing meaningful, consistent measurements.
Graphs described below help on both points.
Subtle nuances to comprehend first:
- Normal has multiple meanings:
- colloquially speaking, it's that which is healthy
- in statistics it's the set of values centered within a particular distribution such as a bell curve
- Nominal also has multiple meanings:
- Within acceptable limits, synonymous with healthy operation
- Say nominal instead of normal in context of Load & Capacity for clarity because inbound traffic fluctuates, constantly altering the baseline
The router repo includes the Grafana dashboard configuration used in production.
For router under significant load, the most critical graphs to watch:
- Ensure Last Block nominally remains within single digit minutes
- Spikes of 10-20 minutes are generally recoverable
- Sustaining 15 minutes behind should be cause for alarm to take action
- Active State Channels on a server routing traffic
nominally has at least 1 and often 5-20
- A "state channel" facilitates off-block transactions which get persisted to the Blockchain as a single transaction
- The term "state channel" is the generic term for a Layer-2 Ledger (like the "Lightning Network") and critical to router's use of the Helium Blockchain
- An "actor" is a co-routine in Erlang (or "Green Threads", or similar in purpose to "async/await" with differing mechanisms)
- Offer -> Packet -> Downlink travel times:
- Ideally would be within milliseconds (ms)
- Acceptable when below 2 seconds
- Alarm and take action when above 5 seconds
- Offer Duration should be within milliseconds (ms)
- This translates directly to latency of server response
- Excludes any Internet traffic latency
- Others have values and trends that vary based upon what is being measured
and are described below:
- Join Offer Rate
- Packet Offer Rate
- Offer Rejected Reasons
For designing your own graphs, it helps bucketing by percentiles. Common buckets are: 1%, 5%, 15%, 50%, 85%, 90%, 95%, 97%, 98%, 99% 100%, which give the average/mean, an approximation of standard deviation (15%, 85%) and most importantly the outliers. Outliers are skewed here towards the troublesome high end for fine-grained tracking. The 99th and 100th may be safely ignored when those values are indistinguishable from latency across the Internet, and such outliers are generally due to periodic garbage-collection of Erlang's underlying BEAM virtual machine.
Ensure those graphs exist before load & capacity tests begin.
Manual Setup
Generate a Console API Key, which requires logging onto Console user interface (UI). Everything else will be done by command-line interface (CLI).
Each organization will have its own API Key, which may be obtained via the Console UI under Account Settings.
Store your new Console API Key generated from the UI to the TOML file
below, and specify base URL of the console. The default of
https://console.helium.com
is probably not what you want for
purposes here. This URL must match the Console corresponding to your
Router instance.
If using multiple organizations, create individual configuration files
within each org's subdirectory; e.g., org-01
, org-02
, respectively.
for o in $(seq --format="%02g" $FIRST_ORG $N_ORGS); do
mkdir $WORKDIR/org-$o
done
Persist each org's API key and Console base URL in its respective TOML
file:
(Note the dot preceding the file name)
cd $WORKDIR/org-01/
out=.helium-console-config.toml
# Be sure to use *your* assigned key:
echo 'key = "AlphaNumericString..."' > $out
echo 'base_url = "'${CONSOLE_URL}'"' >> $out
echo 'request_timeout = 120' >> $out
Repeat for each of the multiple organizations to be used.
Confirm getting valid JSON as a result:
for o in $(seq --format="%02g" $FIRST_ORG $N_ORGS); do
cd $WORKDIR/org-$o/
helium-console-cli device list
done
With valid JSON as the result, preliminary installation is complete.
Everything else may be scripted; however, creating your first device via Console UI may be easier than remembering how to create a unique EUI or app key.
Create & Fund Wallet
Use or create a new wallet with minimal funding as assurance to avoid overspending.
mkdir $WORKDIR/wallet
cd $WORKDIR/wallet/
helium-wallet create basic
WALLET=$(helium-wallet info | grep Address | awk '{print $4}')
echo $WALLET
Ensure enough Data Credits (DC) to cover cost of adding data-only hotspots and asserting location of each.
e.g., For 20 instances of gateway-rs: 20 x (1000000 + 500000) at HNT prices in 2022Q2, this will require 15 HNT. Plus add for uplinks from devices if on a non developer instance of router.
Provide the equivalent of the following command to whomever has the main wallet from which the limited wallet gets funded:
Please add funds for Load & Capacity testing:
helium-wallet burn --amount 25 --payee $WALLET --commit
Again, burning HNT is only for asserting hotspots and their locations.
When benchmarking within an isolated environment such as router-dev
, reset
DC balance on the router host which is described in
Fund Organizations below.
Optional: Sharing Keys With Router
Only applies when router's wallet uses ed25519 keys:
Since the Load & Capacity activities will likely be using a development
instance of a router, you may use same private key for the wallet:
/var/data/blockchain/swarm_key
Router can generate a file containing both public and private keys.
Evaluate the following on router host via router remote_console
command:
SwarmKey = filename:join([
application:get_env(blockchain, base_dir, "data"),
"blockchain",
"swarm_key"
]),
{ok, RouterKeys} = libp2p_crypto:load_keys(SwarmKey),
#{public := RouterPubKey, secret := RouterPrivKey} = RouterKeys.
%% View octets of private key as Erlang Binary:
io:format("~P", [libp2p_crypto:keys_to_bin(RouterKeys), 999]).
%% Using any other file path creates INSIDE docker container:
libp2p_crypto:save_keys(RouterKeys, "/var/data/wallet.key").
If output from above indicates that it's an ECC Compact key, that key file is incompatible with
helium-wallet
due to using a different elliptic curve.
The private key to be used by helium-wallet
command will be in
/var/data/wallet.key
file.
WALLET=$(helium-wallet -f /var/data/wallet.key info | \
grep Address | \
awk '{print $4}')
echo $WALLET
That shared wallet is ready for use below.
Router Address
Determine address of the router that you are using by running this one command on router host:
ssh router-dev
cd router/
router peer addr
Extract the sequence following /p2p/
from the output.
That long alphanumeric value gets used in multiple places below as the
$P2P
variable.
Define this on your laptop/workstation:
P2P=longAlphanumeric...
echo $P2P
Hexagons
Choose a hexagon for each gateway instance.
Because this is a test router yet each gateway's location must be asserted onto mainnet blockchain, pick a bogus location such as middle of a lake or river. (While there could be legitimate devices in a particular body of water, there is unlikely to be a hotspot there as well.)
The importance of asserting a location for each hotspot:
Without any location asserted for a particular gateway, the distance calculated to router defaults to 1000 as defined within
distance_between()
in router_device_devaddr.erl which may be preferable to triangulating based upon physical location.However, relying upon this default will bypass much program logic which involves a blockchain ledger lookup, thereby yielding artificially higher throughput.
A dedicated and isolated router and console pair may disregard concerns about Proof-of-Coverage (PoC) or an oversaturated Hex.
(Router uses
h3 hexagonal hierarchical geospatial
indexing system for computing distance, so dig there for any unique criteria
or constraints. For previewing the hex for each pair of coordinates,
use the live JavaScript feature of the underlying
API's documentation
with resolution of 8, and append the resulting value to:
https://explorer.helium.com/hotspots/hex/
.)
Select latitude and longitude:
LAT="43.5482302"
LON="-116.6575001"
echo $LAT $LON
This pair of latitude and longitude values will be used below.
Create Devices via Console API
Each device across all organizations must have unique Dev EUI, App EUI and App Key.
For reference, see LoRaWAN Link Layer Specification sections as of v1.0.4:
- 6.2.1 End-device identifier (DevEUI)
- 6.2.3 Application key (AppKey)
- and its footnote which implies the uniqueness constraint
By using an isolated instance of router, we may generate identifiers:
cd $WORKDIR
for org in $(seq --format="%02g" $FIRST_ORG $N_ORGS); do
for dev in $(seq --format="%03g" $FIRST_DEV $N_DEVICES); do
(cd org-$org && \
helium-console-cli device create \
a0000000000$org$dev \
600000000000000000000000000$org$dev \
d0000000000$org$dev \
benchmark-$dev || echo "org=$org device=$dev" >> errors.log)
done
done
Confirm:
for o in $(seq --format="%02g" $FIRST_ORG $N_ORGS); do
cd $WORKDIR/org-$o/
helium-console-cli device list > devices.json
done
Returns JSON, which when formatted via jq .
resembles the following.
{
"devices": [
{
"app_eui": "a000000000001001",
"app_key": "60000000000000000000000000001001",
"dev_eui": "d000000000001001",
"id": "7cbc7f9d-0205-4f85-ab0a-34de8c4d8f52",
"name": "benchmark-001",
"organization_id": "57d5b743-fc08-4119-b4ff-0c60ddf976ff",
"oui": 2
},
{
"app_eui": "a000000000001002",
"app_key": "60000000000000000000000000001002",
"dev_eui": "d000000000001002",
"id": "81d39917-b772-4937-b78b-618f4461340f",
"name": "benchmark-002",
"organization_id": "57d5b743-fc08-4119-b4ff-0c60ddf976ff",
"oui": 2
},
{
"app_eui": "a000000000001003",
"app_key": "60000000000000000000000000001003",
"dev_eui": "d000000000001003",
"id": "cb82bc53-49ec-4401-bed0-2c4010f26c1d",
"name": "benchmark-003",
"organization_id": "57d5b743-fc08-4119-b4ff-0c60ddf976ff",
"oui": 2
},
{
"app_eui": "a000000000001004",
"app_key": "60000000000000000000000000001004",
"dev_eui": "d000000000001004",
"id": "e7cca34d-5234-457f-ae92-8037c2649569",
"name": "benchmark-004",
"organization_id": "57d5b743-fc08-4119-b4ff-0c60ddf976ff",
"oui": 2
}
]
}
Confirm consistent number of devices generated across organizations:
wc -l org-*/devices.json
The value of "oui"
in output above should correspond to your value for
$ROUTER_OUI
.
ROUTER_OUI=$(jq '.devices[0].oui' org-01/devices.json)
echo $ROUTER_OUI
Virtual Devices
This requires devices already having been created via Console API.
The API provides the single source of truth about devices. (Device ID in Console might be mismatched with respect to identifiers generated below due to sequence of output from API, which is unimportant for benchmarking.)
Despite lack of plural in the name, virtual LoRaWAN device accommodates multiple devices, as indicated by its README.
Generate a single settings.toml
file for all devices per organization.
(This segments by org for managing benchmarking runs.)
Connections of devices from same org will distribute across multiple gateways, simulating real world scenarios with geographically dispersed deployments.
Sending more frequently than 2 seconds per device would be unrealistic.
The value for secs_between_transmits
may be tuned below, and
rx_delay
which defaults to 1s, may be increased via Console up to 15s.
cd $WORKDIR
# Use printf, not `seq --format` to avoid octal when assigning $i below
for o in $(seq $N_ORGS); do
org=$(printf "%02g" $o)
[ -d org-$org/settings ] || mkdir -p org-$org/settings
[ -e org-$org/settings/default.toml ] || \
ln -s ~/helium/virtual-lorawan-device/settings/default.toml \
org-$org/settings/
out="org-$org/settings/settings.toml"
echo 'default_server = "dev"' > $out
# Must be unique across all instances of virtual-lorawan-device:
metrics_port=$((9898 + o))
echo 'metrics_server = "127.0.0.1"' >> $out
echo 'metrics_port = '$metrics_port >> $out
echo '' >> $out
# Multiplexer sprays across all gateways:
echo "[packet_forwarder.mux]" >> $out
echo 'host = "localhost:1680"' >> $out
echo 'mac = "0807060504030200"' >> $out
echo '' >> $out
# Optional, if you want to bypass multiplexer:
# uncomment this block and use packet_forwarder=gw below.
#for gw in $(seq $N_GATEWAYS); do
# gateway=$(printf "%02g" $gw)
# echo "[packet_forwarder.gw${gateway}]" >> $out
# port=$((1680 + gw))
# echo 'host = "localhost:'${port}'"' >> $out
# echo 'mac = "08070605040302'${gateway}'"' >> $out
#done
json="org-$org/devices.json"
for d in $(seq $N_DEVICES); do
device=$(printf "%03g" $d)
echo '' >> $out
echo "[device.${device}]" >> $out
seconds=$((20 + d % 20))
echo 'secs_between_transmits = '$seconds >> $out
frames=$((120 - d % 40))
echo 'rejoin_frames = '$frames >> $out
echo 'packet_forwarder = "mux"' >> $out
# Optional round-robin allocation:
#i=$(( (d % N_GATEWAYS) + FIRST_GW ))
#gateway=$(printf "%02g" $i)
#echo '#packet_forwarder = "gw'${gateway}'"' >> $out
echo "[device.${device}.credentials]" >> $out
j=$((d - 1))
echo 'dev_eui = '$(jq .devices[$j].dev_eui < $json) >> $out
echo 'app_eui = '$(jq .devices[$j].app_eui < $json) >> $out
echo 'app_key = '$(jq .devices[$j].app_key < $json) >> $out
done
done
As seconds between transmits increases, number of frames until re-joins decreases for leveling overall traffic (reducing spikes and eliminating artificial surges).
Some packet_forwarder
lines are intentionally commented-out above,
because there might be occasions for connecting to a specific gateway
directly rather than using the multiplexer.
Sanity check:
grep -q 'dev_eui = null' org*/settings/settings.toml && \
echo Something went wrong.
Devices may require waiting upwards of 20 minutes (concurrently) for the router's XOR filter to run. The device will be marked in Console with "Pending" status until then, and there is an API call for this below.
Later when running the actual test, check logs of virtual-lorawan-device
for entries containing the following excerpt:
WARN ... UDP packet received after tx time ...
Significantly more than 1:100 ratio indicates the value for
default_secs_between_transmits
should be increased. That ratio
accommodates interference patterns from the scheduler in Linux and the Tokio
async/await framework used within the Rust app, or it can be from the
hypervisor when on a virtualized server instance.
The metrics API will be used below to track successful Joins.
Configure Gateways
This uses the $P2P
value defined above in "Router Address" section.
Each gateway must open a listening socket on a unique port number, so this simply increments from the default base port.
Install configuration files:
cd $WORKDIR
i=${FIRST_GW:-1}
for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
mkdir gw-$gw
ln -s ~/helium/gateway-rs/config/default.toml gw-$gw/
out=gw-$gw/settings.toml
keyfile="$(pwd)/gw-$gw/gateway_key.bin"
echo 'keypair = "'${keyfile}'"' > $out
# Must be unique per gw and consistent with device configs:
port=$((1680 + i))
echo 'listen = "127.0.0.1:'${port}'"' >> $out
# Only needs to be unique:
port=$((4467 + i))
echo "api = $port" >> $out
echo 'region = "US915"' >> $out
echo '' >> $out
echo '[update]' >> $out
echo 'enabled = false' >> $out
echo '' >> $out
echo '[router.release]' >> $out
echo 'pubkey = "'${P2P}'"' >> $out
echo 'uri = "'${ROUTER_URL}'"' >> $out
echo 'oui = '${ROUTER_OUI} >> $out
echo '' >> $out
# Uncomment for high traffic runs:
echo '# [log]' >> $out
echo '# level = "warn"' >> $out
i=$((i + 1))
done
Note: If using "beta" or other version of gateway-rs instead of "release", change
[router.release]
to match.)
For benchmark runs with very high traffic, uncomment the [log]
section
above, because there's a measurable cost of I/O due to logging that can
negatively impact the host sending packets.
Add Gateways To Blockchain
Generate transactions using WALLET
variable defined earlier. Each
transaction gets committed later.
When in server-mode for the first time, gateway-rs needs time to generate key material and connect to upstream services, so start all concurrently:
for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
helium_gateway -c gw-$gw server > gw-$gw/server.log &
done
Wait until all log files contain at least one entry for height
, which
implies that each has connected to its upstream validator:
# Wait:
grep -c height gw-$gw/server.log
Generate transactions, which will be signed and committed later:
for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
helium_gateway -c gw-$gw info > gw-$gw/gw-info.json
helium_gateway -c gw-$gw \
add --mode dataonly \
--owner $WALLET \
--payer $WALLET > gw-$gw/add-dataonly.json
done
Extract each txn
value from the previous output and strip quotes from
value:
jq .txn gw-*/add-dataonly.json | \
sed 's/"//g' > txn-ids.log
Prepare command to add each new gateway based upon previous transactions:
(if wallet is on router host, run it there)
for txn in $(cat txn-ids.log); do
# Prompts for wallet password
helium-wallet \
-f $WORKDIR/wallet/wallet.key \
hotspots add $txn
done
Ensure there were no errors in the preceding output, and repeat same command
adding --commit
flag to make it reality.
This burns HNT from your wallet:
for txn in $(cat txn-ids.log); do
# Burn HNT from your wallet:
helium-wallet \
-f $WORKDIR/wallet/wallet.key \
hotspots add $txn \
--commit
done
Assert Location For Each Gateway
Assert location using $LAT
and $LON
defined earlier for
each gateway.
for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
key=$(jq .key gw-$gw/gw-info.json | sed 's/"//g')
helium-wallet \
-f $WORKDIR/wallet/wallet.key \
hotspots assert --gateway $key \
--mode dataonly \
--lat=$LAT$gw --lon=$LON$gw --elevation 6
done
Repeat with --commit
flag, which burns HNT from your wallet:
for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
key=$(jq .key gw-$gw/gw-info.json | sed 's/"//g')
helium-wallet \
-f $WORKDIR/wallet/wallet.key \
hotspots assert --gateway $key \
--mode dataonly \
--lat=$LAT$gw --lon=$LON$gw --elevation 6 \
--commit
done
Confirm Locations
Allow time-- approximately 5-20 minutes-- for those transactions to be recorded onto the blockchain, and then confirm asserted locations via blockchain API:
curl --user-agent "$UA" \
"$API_URL/v1/hotspots/location/distance?lat=$LAT&lon=$LON&distance=1000"
The distance
parameter above is in meters, so adjust its value if
necessary.
Ensure that your gateways appear within those results. If not found, wait several minutes and check again later.
Other gateways appearing within the results is benign because a virtual LoRaWAN device will only reach those for which it was explicitly configured since its radio is simulated.
With that, your gateways are ready for Load & Capacity runs.
Fund Organizations
Benchmarking on an isolated server instance of router (e.g.,
router-dev
) affords the option of resetting DC balance of an organization.
Otherwise, benchmarking will make a measurable impact to your wallet; see
Data Credits
within Console documentation.
First, find your organization_id
which may be obtained via Console UI.
From the Organizations page, there is an export widget associated with each
organization listed.
Alternatively, extract from devices.json
for each organization. Those
files were generated above.
Run:
jq .devices[0].organization_id org-*/devices.json
Use each of those organization IDs and provide the target DC balance, which gets specified as 100k in this example:
ssh router-dev
cd router/
router organization update bb5a2fec-1234-5678-89e4-a0e3c4807628 -b 100000
Confirm output of the above command, and then repeat with --commit
flag:
!! --commit
Again, the above command only works on non-production environments.
Wait for XOR Filter
The Console may indicate "Pending" from having added new devices above and may take upwards of 20 minutes.
Wait until that message clears before proceeding.
There is an API call for that status:
for o in $(seq --format="%02g" $FIRST_ORG $N_ORGS); do
cd $WORKDIR/org-$o/
helium-console-cli devices all > status.json
done
If any report as false
, wait one minute at minimum before checking
again to avoid being throttled or blocked:
cd $WORKDIR
jq '.[].in_xor_filter' org-*/status.json \
| grep false && sleep 60
Start Gateways
Launch each gateway "server" as a separate process, and let the OS handle concurrency:
cd $WORKDIR
for gw in $(seq --format="%02g" $N_GATEWAYS); do
helium_gateway -c gw-$gw server >> gw-$gw/run.log 2>&1 &
done
Allow several minutes for each gateway concurrently connecting to validators and the router specified in settings.toml file.
Wait until each hotspot reports height of the blockchain:
for gw in $(seq --format="%02g" $FIRST_GW $N_GATEWAYS); do
helium_gateway -c gw-$gw info > gw-$gw/gw-info.json
done
jq .gateway.height gw-*/gw-info.json || echo "Wait!"
Only after each gateway reports a large integer for height, continue.
Due to backoff and retry, each delay takes an increasing amount of time with random variation:
INFO selecting new gateway in 36s, module: dispatcher
...
INFO selecting new gateway in 69s, module: dispatcher
...
INFO selecting new gateway in 152s, module: dispatcher
...
INFO selecting new gateway in 1826s, module: dispatcher
Warnings about deadlines logged between retries can be safely ignored:
WARN gateway stream setup error: ...
"error trying to connect: deadline has elapsed" ...
Note:
A gateway may disconnect from its upstream validators or router and automatically reconnect. When that occurs, the warnings and retries described above may appear at that time.
One or two retries should be just a blip on the graphs, but extended retries might be cause for invalidating the affected run for purposes of benchmarking.
Start Multiplexers
This sends every Join request and packet to all gateways.
Device configurations performed earlier per organization already account for using the multiplexer.
Run in its own Terminal window:
(running in background to be accommodated in a future release)
port=1681
GW_LIST=''
for gw in $(seq 2 $N_GATEWAYS); do
port=$((port + 1))
GW_LIST="${GW_LIST} 127.0.0.1:$port"
done
gwmp-mux --host 1680 --client $GW_LIST > mux.log 2>&1
All gateways will receive all Join and Uplink requests.
Spawn Virtual Devices
Launch fleet of virtual devices as a single OS process:
cd $WORKDIR
#export RUST_LOG=warn
for o in $(seq --format="%02g" $N_ORGS); do
virtual-lorawan-device \
--settings org-$o/settings/ >> org-$o/run.log 2>&1 &
# Slight mitigation against hammer effect:
sleep 5
done
While waiting for the outer loop to finish, watch your graphs; e.g, Grafana.
Optionally set RUST_LOG=debug
environment variable for verbose logs.
Note: virtual-lorawan-device causes the device to Join (or re-join) upon each start of the utility. (For continuing a previous session, pull-requests there are welcomed!)
Watch Devices
The metrics_port
associated with each running instance of
virtual-lorawan-device
may be queried via HTTP GET. For bulk requests:
metrics() {
time=$(date +%Y-%m-%dt%H:%M:%S)
for o in $(seq --format="%02g" $N_ORGS); do
port=$(grep metrics_port $WORKDIR/org-$o/settings/settings.toml | \
awk '{print $3}')
curl http://localhost:$port/ > org-$o-stats-$time.txt
done
}
Periodically run:
metrics
Useful statistics to observe while everything is running may be determined by counting the following log entries for determining success ratios sampled over duration of each benchmarking run.
Logs of virtual-lorawan-device:
- "No Join Accept Received"
- "sending packet"
- "RxWindow expired, expected ACK to confirmed uplink not received"
Logs from gateway-rs for each packet indicate that the packet has been queued but not necessarily delivered to router.
Watch A Single Device
This is an optional debugging step.
While the fleet of virtual-lorawan-device
instances runs, watch a
single device.
First, run the loop from Wait for XOR Filter above
again to re-populate status.json
for each organization.
Find devices that indicate a potential problem.
Device IDs that never connected:
jq '.[]|select(.last_connected|.==null)|.id' org-*/status.json
Devices that connected longest ago:
jq '.|sort_by(.last_connected)|.[0]' org-*/status.json
Device IDs with fewest packets:
jq '.|sort_by(.total_packets)|.[0].id' org-*/status.json
Use those Device IDs for tracing via router CLI:
ssh router
cd router/
router device trace --id=8db30307-1234-5678-9abc-511376ac301b
Each trace expires after 240 minutes.
Stop
Generally allow this fleet of virtual devices to run for 15-20 minutes, which gives ample traffic that the Erlang BEAM virtual machine's garbage-collector will have run, etc.
When you have enough data:
killall virtual-lorawan-device
killall helium_gateway
(Those must be names of each actual command-- not Bash aliases.)
Confirm no longer running:
ps aux | grep virtual-lorawan-device
ps aux | grep helium_gateway
Iterate
To ensure sufficient load, you may need to run again with increased values for:
- number of organizations (
N_ORGS
) - number of devices (
N_DEVICES
) - number of gateways (
N_GATEWAYS
)
When doing so, also increase corresponding values for FIRST_*
to be the
former value of N_*
to avoid clobbering or creating duplicate devices.
Check for new releases of virtual-lorawan-device that may facilitate driving higher traffic. Future work will occur there.
Comprehending The Results
For measuring number of devices within a devaddr slab:
Ensure "Join Offer Rate" graph shows step-wise increases, and "Packet Offer
Rate" graph resembles a plateau. Then, watch "Offer Rejected Reasons" graph
and stop when seeing an increase in devaddr_no_device
.
With disambiguation due in part to the Message Integrity Check (MIC), the usable number will be higher than apparent capacity; thus, an 8 slab accommodates more than eight devices. Actual limits may vary with Router and/or Console configurations used.
For measuring traffic load:
There should be a slow ramp-up of traffic such that we "find the knee" of the graph before hitting the plateau (ceiling). Understand this plateau being an approximation due to the nature of generated traffic.
A good rule-of-thumb:
Plan for 70% CPU utilization for nominal load, which provides a cushion for traffic spikes without being excessively oversized.
Repeatable Results
When confirming any such measurements:
Run the same scenario a minimum of three consecutive times with consistent results.
That excludes a preliminary run to invalidate the OS file-system cache.
This rule of three requires consistency. If results vary by more than rounding error, keep running until getting 3 with similar results.
If the only consistency is that results vary widely, look to other artifacts on the host OS that might be causing interference. Consider migrating to different host server hardware (via hard stop/start) if running on a virtualized/hypervisor environment such as AWS, GCP, Linode, etc.
Items that we addressed:
- On Ubuntu Server, prevent Snap from updating during a benchmark run:
sudo snap list
andsudo apt-get purge snapd
- If graphs are inconsistent, confirm each gateway is still connected to its
upstream Validator:
- Confirm "Join Offer Rate" graph shows step-wise increases
- Confirm "Packet Offer Rate" graph resembles a plateau