I have a cluster of 4 m4.xlarge nodes (4 vCPUs, 16 GB memory and a 600 GB SSD EBS volume each).
I can reach a cluster-wide write rate of about 30k requests/second and a read rate of about 100 requests/second.
The OS load on the cluster is constantly above 10.
Is that normal?
Assuming you meant 100k, that's likely for something with 16 MB of storage (probably way too small),
where the data is more than 64 KB and hence will not fit into the row cache.
Are you saying 7k writes per node, or 30k writes per node?
Yes, it is about 8k writes per node.
Lots of variables you're leaving out.
It depends on the write size, whether you're using logged batches or not, what consistency level, what RF, whether the writes come in bursts, etc.
However, that's all somewhat moot for determining "normal"; really you need a baseline, as all those variables end up mattering a huge amount.
I would suggest using cassandra-stress as a baseline and going from there depending on what those numbers say (just pick the defaults).
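For example, a minimal write-only baseline might look something like this (a sketch, assuming the stock cassandra-stress schema; the node address, op count and thread count are placeholders):

$ cassandra-stress write n=1000000 cl=ONE -rate threads=50 -node 10.0.0.1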
Although the focus is on Spark and Cassandra and multi-DC, there are also some single-DC benchmarks of m4.xl clusters, plus some discussion of how we went about benchmarking.
From the post, it seems they got a similar, slightly better result than I did. Good to know.
I am not sure whether a little fine-tuning of the heap memory will help or not.
The Cassandra version is 3.0.6 and
the Java version is "1.8.0_91".
Your machine instance has 4 vCPUs, which is 4 threads (not cores!);
aside from any Cassandra-specific discussion, a system load of 10 on a 4-thread machine is way too much in my opinion.
If that is the running average system load, I would look deeper into the system details.
Is it I/O wait? Is it stolen CPU?
Is this a Cassandra-only instance, or are there other processes pushing up the load?
What does your "nodetool tpstats" say?
How many dropped messages do you have?
What's your CPU looking like? If it's low, check your IO with iostat or dstat.
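For example (a sketch; the 5-second interval is just a convenient choice):

$ iostat -x 5     # watch %iowait, %steal and per-device await/%util
$ dstat -lcdn 5   # load average, CPU, disk and network in one view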
I know some people have used EBS and say it's fine, but I've been burned too many times.
A 600 GB EBS volume only guarantees 1800 IOPS; if you're exhausting those on writes, you're going to suffer on reads.
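To spell out that arithmetic (assuming a gp2 volume, which the thread doesn't actually state): gp2 provisions a baseline of 3 IOPS per provisioned GB, so 600 GB × 3 IOPS/GB = 1800 IOPS.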
You have a 16G server, and probably a good chunk of that allocated to heap.
Consequently, you have almost no page cache, so your reads are going to hit the disk.
Your reads being very low is not uncommon if you have no page cache
– the default settings for Cassandra (64k compression chunks) are really inefficient for small reads served off of disk.
If you drop the compression chunk size (4k, for example), you’ll probably see your read throughput increase significantly,
which will give you more iops for commitlog, so write throughput likely goes up, too.
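As a sketch of that change (the keyspace and table names are placeholders, and 4 KB is just one value to try):

cqlsh> ALTER TABLE my_keyspace.my_table
         WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};

Existing SSTables keep their old chunk size until they are rewritten, e.g. with nodetool upgradesstables -a my_keyspace my_table.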
Those numbers, as I suspected, line up pretty well with your AWS configuration and network latencies within AWS.
It is clear that this is a WRITE ONLY test. You might want to do a mixed (e.g. 50% read, 50% write) test for sanity.
Note that the test will populate the data BEFORE it begins doing the read/write tests.
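A mixed run could look something like this (a sketch; the ratio, counts and node address are placeholders):

$ cassandra-stress write n=1000000 -node 10.0.0.1   # populate first
$ cassandra-stress mixed ratio\(write=1,read=1\) n=1000000 cl=ONE -rate threads=50 -node 10.0.0.1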
In a dedicated environment at a recent client, with 10 Gbit links (just grabbing one casstest run from my archives),
I see less than twice the above. Note that the latency max is the result of a stop-the-world garbage collection.
There were huge problems below because this particular run was using a 24 GB (Cassandra 2.x) Java heap.
op rate : 21567 [WRITE:21567]
partition rate : 21567 [WRITE:21567]
row rate : 21567 [WRITE:21567]
latency mean : 9.3 [WRITE:9.3]
latency median : 7.7 [WRITE:7.7]
latency 95th percentile : 13.2 [WRITE:13.2]
latency 99th percentile : 32.6 [WRITE:32.6]
latency 99.9th percentile : 97.2 [WRITE:97.2]
latency max : 14906.1 [WRITE:14906.1]
Total partitions : 83333333 [WRITE:83333333]
Total errors : 0 [WRITE:0]
total gc count : 705
total gc mb : 1691132
total gc time (s) : 30
avg gc time(ms) : 43
stdev gc time(ms) : 13
Total operation time : 01:04:23
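For reference, the heap is normally capped in cassandra-env.sh; a sketch of a more conservative setting than the 24 GB above (the exact sizes are my assumption, not advice from this thread):

# conf/cassandra-env.sh
MAX_HEAP_SIZE="8G"     # leave most of the 16 GB to the OS page cache
HEAP_NEWSIZE="800M"    # young-generation size, used with the CMS collector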
When you have a high system load it means your CPU is waiting for *something*, and in my experience it's usually a slow disk.
A disk attached over the network has been the culprit for me many times.
Might not be strictly related, however this might be of interest:

$ nodetool tpstats
...
Pool Name                     Active   Pending   Completed    Blocked   All time blocked
Native-Transport-Requests     128      128       1420623949   1         142821509
...

where the first '128' is the active requests and the second '128' is the pending ones.
What is this? Is it normal?
In addition, it seems compaction runs very often.
It happens every couple of seconds, one after another.
It seems to be causing the high load.
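To confirm that, I would look at the compaction queue and the throttle (a sketch; 32 MB/s is just an example value):

$ nodetool compactionstats            # pending tasks and what is compacting right now
$ nodetool getcompactionthroughput    # current throttle, default 16 MB/s
$ nodetool setcompactionthroughput 32 # adjust while watching the load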
Sometimes the Pending count changes from 128 to 129, 125, etc.
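One possible connection (my assumption, not something confirmed in this thread): 128 is also the default cap on native transport threads, so an Active column pinned at 128 together with a large "All time blocked" count suggests the request pool is saturated. The knob lives in cassandra.yaml:

# cassandra.yaml (shipped default, commented out)
# native_transport_max_threads: 128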
Same behavior here with a very different setup.
After an upgrade to 2.1.14 (from 2.0.17), I see a high load and many NTR "all time blocked".
Off-heap memtables lowered the blocked NTR for me;
I put a comment on CASSANDRA-11363.
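For reference, the off-heap memtable switch is memtable_allocation_type in cassandra.yaml (a sketch; availability of the off-heap options varies by Cassandra version, and the default is heap_buffers):

# cassandra.yaml
memtable_allocation_type: offheap_objects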