Forum discussion

Note: I'm leaving one of these up for each user, to better learn the mod tools.

Feel free to say stupid stuff in those threads while they remain up.
 
I missed whatever you are talking about - are we talking Thai betting-spam sort of rubbish again?

I understand the benefit of a training exercise in handling them one by one, but you can also simply delete the user and have all their threads/comments disappear, can't you?
 
Yeah, but I need help with IP bans - I've just been banning the user and deleting them, but they keep coming back.
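For what it's worth, a range ban at the firewall level tends to stick better than banning accounts one at a time, since the spammer just re-registers from a nearby address. A minimal sketch, assuming a Linux server where you have root and iptables available (the IP here is a documentation example, not a real spammer, and the rule is printed rather than applied):

```shell
#!/bin/sh
# Derive the /16 (Class B) range for a spammer's IP and print the
# iptables rule that would drop all traffic from that range.
# This is a dry run - pipe the output to sh (as root) to actually apply it.
ip="203.0.113.45"   # example address from the TEST-NET-3 documentation range
range="$(echo "$ip" | cut -d. -f1-2).0.0/16"
echo "iptables -A INPUT -s $range -j DROP"
```

Blocking a whole /16 is a blunt instrument and can catch legitimate users in the same range, so it's best reserved for ranges that show up repeatedly in the logs.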
 
I get how bots work etc., but is this just random internet traffic, or is there someone behind it who's sought out and found our site to disrupt it because of football? It can happen to any topic, I guess.
 
not sure!
 
An Aussie car forum I frequent has gone through a lot of processes to cull bot invasions. Maybe some of this jargon can help you guys, for it's cleaned up that site very well. I'm just copy-pasting some intel the admin posted, if it helps you Pascuali, for I have no idea.

Here is the CPU utilisation (user) for all cores since midnight:

12:20:01 95.33
12:40:01 95.03
1:00:02 95.68
1:20:02 94.19
1:40:01 94.33
2:00:01 96.03
2:20:01 90.25
2:40:01 94.11
3:00:02 91.88
3:20:03 22.90
3:40:01 81.86
4:00:01 73.29
4:20:01 87.73
4:40:01 80.32
5:00:01 81.92
5:20:01 96.75
5:40:01 77.94
6:00:01 70.08
6:20:01 77.57
6:40:02 95.66
7:00:01 90.83
7:06:08 RESTART
7:20:01 86.88
7:40:02 88.08
8:00:01 84.57
8:20:01 48.71
8:40:01 91.88
9:00:01 94.26

.. all of which is just too high, translating to server load ratings 10-20 times what they should be. The CPU utilisation is split between the web and database servers, with the latter using the most, which is what I'd expect when handling large numbers of requests.

As a short-term measure, I've reduced the session timeout, which means those bots that are 'stopped' will time out quicker; that has reduced the guest numbers from 35k to 18k. The downside of that approach is that real users will get asked to log in more frequently.

I really don't want to go down the Cloudflare route, as it has a couple of big gotchas from my PoV.

Instead, I have disabled guest access. This doesn't help a great deal in the short term, as the AI bots are currently blocked from seeing anything anyway - if you could see what they were doing, they would mostly be looking at an error message telling them they had no access. It probably doesn't even help in the long term, because no one cares whether they are actually scraping any content anyway.

It's hard to know what percentage of the guests are legitimate and which are AI bots without extracting a whole day of log entries (1.6 GB on a day like yesterday), sorting them by IP address, and then looking up the heavily used IP ranges to see where they come from. There are some well-known hosts (including AWS) for the AI bots, so it might be worth the exercise to see.

Somebody asked earlier about email notifications - emails (except a couple of types) will be deferred when CPU load is above a set threshold; when the server drops below that, they then get sent in a big bunch, albeit restricted to batches of 150 at a time.

for his help with a script that parses the access log file, sorts the IP addresses into Class B ranges, and summarises the access attempts for each. That identified that between 4 and 6 million access requests per day were not valid. From this information, I've been able to identify the IP address ranges that have been flooding us with requests; a lot of them originally came from Vietnam, so I blocked all of those, then they started coming from South American countries, and those are now blocked as well.
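For anyone curious, the core of that kind of summary can be sketched in a few lines of shell. This is a rough approximation, not the actual script mentioned above; it assumes a common access-log layout where the client IP is the first field of each line, and the filename access.log is a placeholder:

```shell
#!/bin/sh
# Summarise access-log hits by Class B (/16) range.
# Assumes a combined-format log where the client IP is the first field.
LOG="${1:-access.log}"

# Demo input so the sketch runs standalone - replace with your real log.
[ -f "$LOG" ] || printf '1.2.3.4 - - "GET /"\n1.2.5.6 - - "GET /"\n9.9.9.9 - - "GET /"\n' > "$LOG"

# Keep the first two octets of each client IP, then count and rank the ranges.
awk '{ split($1, o, "."); print o[1] "." o[2] }' "$LOG" \
  | sort | uniq -c | sort -rn | head -20
```

The top of that list is exactly the "heavily used IP ranges" worth looking up in a whois service before deciding what to block.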

You never actually win for long, as they just use another range of addresses, but having this information is a big help. To show the impact of the bot activity: we would normally use about 25 GB of data and handle ~2M responses and a similar number of pages per day.

On the 7th: 46.3 GB / 6.655M responses / 6.569M pages;
on the 8th: 36.9 GB / 5.288M responses / 5.2M pages;
on the 9th: 41.7 GB / 4.604M responses / 4.509M pages;
on the 10th: 35.8 GB / 4.53M responses / 4.468M pages;
on the 11th: 53.8 GB / 8.653M responses / 8.562M pages; and
yesterday it was down to 25.86 GB / 2.82M responses / 2.76M pages.
 
Excuse me whilst I get my hieroglyphics dictionary.
 
People really underestimate how pervasive bots are. They are always trying. Luckily I have a few tools at my disposal; otherwise they would run the show here.
 
These Q and A threads to follow up the podcasts are a really good idea!

The downside to G and G is that the diversity of thread topics, podcasts, Fan View articles, etc. contributes to a lot of late-night participation with G and G! I end up staying up too late!

I've also enjoyed the more recent Fan View articles. Thanks a bunch to all who've written them.
 
Hopefully, all your staff are equipped with the knowledge to delete the Thai spam bots.

Thanks to you and all your staff who delete them. It must be a tiresome job.
 
I've noticed a bit of biff in some threads lately. Is it the pre-Christmas jitters?
 