Is Spam?

Over the last few days I’ve reported a few posts to the admin as spam, so I’ve written this small bash script to detect possible spam posts

INFO: You need to have httpie and jq installed. Also, an API-KEY is required

INFO: In fact, this is more an exercise in httpie and jq filtering than a useful tool


latest=$(http api-key:$API_KEY accept:application/vnd.forem.api-v1+json  per_page==80)

filtered=$(jq '.[] | select(.reading_time_minutes==1 and .user.user_id > 4)' <<< "$latest")

echo Total Last articles $(jq -M -r '.id' <<< "$filtered" | wc -l)
echo '-----'

# sort -u is used because uniq alone only collapses adjacent duplicates
echo Number of authors $(jq -M -r '.user.user_id' <<< "$filtered" | sort -u | wc -l)
echo '-----'

users=$(jq -M -r '.user | .user_id' <<< "$filtered" | sort -u)

for user_id in $users; do

   strjoined_at=$(http GET "$user_id" api-key:$API_KEY accept:application/vnd.forem.api-v1+json | jq -r '.joined_at')

   joined_at=$(date --date="$strjoined_at" "+%Y-%m-%d")
   days=$((($(date +%s) - $(date -d "$joined_at" +%s))/86400))

   if (( ${days:-2} < 3 )); then
        echo "The $user_id user is suspected of posting spam, see post:"
        jq --arg jq_user_id "${user_id}" '.[] | select(.user.user_id == ($jq_user_id|tonumber)) | .url' <<< "$latest"
   fi
done

1 Retrieve the latest articles (80 max)
2 Filter by reading_time_minutes, as spam posts are usually short
3 Extract the unique user_id values
4 Fetch the user details for each user_id
5 Check whether the account was recently created
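
Steps 2 and 3 can be tried offline, without an API key. The following is only a minimal sketch: the `sample` JSON is made-up data that mimics the fields the script reads (`reading_time_minutes`, `user.user_id`, `url`), and it leaves out the extra `user_id > 4` condition of the real filter.

```shell
#!/usr/bin/env bash
# Made-up sample data, shaped like the fields the script reads (an assumption,
# not a real API response).
sample='[
  {"id": 1, "reading_time_minutes": 1, "url": "https://example.com/a", "user": {"user_id": 10}},
  {"id": 2, "reading_time_minutes": 5, "url": "https://example.com/b", "user": {"user_id": 11}},
  {"id": 3, "reading_time_minutes": 1, "url": "https://example.com/c", "user": {"user_id": 12}},
  {"id": 4, "reading_time_minutes": 1, "url": "https://example.com/d", "user": {"user_id": 10}}
]'

# Step 2: keep only one-minute reads (-c prints one JSON object per line).
short_posts=$(jq -c '.[] | select(.reading_time_minutes == 1)' <<< "$sample")

# Step 3: unique author ids; sort -u also removes non-adjacent duplicates.
user_ids=$(jq -r '.user.user_id' <<< "$short_posts" | sort -un)

echo "short posts: $(grep -c . <<< "$short_posts")"
echo "authors: $(wc -l <<< "$user_ids" | tr -d ' ')"
```

Author 10 appears twice in the sample with another author in between, which is the case where a plain `uniq` would count the same user twice.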

Obviously, not all articles that meet these conditions are spam. Lots of people (like me) write a hello-world post right after creating their account, so the script shows the URL and I can read the post and decide whether it’s spam or not.
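
The "recently created account" condition can also be isolated in a small function. This is a sketch under assumptions: `account_age_days` and `is_recent_account` are hypothetical helper names, the 3-day threshold is the one used in the script, and GNU `date` is required (the `-d` option is not available in BSD/macOS `date`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole days between a join date and a reference time.
# Requires GNU date; the date string is interpreted as UTC.
account_age_days() {
    local joined_at=$1
    local now=${2:-$(date +%s)}   # optional epoch seconds, defaults to "now"
    local joined
    joined=$(date -u -d "$joined_at" +%s) || return 1
    echo $(( (now - joined) / 86400 ))
}

# Hypothetical helper: succeeds when the account is less than 3 days old,
# the same threshold the script uses.
is_recent_account() {
    local days
    days=$(account_age_days "$1" "${2:-}") || return 1
    (( days < 3 ))
}

# Usage: the second argument pins "now" so the result is reproducible.
if is_recent_account "2024-01-01" 1704240000; then
    echo "recent account"
fi
```

Passing the reference time explicitly makes the check easy to test; in the script itself the second argument would simply be omitted.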

For the next version, if I have time, I would like to include some kind of "AI" to automatically read the post and decide whether it is spam
