We apologize for a period of extreme slowness today. The army of AI crawlers just leveled up and hit us very badly.

The good news: We're keeping up with the additional load of new users moving to Codeberg. Welcome aboard, we're happy to have you here. After adjusting the AI crawler protections, performance significantly improved again.

in reply to Codeberg

It seems like the AI crawlers learned how to solve the Anubis challenges. Anubis is a tool hosted on our infrastructure that requires browsers to do some heavy computation before accessing Codeberg again. It has saved us tons of nerves over the past months, because it spared us from manually maintaining blocklists by giving us a working way to tell "real browsers" from "AI crawlers".
in reply to Codeberg

However, we can confirm that at least Huawei networks now send the challenge responses, and they do seem to take a few seconds to compute the answers. It looks plausible, so we assume the AI crawlers leveled up their computing power and now emulate enough real-browser behaviour to bypass the variety of challenges that platforms have deployed to fend off the bot army.

reshared this

in reply to Codeberg

We have a list of explicitly blocked IP ranges. However, a configuration oversight on our part applied these blocks only to the "normal" routes; the "anubis-protected" routes didn't consult the blocklist. That was not a problem as long as Anubis itself still kept the crawlers out on those routes.

However, now that they managed to break through Anubis, there was nothing stopping these armies.

It took us a while to identify and fix the config issue, but we're safe again (for now).
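The class of mistake can be sketched roughly like this (a hypothetical nginx config; the paths, filenames, and upstream names are invented for illustration, and Codeberg's actual stack may differ):

```nginx
# blocked-ranges.conf contains lines like: deny 203.0.113.0/24;

location / {
    # blocklist applied on the "normal" routes...
    include /etc/nginx/blocked-ranges.conf;
    proxy_pass http://forgejo;
}

location /challenge/ {
    # ...but the challenge-protected route group was missing the
    # include, so blocked ranges could still reach it.
    # The fix: the same deny list belongs here too.
    include /etc/nginx/blocked-ranges.conf;
    proxy_pass http://anubis;
}
```

The general lesson: access-control lists have to be applied on every route group, not just the one they were first written for.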

reshared this

in reply to Codeberg


For the load average auction, we offer these numbers from one of our physical servers. Who can offer more?

(It was not the "wildest" moment, but the only one for which we have a screenshot.)

reshared this

in reply to Codeberg

>now that they managed to break through Anubis
There was no break - it's a simple matter of changing the User-Agent, or, if for some reason there's still a challenge, simply utilizing the plentiful computing power available on their servers (which far outstrips the processing power mobile devices have).

Anubis is evil and is proprietary malware - please do not attack your users with proprietary malware.

If you want to stop scraper bots, start serving GNUzip bombs - you can't scrape when your server RAM is full.

dd if=/dev/zero bs=1G count=10 | gzip > /tmp/10GiB.gz
dd if=/dev/zero bs=1G count=100 | gzip > /tmp/100GiB.gz
dd if=/dev/zero bs=1G count=1024 | gzip > /tmp/1TiB.gz

# nginx: serve gzip bombs
location ~* /bombs-path/.*\.gz {
    add_header Content-Encoding "gzip";
    default_type "text/html";
}

# serve zstd bombs
location ~* /bombs-path/.*\.zst {
    add_header Content-Encoding "zstd";
    default_type "text/html";
}

Then it's a matter of bait links that the user won't see, but bots will.

SuperDicq reshared this.

in reply to GNU/翠星石

@Suiseiseki Anubis is the option that saved us a lot of work over the past months. We are not happy about it being open core or using GitHub sponsors, but we acknowledge the position from the maintainer: codeberg.org/forgejo/discussio…

Calling our usage of Anubis an attack on our users is far-fetched. But feel free to move elsewhere, or to host an alternative without resorting to extreme measures. We're happy to see working proof that any other protection can be scaled up to the level of Codeberg. ~f

in reply to Codeberg

@Suiseiseki BTW, we're also actively following the work around iocaine, e.g. come-from.mad-scientist.club/@…

However, as far as we can see, it does not sufficiently protect from crawling. As the bot armies have successfully spread over many servers and addresses, damaging one of them doesn't prevent the next one from making harmful requests, unfortunately. ~f

in reply to Codeberg

A lot of users cannot pass Anubis challenges because Anubis does not support every browser and is also incompatible with popular security-focused browser extensions such as JShelter.

Asking your users to enable JavaScript and to disable security extensions like JShelter in order to visit your website is very bad, don't you agree?

I don't think it is far-fetched to call it an attack on your users at all.

in reply to Codeberg

I have a follow up question, though, @Codeberg, re: @zacchiro's question. Is it *possible* that giant human farms of Anubis challenge-solvers actually did it? Or did it all happen so fast that there is no way it could be that?

#Huawei surely could fund such a farm and the routing software needed to get the challenge to the human and back to the bot quickly enough that it might *seem* the bot did it.

in reply to Bradley M. Kühn

@bkuhn
Anubis challenges are not solved by humans. It's not like a captcha. It's a challenge that the browser computes, based on the assumption that crawlers don't run real browsers for performance reasons and implement only simpler clients.

So at least one crawler now seems to emulate enough browser behaviour to pass the Anubis challenge. ~f
@zacchiro
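The general shape of such a proof-of-work scheme can be sketched like this (a generic illustration in Python, not Anubis's actual algorithm or parameters):

```python
import hashlib
import os

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: find a nonce whose hash has `difficulty` leading
    zero hex digits. Expensive - this is what the browser computes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + str(nonce).encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

def verify(challenge: bytes, difficulty: int, nonce: int) -> bool:
    """Server side: a single hash, cheap to check."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * difficulty)

challenge = os.urandom(16)
nonce = solve(challenge, 4)  # roughly 65k hashes on average at 4 hex digits
print(verify(challenge, 4, nonce))  # True
```

The asymmetry is the point: solving costs the client real CPU time while verifying costs the server almost nothing - which is also why a crawler with plenty of computing power can simply pay the cost.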

in reply to Woozle Hypertwin

(on further thought) ...or is it?

  • Create a set of N problems.
  • Solve a sampling of them.
  • Require the bot to solve all of them.
  • If the bot's solutions to the solved set don't match, then it fails the whole test.

Might that work? I guess there could be problems with trustability of the "unknown" answers -- does that look like the main issue to be solved?
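As a toy sketch of that sampling idea (assuming the problems are deterministic hash puzzles with a unique lowest-nonce answer, which is what makes the spot-check comparable):

```python
import hashlib
import os
import random

def lowest_nonce(challenge: bytes) -> int:
    """Deterministic answer: the smallest nonce whose hash starts with
    two zero hex digits, so answers can be compared exactly."""
    n = 0
    while not hashlib.sha256(challenge + str(n).encode()).hexdigest().startswith("00"):
        n += 1
    return n

# 1. Create a set of N problems.
problems = [os.urandom(8) for _ in range(20)]

# 2. Solve only a random sample of them ourselves.
sample = random.sample(range(len(problems)), 5)
known = {i: lowest_nonce(problems[i]) for i in sample}

# 3. Require the client to solve all of them.
client_answers = {i: lowest_nonce(p) for i, p in enumerate(problems)}

# 4. Fail the whole test if any sampled answer mismatches.
passed = all(client_answers[i] == known[i] for i in sample)
print(passed)  # True for this honest client
```

The "trustability" worry stands, though: the unsampled answers are never checked, so a client could guess on those - the sample size only sets the odds of getting caught.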

in reply to Woozle Hypertwin

@woozle
I tend to think that if I had "plenty of free time to fight them," I'd dynamically identify which ones were bots, and then honor their requests, but also keep feeding them harder and harder problems to solve, making their costs "go through the roof" quite quickly. And maybe even give them misleading garbage data.

But that would be a lot of work, of course.

And it would be risky, as one might occasionally wrongly identify an actual valid real user.

in reply to Codeberg

Pardon my ignorance, but couldn't they just be using a headless browser, which would still do everything a regular browser does? Just recently, ChatGPT beat Cloudflare's CAPTCHA using a similar system. Is there really any way around this at all? @Codeberg@social.anoxinon.de
in reply to Codeberg

this is now on the #anubis team’s radar: github.com/TecharoHQ/anubis/is…
in reply to Codeberg

Is it possible to configure Anubis to go super hard on certain IP ranges? It's not only AI crawlers: HuaweiCloud also engaged in bulk-copying code repos from GitHub under the name of GitCode, and they even created fake accounts not owned by the original authors. Could it be they started doing this to Codeberg, too?
bytefish.medium.com/gitcode-is…
[Chinese] cnblogs.com/gt-it/p/18271287
GitCode is a code-hosting platform, a joint venture of HuaweiCloud and CSDN (nominally a blog service, basically a content farm now). [Chinese] qbitai.com/2023/09/85598.html
in reply to Codeberg

eBPF could be more effective and easier on the CPU, since it acts on a much lower network layer. Anubis kinda has its limits and is way too easy to circumvent (as you found out).

Maybe it's worth considering eBPF (if that hasn't already happened).

And thanks, guys, for your work. I'm a proud supporter and I'll continue to support you. Companies shouldn't control the open source space.

in reply to Codeberg

Anubis is extremely easy to bypass: you just have to change the User-Agent to not contain "Mozilla". Please get proper bot protection.

ulveon.net/p/2025-08-09-vangua…
This post talks briefly about other alternatives. Try Berghain, Balooproxy, or go-away.

in reply to Codeberg

Have you looked into serving these LLM crawlers alternative versions of the site, with poisoned data? (And rate-limiting, of course.) I know it would be additional work for you to implement this, but... it might be effective.

I'm thinking you could have a precomputed set of 1000 different poison repos that get served up randomly, each of which is a Markov-chain-scrambled version of the files in a real repo.

(I wrote codeberg.org/timmc/marko to do something similar to the contents of my blog posts—a Markov model on either characters or words.)
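A word-level Markov scrambler of the kind described can be sketched like this (a generic illustration, not the marko tool itself):

```python
import random
from collections import defaultdict

def build_model(text, order=1):
    """Map each `order`-word prefix to the words observed after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])].append(words[i + order])
    return model

def scramble(model, length=30, seed=None):
    """Walk the chain to emit plausible-looking but scrambled text."""
    rng = random.Random(seed)
    order = len(next(iter(model)))
    out = list(rng.choice(list(model)))
    while len(out) < length:
        followers = model.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

source = "the quick brown fox jumps over the lazy dog and the fox naps"
model = build_model(source)
print(scramble(model, length=10, seed=1))
```

Each poisoned repo would then be generated from a real one with a different seed, so crawlers never see the same garbage twice.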

in reply to Codeberg

😲🤬 re: what's happened to @Codeberg today.
The AI ballyhoo *is* a real DDoS against one of the few code hosting sites that takes a stand against slurping #FOSS code into LLM training sets — in violation of #copyleft.

Deregulation/lack-of-regulation will bring more of this. ∃ plenty of blame to go around, but #Microsoft & #GitHub deserve the bulk of it; they trailblazed the idea that FOSS code-hosting sites are lucrative targets.

giveupgithub.org

#GiveUpGitHub #FreeSoftware #OpenSource

in reply to serk

IMO, @serk, the better move is not to delete the repository, but to do something like I've done here with my personal “small hacks” repository:

github.com/bkuhn/small-hacks

I'm going to try to make a short video of how to do this, step by step. The main thing is that rather than 404'ing, the repository now spreads the message that we should #GiveUpGitHub!

in reply to Codeberg

Are you guys using traffic shaping and queue management at all? For example, putting something like a QFQ qdisc on your routers, then marking packets from spammy sources and putting them into a low-priority queue, can be a huge boost in responsiveness for your real customers.
Spammy sources could be those that open new connections too often, transfer too many bytes, or have too many open active connections. All of those things can be accounted for in nftables.
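The idea might look roughly like this (illustrative commands only: the interface name, thresholds, and mark values are invented, and a real setup would use QFQ with per-class weights rather than this simpler prio qdisc):

```shell
# nftables: mark packets from sources holding too many concurrent
# connections (ct count is nftables' connlimit support)
nft add table inet shaping
nft add chain inet shaping pre '{ type filter hook prerouting priority mangle ; }'
nft add rule inet shaping pre ct count over 32 meta mark set 0x1

# tc: 3-band prio qdisc on the uplink; the fw filter steers marked
# packets into the lowest-priority band
tc qdisc add dev eth0 root handle 1: prio bands 3
tc filter add dev eth0 parent 1: protocol ip handle 0x1 fw classid 1:3
```

Real clients keep their normal latency, while marked flows only get leftover bandwidth.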
in reply to bkil

@bkil
Yuck. That does suck. But anything that makes a client stand out can be used to change that client's network priority - in this case, perhaps to boost the priority of packets from clients who have initiated more than one connection in the last hour. Those in the questionable group get sent to underpowered servers that return web pages saying "click xyz to continue to your requested page"; once you've identified a likely real client, they get elevated packet priority and
@Codeberg
in reply to Daniel Lakeland

@bkil
get shuttled to a different backend server. The idea being to make the experience good for clients that act normally, and low-availability for clients that only connect once a week or once a day etc. Plus, the questionable client has to run OCR and an LLM on your click page to figure out how to get past it. Easy for a human, expensive for a bot.

It's obviously whack-a-mole. But if latency is 500-1000 ms for bullshit clients and 50 ms for your real clients, then this is what you want.
@Codeberg