We apologize for a period of extreme slowness today. The army of AI crawlers just leveled up and hit us hard.
The good news: we're keeping up with the additional load from new users moving to Codeberg. Welcome aboard; we're happy to have you here. After adjusting the AI crawler protections, performance improved significantly.
Codeberg (in reply to Codeberg):
We have a list of explicitly blocked IP ranges. However, a configuration oversight on our part meant these ranges were only blocked on the "normal" routes; the "anubis-protected" routes relied on the challenge alone. That was not a problem while Anubis also kept the crawlers off the other routes.
However, now that they managed to break through Anubis, there was nothing stopping these armies.
It took us a while to identify and fix the config issue, but we're safe again (for now).
Codeberg (in reply to Codeberg):
For the load average auction, we offer these numbers from one of our physical servers. Who can offer more?
(It was not the "wildest" moment, but it's the only one for which we have a screenshot.)
Kevin (in reply to Codeberg):
Ouch. This remains a cat-and-mouse game.
At least having them solve the Anubis challenge does cost them extra resources, but if they can do that at scale, it doesn't bode well.
Hakan Bayındır (in reply to Codeberg):
This is a great number, but I have seen higher in my career. Unfortunately, I either have no screenshots or have lost the ones I had.
5831.24 is pretty good though. Congrats on hitting it; hope your head doesn't hurt. :D
Codeberg (in reply to lindesbs #FckAFD):
[link: meta/hardware/achtermann.md at main, Codeberg.org]

Lenny (in reply to Codeberg):
It's easy to get them (e.g. from projectdiscovery).

GNU/翠星石 (in reply to Codeberg):
in reply to Codeberg • • •>now that they managed to break through Anubis
There was no break - it's a simple matter of changing the useragent, or if for some reason there's still a challenge, simply utilizing the plentiful computing power that is available on their servers (which far outstrips the processing power mobile devices have).
Anubis is evil and is proprietary malware - please do not attack your users with proprietary malware.
If you want to stop scraper bots, start serving GNUzip bombs - you can't scrape when your server RAM is full.
dd if=/dev/zero bs=1G count=10 | gzip > /tmp/10GiB.gz
dd if=/dev/zero bs=1G count=100 | gzip > /tmp/100GiB.gz
dd if=/dev/zero bs=1G count=1025 | gzip > /tmp/1TiB.gz
nginx; #serve gzip bombs
location ~* /bombs-path/.*\.gz {
add_header Content-Encoding "gzip";
default_type "text/html";
}
#serve zstd bombs
location ~* /bombs-path/.*\.zst {
add_header Content-Encoding "zstd";
default_type "text/html";
}
Then it's a matter of bait links that the user won't see, but bots will.
Codeberg (in reply to GNU/翠星石):
@Suiseiseki Anubis is the option that saved us a lot of work over the past months. We are not happy about it being open core or using GitHub Sponsors, but we acknowledge the maintainer's position: codeberg.org/forgejo/discussio…
Calling our usage of Anubis an attack on our users is far-fetched. But feel free to move elsewhere, or to host an alternative without resorting to extreme measures. We'd be happy to see working proof that any other protection can be scaled up to the level of Codeberg. ~f

Codeberg (in reply to Codeberg):
@Suiseiseki BTW, we're also actively following the work around iocaine, e.g. come-from.mad-scientist.club/@…
However, as far as we can see, it does not sufficiently protect against crawling. Since the bot armies spread successfully over many servers and addresses, damaging one of them doesn't prevent the next one from making harmful requests, unfortunately. ~f
SuperDicq (in reply to Codeberg):
A lot of users cannot pass Anubis challenges because Anubis does not support every browser and is also incompatible with popular security-focused browser extensions such as JShelter.
Asking your users to enable JavaScript and to disable security extensions like JShelter in order to visit your website is very bad, don't you agree?
I don't think it is far-fetched to call it an attack on your users at all.
SuperDicq (in reply to cmdr ░ nova ⸸ :~$ 🏳️⚧️):
Saying Anubis is the only solution to the scraper problem is a false dilemma. There are many other methods of stopping scrapers.
This is extremely bad for accessibility, and I consider it exclusionary for many people who want to contribute to free software but now can't.
Zergling_man (in reply to Codeberg):
>can be scaled up to the level of Codeberg
He says, on the federated network.
1) Put up a /botsfuckoff/ path that redirects to a script which randomly generates 200 links to itself whenever it's accessed
2) Deny it in robots.txt
3) Put a hidden link to it at the top of the home page
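The link maze in these steps can be sketched as a single page generator (a hedged sketch: the /botsfuckoff/ path and the 200-link count come from the post, while the function name and seeding scheme are illustrative):

```python
import hashlib
import random

def maze_page(path: str, n_links: int = 200) -> str:
    """Generate an HTML page of n_links pseudo-random links back into the maze.

    A crawler that ignores robots.txt and follows the hidden entry link
    will wander from maze page to maze page indefinitely.
    """
    # Seed the RNG from the requested path so the same URL always serves
    # the same page, making the maze look like static content.
    rng = random.Random(hashlib.sha256(path.encode()).hexdigest())
    links = "\n".join(
        f'<a href="/botsfuckoff/{rng.getrandbits(64):016x}.html">more</a>'
        for _ in range(n_links)
    )
    return f"<html><body>\n{links}\n</body></html>"
```

Wired to a catch-all route for /botsfuckoff/, disallowed in robots.txt, and linked invisibly from the home page, only robots that ignore robots.txt ever enter it.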
Pluto (in reply to Codeberg):
I believe @Suiseiseki is not referring to Codeberg's usage of Anubis specifically, but rather shares the FSF's stance (which I don't share) that Anubis "acts like malware" for making "calculations that a user does not want done": fsf.org/blogs/sysadmin/our-sma…
The FSF saying FSF things :)
glenngillen (in reply to Codeberg):
@Suiseiseki@freesoftwareextremist.com "We are not happy about it being open core … GH sponsors"
Do you have better suggestions for how we can have a sustainable OSS model that isn't entirely dependent on core contributors of major projects having full-time jobs and then supporting everyone else in whatever free time they might have?
Bradley Kuhn (in reply to Codeberg):
I have a follow-up question, though, @Codeberg, re: @zacchiro's question. Is it *possible* that giant human farms of Anubis challenge-solvers actually did it? Or did it all happen so fast that there is no way it could be that?
#Huawei surely could fund such a farm and the routing software needed to get the challenge to the human and back to the bot quickly enough that it might *seem* the bot did it.
Codeberg (in reply to Bradley Kuhn):
@bkuhn Anubis challenges are not solved by humans. It's not like a CAPTCHA. It's a challenge that the browser computes, based on the assumption that crawlers don't run real browsers for performance reasons and only implement simpler fetchers.
So at least one crawler now seems to emulate enough browser behaviour to pass the Anubis challenge. ~f
@zacchiro
Bradley Kuhn (in reply to Codeberg):
I get it now.
Thanks for taking the time to clue me in.
I'm lucky that I haven't needed to learn about this until now and I'm so sorry you've had to do all this work to fight this LLM training DDoS!
Cc: @zacchiro
NerdNextDoor (in reply to Codeberg):
Good luck with fighting the bots. I recently moved my OSDev project and site to Codeberg from GitHub, and so far it's been great!
Thank you for helping the open-source community!
Woozle Hypertwin (in reply to Codeberg):
Now what needs to happen is that part of the challenge computes a known answer while the other part does useful computational work, and there's no way for the bot to tell which is which, so it has to do both.
That could maybe contribute computing power to something important like Folding@Home, or even just something pretty like Electric Sheep.
Codeberg (in reply to Woozle Hypertwin):
@woozle This topic was discussed in the past. The problem is that cutting useful work into small chunks AND verifying it is very difficult. It might work for some cryptocurrencies, but that's nothing we're interested in.
A proof of concept is more than welcome, but I don't yet know if anyone found a suitable task for this.
~f
Woozle Hypertwin (in reply to Woozle Hypertwin):
(on further thought) ...or is it?
Might that work? I guess there could be problems with the trustworthiness of the "unknown" answers; does that look like the main issue to be solved?
Codeberg (in reply to Woozle Hypertwin):
@woozle Remember that users want to get through the challenge page quickly. So the more samples you have, the simpler the individual problems need to be.
~f
Jeff Grigg (in reply to Woozle Hypertwin):
@woozle
I tend to think that if I had "plenty of free time to fight them," I'd dynamically identify which ones were bots, and then honor their requests, but also keep feeding them harder and harder problems to solve, making their costs "go through the roof" quite quickly. And maybe even give them misleading garbage data.
But that would be a lot of work, of course.
And it would be risky, as one might occasionally misidentify a valid real user.
Dan Jones (in reply to Codeberg):
[link: ChatGPT Agent Passes CAPTCHA Test, Exposes Flaws in Bot Detection Systems (Simran Mishra, Analytics Insight)]

argv minus one (in reply to Codeberg):
in reply to Codeberg • • •These companies are evidently willing to pay an absolutely staggering cost to do their scraping.
I wonder, are they paying with their own money, or are they “borrowing” some unsuspecting strangers' compromised computers/routers/etc to do the work?
Peter Cock (in reply to Codeberg):
[link: "Sev negative one: huawei bound scrapers are bypassing Anubis", Xe (GitHub)]

Orca 🌻 | 🎀 | 🪁 | 🏴🏳️⚧️ (in reply to Codeberg):
bytefish.medium.com/gitcode-is…
[Chinese] cnblogs.com/gt-it/p/18271287 ("CSDN mass-scraped GitHub projects onto GitCode, faking developer homepages and causing public outrage", gt-it, 博客园)
GitCode, a code hosting platform, is a joint venture of HuaweiCloud and CSDN (which should be a blog service, but is basically a content farm now).
[Chinese] qbitai.com/2023/09/85598.html

ozamidas (in reply to Codeberg):
in reply to Codeberg • • •boy Huawei is so nasty
I wonder who are the biggest offenders on this matter...
Aleksandra Fedorova (in reply to Codeberg):
"AI crawlers learned how to solve the Anubis challenges"
Why does the EU discuss chat control and not AI crawler control again?
p̷t̵r̴a̵c̷e̶ (in reply to Codeberg):
eBPF could be more effective and easier on the CPU, since it acts at a much lower network layer. Anubis kind of has its limits and is way too easy to circumvent (as you found out).
Maybe it's worth considering eBPF (if that hasn't already happened).
And thanks, guys, for your work. I'm a proud supporter and I'll continue to support your work. Companies shouldn't control the open source space.
ulveon.net (in reply to Codeberg):
Anubis is extremely easy to bypass: you just have to change the User-Agent to not contain "Mozilla". Please get proper bot protection.
ulveon.net/p/2025-08-09-vangua…
This post talks briefly about other alternatives. Try Berghain, Balooproxy, or go-away.
varx/tech (in reply to Codeberg):
Have you looked into serving these LLM crawlers alternative versions of the site with poisoned data? (And rate-limiting, of course.) I know it would be additional work for you to implement, but it might be effective.
I'm thinking you could have a precomputed set of 1000 different poison repos that get served up randomly, each of which is a Markov-chain-scrambled version of the files in a real repo.
(I wrote codeberg.org/timmc/marko to do something similar with the contents of my blog posts: a Markov model on either characters or words.)
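The precomputed-poison idea can be sketched with a tiny word-level Markov scrambler (a minimal sketch in the spirit of the marko tool linked above, not its actual code; the function name and parameters are illustrative):

```python
import random
from collections import defaultdict

def scramble(text: str, order: int = 2, length: int = 60, seed: int = 0) -> str:
    """Build a word-level Markov chain from `text` and emit plausible gibberish.

    Crawlers receive statistically text-like output built only from words
    in the source; humans browsing normally would never see it.
    """
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    for _ in range(length - order):
        successors = chain.get(state)
        if not successors:  # dead end: restart from a random state
            state = rng.choice(list(chain))
            successors = chain[state]
        word = rng.choice(successors)
        out.append(word)
        state = (*state[1:], word)
    return " ".join(out)
```

Run offline over each real repo's files to precompute the poisoned variants, so serving them costs nothing at request time.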
Bradley Kuhn (in reply to Codeberg):
😲🤬 re: what's happened to @Codeberg today.
The AI ballyhoo *is* a real DDoS against one of the few code hosting sites that takes a stand against slurping #FOSS code into LLM training sets — in violation of #copyleft.
Deregulation/lack-of-regulation will bring more of this. ∃ plenty of blame to go around, but #Microsoft & #GitHub deserve the bulk of it; they trailblazed the idea that FOSS code-hosting sites are lucrative targets.
giveupgithub.org
#GiveUpGitHub #FreeSoftware #OpenSource
serk (in reply to Bradley Kuhn):
@bkuhn If anyone needs it, there is this gist showing how to pseudo-automate repository bulk deletion:
gist.github.com/mrkpatchaa/637…
And this tool: reporemover.xyz is very handy.

Bradley Kuhn (in reply to serk):
in reply to serk • • •IMO, @serk, the better move is not to delete the repository, but to do something like I've done here with my personal “small hacks” repository:
github.com/bkuhn/small-hacks
I'm going to try to make a short video of how to do this, step by step. The main thing is that rather than 404'ing, the repository now spreads the message that we should #GiveUpGitHub!
Brett Sheffield (he/him) (in reply to Bradley Kuhn):
@bkuhn @serk When @librecast moved our repos, I wrote a script to wipe the GitHub repo and replace it with the #GiveUpGitHub README:
codeberg.org/librecast/giveupg…
Brett Sheffield (he/him) (in reply to Brett Sheffield (he/him)):
@bkuhn @serk spectra.video/w/mhoTtoSoXtkjan…
😉
Codeberg (in reply to unknown parent):
@gturri Anubis sends a challenge. The browser needs to do "heavy" work to compute the answer; the server then only does "light" work to verify it.
As far as we can tell, the crawlers actually do the computation and send the correct response. ~f
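The heavy-client / light-server asymmetry described here can be illustrated with a SHA-256 proof-of-work sketch (an illustration of the general technique, not Anubis's actual protocol; the function names and difficulty scheme are made up):

```python
import hashlib
import itertools

def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce so SHA-256(challenge + nonce) starts
    with `difficulty` hex zeroes; expected cost grows ~16x per extra zero."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash, no matter how hard the client's search was."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The server only pays one hash per verification, which is why a crawler that is willing to burn CPU on `solve` defeats the scheme without "breaking" anything cryptographic.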
Daniel Lakeland (in reply to Codeberg):
Spammy sources could be those that open new connections too often, transfer too many bytes, or keep too many connections active. All of those things can be accounted for in nftables.
Daniel Lakeland (in reply to bkil):
Yuck. That does suck. But anything that makes a client stand out can be used to change that client's network priority, in this case perhaps to boost the priority of packets from clients who have initiated more than one connection in the last hour. Those in the questionable group could be sent to underpowered servers that return pages saying "click xyz to continue to your requested page"; once you've identified a likely real client, they get elevated packet priority and
@Codeberg
Daniel Lakeland (in reply to Daniel Lakeland):
@bkil
shuttled to a different back-end server. The idea is to make the experience good for clients that act normally, and low-availability for clients that connect only once a week or once a day, etc. Plus, make the questionable client run OCR and an LLM on your click page to figure out how to get past it: easy for a human, expensive for a bot.
It's obviously whack-a-mole. But if latency is 500-1000 ms for bullshit clients and 50 ms for your real clients, then this is what you want.
@Codeberg
Daniel Lakeland (in reply to Daniel Lakeland):
It means keeping an nftables IP set of millions of real clients, but a Linux edge router with gigabytes of RAM would handle this fine. Even a billion real clients would need only about 16 GB of RAM for IPv6 addresses, which is doable.
@Codeberg