Disclaimer: I am not responsible for anything the material or the included code/patches may cause, including loss of data, physical damage, service disruption, or damage of any kind. Use at your own risk!
Background: Most linux distributions don't allow more than 32 groups/user. That means one user cannot belong to more than 32 groups. Unfortunately, this limit is hard coded into the linux kernel, glibc, and a few utilities including shadow.
1. Patching the kernel
I only tried this on 2.4.x kernels. However, things should be the same with 2.2.x. Be careful when choosing the new limit. The kernel behaves strangely with large limits because the groups structure per process is held on an 8K stack which seems to overflow. A 2.4.2 kernel with a limit of 1024 crashed during boot. However, I successfully used a 255 limit on a 2.4.8 kernel.
The group limit is set from two header files in the linux kernel source:
This file should contain something like this:
#define NGROUPS 32
Simply replace 32 with the limit you want. If your param.h doesn't contain these lines, just add them.
Look for a line that looks like
#define NGROUPS_MAX 32
and change the limit.
Now the kernel must be recompiled. There are some howtos that explain how this is done.
2. Recompiling glibc
This applies to glibc-2.2.2 (this is the version which I used). It may also apply to other versions, but I didn't test it.
__sysconf function in glibc is affected by the limits defined in the system header files. Other functions (initgroups and setgroups) in glibc rely on
__sysconf rather than using the limits defined in the header files. You'll have to modify two header files. Please note that this limit will be used by glibc and all programs that you compile. Choose a reasonable limit. However, it's safe to use a larger limit than you used for the kernel. I successfully compiled and ran glibc with a limit of 1024.
Make sure the file contains something like
#define NGROUPS 1024
It should contain a line like this:
#define NGROUPS_MAX 1024 /* supplemental group IDs are available */
Now you have to recompile glibc. I hope there are some howtos that explain how this is properly done. I only did it twice and I got into trouble both times. Glibc compiles cleanly, but the actual problem is installing the new libraries. A 'make install' won't do it, at least not with bash (some people suggested it would work if I used a statically linked shell, but I didn't try). This happened on RedHat, where the distribution glibc was placed under a subdirectory of /lib rather than directly under /lib. 'make install' copies libraries one at a time. After glibc is copied, paths stored inside the new glibc binary won't match those from the old ld-linux.so, causing ld not to be able to dinamically link any program. So 'make install' won't be able to run /usr/bin/install, which is needed to copy the new binaries, and it will fail. I had to reset the machine (/sbin/shutdown could no longer be run), boot from a bootable cd and manually copy glibc-2.2.2.so, libm-2.2.2.so, and ld-2.2.2.so, sync and reboot. Then, everything seemed to be normal.
3. Recompiling shadow utils
Before I recompiled glibc, I had manually put a user in more than 32 groups (that means it already belonged to 32 groups and I manually modified /etc/groups). Proper permissions were granted for groups above 32, but usermod failed to add a user to more than 32 groups. I began browsing the shadow utils source and found that it uses the system headers at compile time to set the limit. This means that it had to be recompiled, because the old limits were hard coded into the binaries. A simple recompilation will do. However I made a patch against shadow-20000826, that will dinamically allocate space for the group structure using __sysconf(). This means it won't have to be recompiled if glibc is recompiled with a different limit.
4. Fixing process tools
Once again I thought everything was fine. However I ran apache webserver as user www1, which belonged to more than 100 groups (that was a security measure for massive virtual hosting). The message 'Internal error' appeared (apparently) at random while running different programs. After a few grep's I figured out the message came from libproc. I began browsing the procps sources and found a terrible bug. Process information is read from the kernel and concatenated into one string which is then parsed to get a dynamic list. The problem is that the string was blindly dimensioned to 512 bytes, which was not enough to hold information for so many groups. I made a patch against procps-2.0.7, which only defines a symbolic constant in readproc.c and allocates the string with the size given by that constant. Of course, I used a larger value, such as 4096. You'll have to apply this patch and recompile procps.