Post by Paul
With the end of Windows 10 coming up soon, I have 2 computers that are working perfectly OK but which are not upgradable to 11.
Thinking of possible uses: is there any way an old W10 box could be strapped down so that it exists on the local network as a file server for other devices, but does not respond to or do anything else, so that it could be safely left unsupported?
Or I suppose I'll have to try Ubuntu on them as I have in previous obsolescences.
I've got a laptop which technically can be upgraded but doesn't have enough space. It has a 128 GB SSD which is nearly full. It doesn't have a slot for a memory card, so I can't upgrade it that way.
Windows 10 uses 20 GB; Windows 11 uses 64 GB, and needs an extra 20 GB free to do the upgrade. Why does Windows 11 need more than three times the space? Is it three times as good? What would that even mean?
I suspect that the dangers of using a machine which isn't upgradable are exaggerated.
I'm thinking of getting a mini PC, but I would like one that can run an LLM (Large Language Model) locally, if I can master the technicalities of AI. It appears to be rather a closed book. And you can't buy a PC that says it's LLM-ready (with a suitable GPU); only ones that say they are suitable for gaming.
Caveat: I don't know anything about LLMs, have not run one,
and am not buying hardware to run them. It is too expensive to buy
equipment that can run a reasonable cross-section of models.
If you wish to run an LLM AI right now, you can.
It might not have voice synthesis, though. The experience will be
a lot better than the Excel spreadsheet LLM someone built :-)
https://www.techrepublic.com/article/news-microsoft-bitnet-small-ai-model/
"However, BitNet b1.58 2B4T still isn’t simple to run; it requires hardware
compatible with Microsoft’s bitnet.cpp framework. Running it on a
standard transformers library won’t produce any of the benefits in terms of
speed, latency, or energy consumption. BitNet b1.58 2B4T doesn’t run on GPUs,
as the majority of AI models do."
"In the research paper, which was posted on Arxiv as a work in progress, the
researchers detail how they created the bitnet. Other groups have created
bitnets before, but, the researchers say, most of their efforts are either
post-training quantization (PTQ) methods applied to pre-trained full-precision
models or native 1-bit models trained from scratch that were developed at a
smaller scale in the first place. BitNet b1.58 2B4T is a native 1-bit LLM
trained at scale; it only takes up 400MB, compared to other “small models”
that can reach up to 4.8 GB." [You can run it on a toaster...]
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
It's something that runs on a CPU, at a guess. It's supposed
to run in a relatively small memory footprint, which means it won't need
mmap operation, nor should it wear out your NVMe via paging.
With a 4096-token context limit, it can't do really serious work, but it should
still have some of the behaviors of a large model.
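If you want a feel for what running it involves, here is a sketch
(untested here, and it assumes a transformers build recent enough to
recognize the BitNet architecture) that would load that Hugging Face
checkpoint through the ordinary Python transformers API. As the article
notes, this route gets none of the bitnet.cpp speed or energy benefits;
it just shows the model answering a prompt on the CPU. The prompt text
is only an example.

    # Sketch only: load microsoft/bitnet-b1.58-2B-4T via Hugging Face
    # transformers and generate on the CPU. Assumes a transformers build
    # that knows the BitNet architecture; no bitnet.cpp speedups here.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/bitnet-b1.58-2B-4T"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    prompt = "In one paragraph, what does a home file server do?"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))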
*******
For standard models (using bigger number formats than the -1,0,1 model above),
even with mmap, you might want a machine with 1TB-2TB of *RAM*, plus a
good video card. For example, an RTX 5090-class workstation card with 96GB
of GDDR (at enterprise pricing!) can use a mixture-of-experts model, where,
say, a 35GB piece of the model is loaded fully into the video card and can
process your question more efficiently. By using a single video card,
you avoid the bandwidth restrictions that come from using a multitude
of smaller video cards. One big video card can be 7x faster for
some things, because it does not need to do PCIe-to-PCIe DMA for transfers.
It might take a series of those 35GB pieces, loaded one at a time
from main memory. If they fit into system RAM, then you might
not need to mmap a copy of the model stored on an NVMe drive.
That might involve sixteen thousand dollars' worth of equipment. Whereas the
tiny model above can run on a toaster. More CPU cores are going to help
in that case; I don't think it is single-threaded. It's unclear whether
it uses your whole machine, if you have a lot of cores for it to use.
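To put rough numbers on that sizing argument, here is a back-of-envelope
Python sketch. All the figures are illustrative assumptions, not
measurements: 2B parameters at 1.58 bits per weight lands near the
article's ~400MB, and 70B parameters at 4-bit quantization lands near
the 35GB piece mentioned above.

    # Back-of-envelope weight-storage arithmetic; ignores KV cache and
    # activations. Figures are illustrative round numbers only.
    def model_gigabytes(params_billion, bits_per_weight):
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    vram_gb = 96  # hypothetical big workstation card, as in the example above
    for params_b, bits in [(2, 1.58), (70, 4), (70, 16), (405, 4)]:
        gb = model_gigabytes(params_b, bits)
        verdict = "fits in VRAM" if gb <= vram_gb else "spills to system RAM / NVMe"
        print(f"{params_b:>4}B params @ {bits:>5} bits/weight ~ {gb:6.1f} GB -> {verdict}")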
*******
"Running BitNet b1.58 on Raspberry Pi (Install Guide & Testing) (9 days ago)"
http://youtu.be/3q_ItuNNpmY
"How to run microsoft bitnet-b1.58-2B-4T locally on your laptop"
http://youtu.be/iNTFobSRt0Q
*******
At the current time, the most capable dedicated NPUs are on laptops. Presumably
the feeling is that video cards are going to have a lot more
TOPS to offer than a tile put inside the CPU package. The RTX5090 is
around 1000 TOPS, with the caveat that performance varies with numerical format.
The -1,0,1 model above (trit) is not currently directly supported
on a video card; there is no optimal hardware for that. But with the
right massaging, one of the other files Microsoft released might work
on a GPU. (One of the other files doesn't use trit.)
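If you want to see why trit weights don't map onto ordinary GPU math, a
toy Python sketch is below. With weights restricted to -1, 0 and +1, a
dot product needs only additions and subtractions, which is the kind of
structure bitnet.cpp's own CPU kernels are built around and a standard
GPU matmul does not exploit. Purely illustrative; the real model packs
its trits far more densely than one per Python value.

    # Toy trit (-1,0,+1) dot product: no multiplications required.
    import random

    def trit_dot(weights, activations):
        total = 0.0
        for w, x in zip(weights, activations):
            if w == 1:
                total += x
            elif w == -1:
                total -= x
            # w == 0 contributes nothing
        return total

    w = [random.choice((-1, 0, 1)) for _ in range(8)]
    x = [random.uniform(-1.0, 1.0) for _ in range(8)]
    print(w)
    print(round(trit_dot(w, x), 4))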
Current video cards only have "small RAM" on purpose; that's not
an accident. The two video card companies do not want to damage
their market for really expensive cards. And putting four
16GB cards in a PC would not give the speedup you might like.
The video card VBIOS has crypto signing to control how much
RAM it will use. You cannot solder different chips to a video
card and magically get it to work. You need the correct VBIOS,
and it is likely the GPU is marked at the factory in some way
regarding what VBIOS it will accept for configuration. This prevents
third parties from re-purposing restricted hardware.
Thank you. Explanations and instructions are always so complicated, even
for the moderately technical.