
Five AI Trends from the latest NeurIPS 2023 Conference

Oct 8, 2025
Nikolaos Vasiloglou

The Neural Information Processing Systems (NeurIPS) 2023 conference, one of the largest AI gatherings, showcases the current trends in AI. This year's conference was no exception, with the dominance of Large Language Models (LLMs) on full display. While NeurIPS leaned heavily toward LLMs, it also indirectly highlighted significant developments from other AI conferences. To validate the trends, we spent over 100 hours researching keynotes, tutorials, workshops, and oral presentations to produce a summary that we feel represents the latest on the most dominant technologies of GenAI. Here's an overview of the top five AI trends:

1. Open-source models make LLMs accessible to all

Researchers aim to reduce the cost and the massive resource requirements of training LLMs, making them more accessible to academia and smaller organizations. Open-source LLM projects like LLAMA-2 and LLM-360 aim to democratize access to these models. LLAMA-2 is "free" in the sense that its weights are publicly available, but the training data and the training process are not released. This is equivalent to distributing the binary code of a program but not the source code. The training (or, more precisely, the pretraining) of an LLM is like an experiment: it has to be documented thoroughly so that others can reproduce it. LLM-360 is a project in this direction, and it is considered a true open-source project.

2. Data challenges are being solved with smaller, deeper datasets and knowledge graphs

The conference extensively discussed the data issue, highlighting a consensus that high-quality data such as books and curated publications is running scarce. The potential of alternative data sources like code and simulators, which operate as knowledge graphs (KGs), was also explored. These can significantly enhance performance, particularly in tasks involving reasoning, planning, and social/emotional understanding. KGs offer deductive reasoning through a web of logical rules that are reactive and recursive, and such rules can model any environment. Think of a huge Excel spreadsheet: a KG models the dependencies between different entities and values as a graph, and it can represent anything from differential equations to business rules.
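As a toy illustration of "dependencies between entities and values as a graph", the sketch below stores facts as triples and applies one recursive rule. The entity names and the transitivity rule are invented for this example and do not describe any particular KG product.

```python
# Minimal sketch of a knowledge graph as (subject, relation, object) facts
# plus one recursive deductive rule. All names here are made up for illustration.

facts = {
    ("order_42", "placed_by", "alice"),
    ("alice", "member_of", "acme_corp"),
    ("acme_corp", "member_of", "enterprise_tier"),
}

def derive_memberships(facts):
    """Apply the rule 'member_of is transitive' until no new facts can be derived."""
    derived = set(facts)
    while True:
        new = {
            (a, "member_of", c)
            for (a, r1, b) in derived if r1 == "member_of"
            for (b2, r2, c) in derived if r2 == "member_of" and b2 == b
        }
        if new <= derived:          # fixpoint reached: nothing new to add
            return derived
        derived |= new

# alice is derived to be a member of enterprise_tier via acme_corp
print(("alice", "member_of", "enterprise_tier") in derive_memberships(facts))  # True
```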

Researchers recently showed that tiny LLMs (a few million parameters) can perform very well if they are trained on the right dataset. Because they are so small, they can be trained in a few hours on modest hardware. Experienced data scientists can now experiment with crafting the training data and gain a better understanding of how to prepare it, much as they did ten years ago with feature engineering. The rise of tiny LLMs offers a promising way to distill datasets.

An emerging trade-off exists between using vast amounts of data to train huge models automatically and reducing scale, which requires more investment in engineers and dataset curation but less in training. The smaller-scale route also reduces the carbon footprint, which matters ethically to many researchers: in terms of carbon emissions, a good data scientist has a much smaller footprint than a cluster of GPUs that can consume a city's worth of electricity for a month, just saying!

3. Software 3.0 paradigm: program smaller LLMs instead of training a huge LLM

The concept of Software 3.0 proposes a departure from training a single huge LLM toward building larger models by combining smaller, specialized LLMs. This idea draws parallels to traditional software development, where complex projects are built from smaller components. Karpathy's Software 2.0 paradigm, introduced in 2017, likened training deep learning models to compiling software. The new approach suggests training numerous specialized LLMs and combining them using a task vector, similar to linking separately compiled source files in software development. A notable inspiration for this approach comes from Word2Vec (Test of Time award at NeurIPS 2023), which demonstrated the ability to represent words as numerical vectors, allowing algebraic operations on them to derive semantic relationships. For example, you could take the vector representation of "King", subtract the vector for "man", and add the vector for "woman"; the result lands close to the vector for "Queen".
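For readers who want to see this arithmetic in action, here is a minimal sketch using gensim's pretrained Google News vectors; the choice of library and model is an assumption for illustration, any pretrained word2vec model containing these words would do.

```python
# Word2Vec analogy arithmetic: king - man + woman lands near queen.
import gensim.downloader as api

# Pretrained vectors (roughly a 1.6 GB download on first use).
model = api.load("word2vec-google-news-300")

# positive terms are added, negative terms are subtracted
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# expected output is something like [('queen', 0.71...)]
```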

Similarly, recent findings show that LLMs can be combined by adding or subtracting their task vectors (the weight deltas produced by fine-tuning) to perform tasks like translation or filtering toxic language. Say we fine-tuned LLM A to translate from English to Greek and LLM B to translate from Greek to Italian. By adding the task vectors of A and B we get LLM C, which can now translate from English to Italian! One more example: we fine-tuned LLM A on customer service data and LLM B on a dataset with a lot of toxic language. By subtracting B from A we get LLM C, which does customer service but with less toxic language!
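Below is a minimal sketch of what this kind of model arithmetic looks like on weights. The tiny state dicts standing in for checkpoints, and the assumption that a plain sum of deltas transfers a skill, are illustrative only and not a recipe from the conference.

```python
# Task-vector arithmetic on model weights: delta = finetuned - base,
# and deltas can be added (compose skills) or subtracted (remove a behavior).
import torch

def task_vector(finetuned: dict, base: dict) -> dict:
    """Weight delta introduced by fine-tuning: theta_task - theta_base."""
    return {k: finetuned[k] - base[k] for k in base}

def apply_task_vectors(base: dict, vectors: list, scale: float = 1.0) -> dict:
    """Add (or, with a negative scale, subtract) task vectors onto the base weights."""
    merged = {k: v.clone() for k, v in base.items()}
    for tv in vectors:
        for k in merged:
            merged[k] += scale * tv[k]
    return merged

# Toy state dicts standing in for real checkpoints.
base     = {"w": torch.zeros(3)}
en_to_el = {"w": torch.tensor([1.0, 0.0, 0.0])}   # "LLM A"
el_to_it = {"w": torch.tensor([0.0, 1.0, 0.0])}   # "LLM B"

combined = apply_task_vectors(
    base, [task_vector(en_to_el, base), task_vector(el_to_it, base)]
)
print(combined["w"])  # tensor([1., 1., 0.]) -- the composed "LLM C"
```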

By training and combining hundreds of thousands of smaller LLMs, each specialized in a task and indexed with a task vector for faster retrieval, Software 3.0 aims to achieve more versatile and efficient models, mirroring the modular approach of traditional software development.

4. LLMs plus KGs equals both the creative and the rule follower

Despite recent advancements (ChatGPT scores in the 89th percentile on the SAT) and specialized models like MINERVA and LLEMA, it is impossible to move forward without theorem provers (TPs). TPs are systems for programmatically encoding proof tactics, axioms, and premises, and they can be used in domains such as legal, or any other domain that embodies reasoning. In that sense, modern KGs are general forms of theorem provers. What we saw at the conference is that TPs are the ultimate companion of LLMs when it comes to synthesizing new knowledge.

Here is how it works: extracting reasoning paths from unstructured text involves
converting mathematical proofs expressed in free text and symbols into a
structured "computer program" format. LLMs excel at this task,
automating a process that was previously manual. This process essentially
creates a KG for mathematics, capturing sequences of reasoning rather than just
facts. Formal proofs extracted from this KG can be used to retrain LLMs,
enabling them to reason and potentially prove new theorems.
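To make the "structured computer program format" concrete, here is a toy illustration of my own (not taken from the talks) of what a free-text argument looks like once it is formalized for a theorem prover; the syntax assumes Lean 4 with Mathlib's tactics.

```lean
import Mathlib.Tactic

-- Informal, free-text version: "the sum of two even numbers is even,
-- because each can be written as twice something."
-- Formalized version a theorem prover can check step by step:
theorem even_add_even (a b : ℤ)
    (ha : ∃ m, a = 2 * m) (hb : ∃ n, b = 2 * n) :
    ∃ k, a + b = 2 * k := by
  obtain ⟨m, hm⟩ := ha          -- a = 2 * m
  obtain ⟨n, hn⟩ := hb          -- b = 2 * n
  refine ⟨m + n, ?_⟩            -- propose the witness k = m + n
  rw [hm, hn]                   -- substitute the two hypotheses
  ring                          -- arithmetic closes the goal
```

Each tactic line corresponds to one reasoning step, which is exactly the kind of edge a reasoning KG would capture.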

By nature, an LLM is designed to be probabilistic, making it a creative system that can dream up new reasoning paths. But to produce valid outputs, we need TPs (a.k.a. KGs) to determine whether the "dream" is valid or not.

5. Finally, LLMs can be pretrained on relational tables - meeting enterprises where their data sits

LLMs are trained on big text collections, but the majority of enterprise data sits in relational tables. A reasonable question is whether we can pretrain language models on relational tables.

When it comes to tables with numerical and categorical data, LLMs can be trained to do predictive tasks. LLMs trained on millions of tables can classify a feature vector instantly if we give them context with up to 10,000 training examples. This behavior generalizes to multiple tables that contain facts: LLMs can ingest the tables and predict facts that do not exist in the original tables. This has traditionally been done with Graph Neural Networks (GNNs), but transformers (designed for sequential data processing and particularly well-suited to natural language tasks) seem to be catching up.

The choice here is to either modify the architecture of general-purpose LLMs that are trained mainly on text to accommodate tables, or find a different way to train general-purpose LLMs with tables represented in text form.
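Here is a minimal sketch of that second option: serializing a small table into text so a general-purpose LLM can be prompted with labeled rows plus a query row. The column names, labels, and prompt format are invented for illustration, not a documented recipe.

```python
# Turn table rows into short text lines, use labeled rows as in-context
# examples, and ask the model to complete the label for the final row.

def row_to_text(row: dict) -> str:
    return ", ".join(f"{k} = {v}" for k, v in row.items())

train_rows = [
    ({"age": 34, "plan": "pro",  "tickets": 0}, "retained"),
    ({"age": 52, "plan": "free", "tickets": 7}, "churned"),
]
query_row = {"age": 29, "plan": "pro", "tickets": 1}

prompt = "\n".join(
    [f"{row_to_text(r)} -> {label}" for r, label in train_rows]
    + [f"{row_to_text(query_row)} -> "]
)
print(prompt)
# The prompt is then sent to an LLM, which fills in the label for the last row.
```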

View the full presentation:
"NeurIPS 2023 Trends in AI".

About the Author

Nikolaos Vasiloglou is the VP of Research-ML at RelationalAI. He has spent his career building ML software and leading data science projects in Retail, Online Advertising, and Security. He is a member of the ICLR/ICML/NeurIPS/UAI/MLconf/KGC/IEEE S&P community, having served as an author, reviewer, and organizer of workshops and the main conference. Nikolaos leads the research and strategic initiatives at the intersection of Large Language Models and Knowledge Graphs for RelationalAI.

About RelationalAI

RelationalAI is the industry's first AI coprocessor for data clouds and language
models. Its groundbreaking relational knowledge graph system expands data clouds
with integrated support for graph analytics, business rules, optimization, and
other composite AI workloads, powering better business decisions. RelationalAI
is cloud-native and built with the proven and trusted relational paradigm. These
characteristics enable RelationalAI to seamlessly extend data clouds and empower
you to implement intelligent applications with semantic layers on a data-centric
foundation.