[AINews] ALL of AI Engineering in One Place • ButtondownTwitterTwitter

buttondown.email

Updated on May 23 2024


AI Twitter Recap

This section provides a recap of the AI Twitter highlights presented by Claude 3 Opus, summarizing the latest updates and discussions from various sources. It offers a quick overview of the trending topics and significant news in the AI community, helping readers stay informed about recent developments.

AI Model Releases and Benchmarks

Microsoft's Phi-3 Models

  • Microsoft released Phi-3 small (7B) and medium (14B) models under the MIT license, with versions up to 128k context. Phi-3 small outperforms Mistral 7B and Llama 3 8B, while Phi-3 medium outperforms GPT-3.5 and Cohere Command R+. They were trained on 4.8 trillion tokens and fine-tuned with SFT and DPO. Phi-3-vision with 4.2B parameters outperforms Claude-3 Haiku and Gemini 1.0 Pro V on visual tasks.
  • Many are eager to benchmark the Phi-3 models and potentially fine-tune them for applications. However, fine-tuning over a chat model can sometimes lead to worse performance than the base model.

Perplexity AI Partners with TakoViz for Knowledge Search

  • Perplexity AI partnered with TakoViz to offer advanced knowledge search and visualization, sourcing data from authoritative providers. Users can compare data like stock prices or lending over specific time periods and enable granular data queries across timelines. The partnership brings customer obsession and value to Perplexity users.

Artificial Intelligence Community Insights

Discussions in the AI community cover a wide range of topics, from Microsoft's advancements with Copilot+ to concerns about Perplexity's API lagging behind. Engineers address technical challenges with models like GPT-4o and Llama 3, while also discussing legislative impacts on the AI industry. Across various Discord channels, members share insights on model integrations, GPU preferences, and the need for better content curation in AI art. The community also highlights tools like Notie and SDXL Flash for enhanced AI capabilities. From Intel to OpenAI, the AI community remains abuzz with technical developments, regulatory concerns, and collaborative endeavors.

Discord Communities Highlights

  • Tech Talk and Role-Playing in OpenRouter Discord: The community delves into the role-playing tendencies and new vision models.
  • Interconnects Discord Troubles and Developments: Discussions on model releases, AI performance, and researcher titles.
  • OpenAccess AI Collective Updates: From Coherence integration to GPU finetuning complications, Axolotl is in the midst of significant advancements.
  • Latent Space Discussions: From Langchain JS to acquisitions, the community explores a variety of AI-related topics.
  • Tinygrad Discord Insights: Conversations range from trigonometric functions to training modes in the tinygrad community.
  • DiscoResearch Perspectives: Debates on model training strategies, development, and skepticism towards large-scale model releases.
  • Cohere Community Buzz: Cohere's hiring news, VRAM calculations, and bilingual bot integration quest.
  • AI Stack Devs Chat: Banter on emotional AI, 3D chatbots, and the influence of pop culture.
  • LLM Perf Enthusiasts Discord Update: Brief mention of a brevity issue discussion in the Llama3/Phi3 thread.
  • Mozilla AI Activities: Member-organized events, AMAs, and demonstrations within the Mozilla AI Discord server.

Detailed by-Channel Summaries and Links

Dataset Challenges:

  • Members discussed various dataset challenges including converting formats, using ShareGPT, and optimizing training parameters such as batch sizes. One user shared a frustrating experience of spending hours scraping a site into alpaca format, only to find it unhelpful.

Phi-3 Model Excitement and Skepticism:

  • Phi-3 models from Microsoft generated excitement with some members while others expressed skepticism about the validity of its benchmarks.

Conversations on AI Regulations:

  • Discussions about AI regulations such as California's SB 1047 law sparked debates about the implications for open-source models and meta's decision to not open the weights for its 400B model.

Technical Glitches and Workarounds:

  • Users noted common technical glitches, especially related to Colab and Kaggle updates breaking compatibility. Workarounds like restarting Colab sessions were suggested.

Unsloth Platform Developments:

  • Users discussed new model support on the Unsloth platform, including the excitement over improved fine-tuning features with the support of Mistral v3.

Extracting Villa Attributes from User Prompts

One member discussed the extraction of structured villa attributes from user prompts, emphasizing the importance of low latency and high performance. Synthetic data is being considered for evaluation. Another member shared their use case of predicting workflows and utilizing GPT-4 to generate them for different domains. They plan to fine-tune Mistral models using synthetic data. Additionally, there was a presentation on using LLM agents for user testing in web applications by tuning prompts for capturing user personalities and feedback. Another user proposed fine-tuning models to assist UK farmers and organizations in completing grant applications efficiently. Lastly, an idea was suggested for creating an in-store book recommendation system that offers suggestions based on user queries from a bookstore database, focusing on prompt engineering and potential fine-tuning for scalability.

Issues and Discussions on LLM Finetuning

The section covers various issues, discussions, and suggestions surrounding LLM Finetuning by Hamel and Dan. Topics include problems with Hugging Face models, clarification on Replicate's use case, conference registration issues, credit allocation queries, and more. The community also delves into the significance of evaluation steps, using different platforms for model running, dataset formats, and the need for practical examples in workshops. Helpful links and resources are shared to enhance understanding and provide guidance.

LLM Finetuning Discussions

LLM Finetuning (Hamel + Dan)

  • Excitement for Jason's Workshop:

    • Filippob82 expressed enthusiasm for Jason's session and mentioned they are halfway through his W&B course. They used an emoji to convey their excitement.
  • Curiosity about Prompt Engineering:

    • Nehil8946 showed interest in Jason's work on optimizing prompts and asked if there is a systematic approach to prompt engineering that Jason follows.
  • Hugging Face Presentation and Accelerate Resources:

    • A member shared various resources including a presentation on Hugging Face and documentation for Accelerate. Links included tutorials on FSDP vs. DeepSpeed and examples on GitHub.
  • Upcoming GPU Optimization Workshop:

    • An event was shared featuring a workshop on GPU optimization with speakers from OpenAI, NVIDIA, Meta, and Voltron Data.
  • Caching Precautions, Custom Callbacks, and Dataset Types:

    • Users discussed precautions for training multiple models, using custom callbacks, and details on dataset types 'pretrain' and 'completion.'
  • Command Errors and GCC Installation Resource:

    • Discussions on unresolved issues with running commands and a helpful link for installing the GCC compiler on Ubuntu.
  • Perplexity Partners with Tako for Advanced Knowledge Search:

    • Perplexity teams up with Tako to provide advanced knowledge search and visualization, initially available in the U.S. and English.
  • Service Downtime and Copied Features:

    • Discussion on Perplexity's downtime causing speculation and frustration among users, also shared a blog post stating Microsoft copied features from OpenAI.

Eleuther Thunderdome

Questions on lm-evaluation-harness and MCQs

Members discussed the randomization of answer choices in MCQs using lm-eval-harness, with concerns about benchmark biases towards early choices. While SciQ has a fixed correct answer index, the randomization isn't currently applied for MMLU.

  • Upcoming Submissions and Papers: An anon'd paper is coming soon to arXiv, while members joked about not needing to worry about insane competition in D&B papers. There's also work on an updated version of the Pile with 3T tokens and fully licensed text.
  • Medical Benchmarks Controversy: A lively discussion emerged about medical benchmarks and their potential dangers. One member focused on how these benchmarks might claim models are better and safer than physicians, highlighting ongoing improvements in the interpretation of such benchmarks.
  • Huggingface Dataset Configuration: Members sought advice on configuring a Huggingface dataset's directory structure. The solution pointed out the importance of adding a config in the README.md file as outlined in the Manual Configuration.
  • Running lm-eval-harness on Multi-node Slurm Cluster: A question was raised about evaluating big models on a multi-node Slurm cluster. Attempts have been made using vllm + ray and accelerate but were unsuccessful, indicating a need for better solutions.

Links mentioned:

HuggingFace Discussions

The web page showcases various discussions and projects within the HuggingFace community. Topics include integrating models like NeRF and 3D Gaussian Splatting, challenges in fine-tuning Falcon-180B, embedding issues with Llama-8B, GPT deployment on personal websites, and discussions on AI regulation laws. Additionally, projects like adding ImageBind to Transformers, training Huggy Agent, seeking project collaborators, exploring 3D Gaussian Splatting, using Evaluator classes for evaluation, automating tweets from Wiki articles, and the TransAgents framework for literary translation were shared and discussed. There are also mentions of markdown note-taking app, Dockerized wiki, a typography image dataset, NorskGPT-Llama3-70b model release, and the SDXL Flash tool for DALL·E 3 level images generation. Lastly, topics in the NLP and computer vision channels involve finetuning OwlV2 on custom data, a master thesis on hallucination detection using Mistral 7B, considerations on chat history in LLMs, and the introduction of llmcord.py for structured conversations with bots.

Hardware and System Setup Discussions in LM Studio

Hardware and System Setup Discussions:

  • Users discussed running large models on multiple GPUs, specifically 70b models. A user shared their experience with performance improvements using NVLink but highlighted challenges with multi-GPU setups and VRAM requirements.
  • Discussions around LM Studio supporting dual GPUs of the same type, not mixed types like AMD and Nvidia. Automatic GPU recognition and matching VRAM capacities were advised for smooth operation. Additionally, configuration settings control GPU split were suggested for balancing GPU usage.
  • Issues were raised about using multiple Intel ARC GPUs with LM Studio, with uncertainty on AMD GPU support for multiple GPU setups. Community members appreciated the support received and encouraged utilizing Discord's search function for quick answers.

Nous Research AI ▷ #ask-about-llms

Home setup with 4090s rules:

One user hosts LLMs for inference at home on a setup with 2x 4090s.

Runpod and Replicate get nods for ease:

Runpod is recognized as a good option, and Replicate is praised for its easy-to-use templates, making them convenient platforms for hosting LLMs.

LambdaLabs is cheapest but tougher:

While LambdaLabs offers the cheapest GPU options, they are reportedly more difficult to use compared to other platforms.

Anthropic Workbench woes:

A member inquired about issues with Anthropic Workbench, questioning if the problem is widespread or isolated.

Modular (Mojo 🔥) Nightly

The section discusses various topics related to the Modular (Mojo 🔥) Nightly channel. Members shared insights on issues like a commit by chris lattner causing a DCO test suite failure, delays in the nightly release due to GitHub Actions issues, and propositions for adding Unicode support in strings. Additionally, there were discussions on ongoing changes and function updates in modules like renaming 'math.bit' to 'bit' and 'bswap' to 'byte_reverse'. Various links were shared for further reference on the discussed topics.

LAION ▷ research

Experiment with xLSTM sparks curiosity:

A member inquired if anyone had experimented with xLSTM yet. This seems to indicate growing interest in less mainstream models.

Meta paper brings familiar yet improved content:

Members reviewed a Meta paper, noting it closely relates to earlier cm3leon research but with enhancements. They highlighted interesting advancements in attention mechanisms for scalability.

KANs get reviewed:

A member shared a review of KANs (Kernel Attention Networks), saying, 'Take that KANs', alongside a link to the review.

Phi-3 Vision chat drives detailed exploration:

Discussion revolved around the Phi-3 Vision multimodal model from Microsoft, with documentation resources included for deeper insight. One user noted how GPT-4 generated charts sorted by color without changing order, leading to a debate about its purpose.

Anthropic scaling paper is heavy reading:

Members talked about the dense content of a recent Anthropic paper. There was a noted absence of conversations around its implications until now.

Link mentioned: microsoft/Phi-3-vision-128k-instruct · Hugging Face

OpenAccess AI Collective Updates

The section provides updates from the OpenAccess AI Collective's Axolotl channel, including discussions on incorporating Cohere into the system, solving tokenizer confusion, finding a tiny Mistral model, progress on a distillation pipeline, and identifying a faster STT to LLM library. The diverse discussions cover various aspects of AI development and collaboration within the Collective.

AI Stack Devs (Yoko Li) - AI Companion

In this section, users engage in light-hearted banter with one joking that AI waifus save lives, leading to a response of 'Just Monika.' Additionally, there is a discussion about embedding emotional AI in business bots, with a member questioning the potential for waifus to understand and process emotions. Another member mentions working on 3D character chatbots at 4Wall AI and shares a teaser available on a specific channel.

Datasette - LLM (@SimonW)

Qualcomm unveils Snapdragon Dev Kit for Windows priced at $899.99, an alternative to Mac Mini with 32GB RAM and 512GB storage. Users discuss pricing concerns, using Mac Mini as a server, desire for cheaper dev kits, and an experiment with Smalltalk. Links mentioned provide more details on the dev kit and Windows Dev Kit 2023.


FAQ

Q: What models did Microsoft release under the MIT license?

A: Microsoft released Phi-3 small (7B) and medium (14B) models under the MIT license, with versions up to 128k context.

Q: How were the Phi-3 models trained and fine-tuned?

A: The Phi-3 models were trained on 4.8 trillion tokens and fine-tuned with SFT and DPO.

Q: What is the significance of Perplexity AI partnering with TakoViz?

A: Perplexity AI partnered with TakoViz to offer advanced knowledge search and visualization, sourcing data from authoritative providers.

Q: What topics are discussed in the AI community according to the essai?

A: The AI community discusses a wide range of topics from technical advancements with models like Copilot+ to legislative impacts on the AI industry.

Q: What challenges were discussed regarding dataset handling?

A: Members discussed challenges such as converting dataset formats, using ShareGPT, and optimizing training parameters like batch sizes.

Q: What are some examples of conversations on AI regulations mentioned in the essai?

A: Discussions include California's SB 1047 law implications for open-source models and meta's decision regarding the weights of its 400B model.

Q: What are some common technical glitches discussed in the community?

A: Users noted common technical glitches related to Colab and Kaggle updates breaking compatibility, with workarounds like restarting Colab sessions.

Q: What developments were discussed regarding the Unsloth platform?

A: Discussions included new model support on the Unsloth platform, improved fine-tuning features with Mistral v3, and considerations for structured data extraction.

Q: What conversations were had about lm-evaluation-harness and MCQs?

A: Members discussed randomization of answer choices in MCQs using lm-eval-harness and concerns about biases towards early choices.

Q: What hardware and system setup discussions were highlighted in the essai?

A: Discussions included running large models on multiple GPUs, issues with mixed GPU types, and configurations for smooth operation.

Logo

Get your own AI Agent Today

Thousands of businesses worldwide are using Chaindesk Generative AI platform.
Don't get left behind - start building your own custom AI chatbot now!