[AINews] CogVideoX: Zhipu's Open Source Sora
Chapters
AI Twitter Recap
AI Discord Recap
OpenAccess AI Collective (axolotl) Discord
Dspy Discord
Exploring Distributed Trees of Experts, Flux on Apple Silicon, Self-hostable LLMs, Moondream2 Real-time Demo, and Finetuning Practices
AI Discussions in OpenAI and HuggingFace
Open Source AI Projects and Community Discussions
Gemini, Flux, ZLUDA, and AI Hardware Developments
NVIDIA GPU Measurement Tools and Metrics Discussion
Older GPUs and BitBlas Performance
OpenRouter and AI Development
OpenAccess AI Collective (axolotl) Discussions
DSPy General
Interest in DSPy for Text Scoring
AI Twitter Recap
This section provides a recap of AI-related discussions and developments on Twitter. It covers updates on AI models like Llama 3, Moondream, Phi-3.5, and Together Rerank API. Additionally, insights on AI research, techniques like superposition prompting and long-form content generation, and tools like Not Diamond and AI in command line interfaces are highlighted. The section also includes discussions on background removal with WebGPU, AI industry practices such as AI hiring at Midjourney, and information on hyperscaler Capex.
AI Discord Recap
The AI Discord recap covers various discussions and developments in the AI community across different Discord channels. Topics range from AI model advancements and releases to ethical concerns, decentralized AI, reasoning capabilities of models, challenges with YouTube summarization tools, and more. Each section highlights the ongoing conversations and activities within the AI community, reflecting the diverse interests and opinions of individuals involved in the field.
OpenAccess AI Collective (axolotl) Discord
Evaluating lm eval metrics: For multiple choice questions, the metric used is accuracy on target prediction, determining if the model's highest logit aligns with the correct choice.
- Members highlighted nuances in model evaluations, discussing scenarios where answers differ slightly.
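The metric described above can be sketched in a few lines: an answer counts as correct when the model's highest-scoring choice matches the gold label. This is an illustrative re-implementation, not lm-eval-harness code, and the scores are toy values.

```python
# Multiple-choice accuracy: count a question as correct when the
# argmax over the per-choice scores equals the gold choice index.
def choice_accuracy(logits_per_question, gold_indices):
    """logits_per_question: one list of per-choice scores per question."""
    correct = sum(
        1 for logits, gold in zip(logits_per_question, gold_indices)
        if max(range(len(logits)), key=logits.__getitem__) == gold
    )
    return correct / len(gold_indices)

# Two questions, four choices each; both argmax picks match gold -> 1.0
scores = [[0.1, 2.3, -0.5, 0.0], [1.7, 0.2, 0.4, -1.0]]
print(choice_accuracy(scores, [1, 0]))
```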
Confusion surrounds Tokenizer v3 specs: Members expressed confusion regarding tokenizer v3, with links to previous discussions on the nemo repo shared.
- There was a consensus on needing proper configuration for supporting multi-role functionalities.
Deepseek V2 monkey-patch insights: Members discussed using monkey-patching to override the forward method for the Deepseek V2 attention model, sharing relevant code snippets.
- An experience comparison was made about monkey-patching in Java versus Python, showcasing complexities in the implementation.
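The pattern being discussed can be shown in miniature. `ToyAttention` below is a stand-in for the real Deepseek V2 attention class (which is not reproduced here); the point is only the mechanics of swapping a class's forward method at runtime while keeping a handle to the original.

```python
# Monkey-patching sketch: replace a class's forward method at runtime.
class ToyAttention:
    def forward(self, x):
        return x  # original behavior

def patched_forward(self, x):
    # Wrap custom logic around the original implementation.
    return ToyAttention._original_forward(self, x) * 2

# Save the original before overriding, so the patch can delegate to it.
ToyAttention._original_forward = ToyAttention.forward
ToyAttention.forward = patched_forward  # patch applies class-wide

attn = ToyAttention()
print(attn.forward(3))  # 6
```

Because the assignment happens on the class, every existing and future instance picks up the patched behavior, which is what makes this approach attractive for overriding internals of a library model without forking it.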
FSDP's RAM resource requirements questioned: Concerns regarding whether FSDP (Fully Sharded Data Parallel) requires significant system RAM for effective functioning were raised.
- This led to discussions about optimal system resources necessary for operating FSDP effectively.
AI Ratings vs Human Ratings Unpacked: A member utilized llm-as-judge for rating and questioned the accuracy of AI judgments compared to human ratings.
- Further inquiries were made about any conducted tests evaluating this accuracy, emphasizing the need for metric validation.
Dspy Discord
LAION-aesthetic link issues:
- A member reported that the link to LAION-aesthetic on the LAION website is broken and requested an alternative link from Hugging Face.
- Any updates on a working link would be greatly appreciated, highlighting the ongoing community need for reliable resources.
Request for functional LAION-aesthetic resource:
- The discussion emphasized the importance of having a functioning link to LAION-aesthetic, essential for users accessing data models.
- Members expressed frustration over the non-functional website and urged for prompt solutions to improve usability.
Exploring Distributed Trees of Experts, Flux on Apple Silicon, Self-hostable LLMs, Moondream2 Real-time Demo, and Finetuning Practices
Exploring Distributed Trees of Experts:
A member inquired about distributed trees of experts, likening the idea to sharing AI workloads across a P2P network, and discussed how community-driven development enhances collaboration in AI projects.
Need for Flux on Apple Silicon:
A member sought an MLX implementation of Flux for Apple Silicon. Integration efforts have been hampered by the lack of a suitable port.
Seeking Self-hostable Multimodal LLM:
Interest was expressed in a self-hostable multimodal LLM for real-time video analysis without specific training. Concerns about cost and privacy were raised, exploring options like GPT-4(o).
Moondream2 offers promising solutions:
Recommendation was given to explore Moondream2, featuring a real-time webcam demo with easy fine-tuning capabilities suitable for self-hostable multimodal LLM needs.
Debate on Finetuning Data Sources:
A discussion arose regarding the use of data generated by another model in the finetuning process. Considerations were made about the risks and benefits, especially when derived from a stronger model.
AI Discussions in OpenAI and HuggingFace
The section discusses various AI-related topics including AI personhood debate, emotional understanding in AI, decentralization of AI data use, future AI impact on society, frustrations with GPT-4o's reasoning, AI voice synthesis business ideas, and challenges with YouTube summarization tools. The exploration in HuggingFace involves model deployment challenges, runtime errors, training AI model issues, using Sentence Transformers, and converting model formats.
Open Source AI Projects and Community Discussions
The section highlights various open-source AI projects and developments within the community. It includes the launch of the StockLlama forecasting model, exploration of quantized models for vulnerability assessment, the release of the RYFAI app as an open-source AI tool, and experiences shared on creating AI voice assistants using Raspberry Pis. Members are encouraged to try out these tools and contribute to their development. Additionally, discussions in HuggingFace groups cover topics such as channel etiquette, a significant paper on HuggingFace models, and the release of the Cog-5B video model. The section also includes conversations on text-summary models, Llama3.1 for synthetic data, and the architecture of ProteinBERT. Moreover, there are comparisons and discussions related to hardware choices for LLMs, inference speeds, and the introduction of the Tinygrad framework. Updates on Aider v0.53.0 highlight improvements in prompt caching, new command options, error handling, and cache warm-keeping features.
Gemini, Flux, ZLUDA, and AI Hardware Developments
Users engaged in discussions about various AI models and developments in the tech industry. The topics included comparing Aider's capabilities with existing models, the performance of new Gemini models, prompt caching importance, and OpenRouter performance issues. Additionally, members shared insights on scripting with Python in Aider, command line options, data security, and commit message generation challenges. The section also highlighted releases of system prompts by Anthropic, experimental Gemini models by Google, and rate limits for Gemini models. Furthermore, users discussed upcoming releases of new AI hardware like Intel CPUs and NVIDIA GPUs, the impressive capabilities of Flux models, concerns over ZLUDA development, integration of SD Next with ZLUDA, and challenges with Streamdiffusion and SDXL Turbo. The section also touched on topics like video benchmark examples, RLHF libraries, and a free API for running Llama 3.1 by SambaNova.
NVIDIA GPU Measurement Tools and Metrics Discussion
A discussion on NVIDIA Power Utilization, GPU Hardware Measurement Tools, WandB System Metrics, and PyTorch Profiler Insights was held. Topics included the effectiveness of NVIDIA-smi for measuring GPU power utilization, distinctions between GPU utilization and power draw metrics, easy utilities for measuring GPU power draw like pynvml, and using PyTorch profiler for accurate GPU utilization metrics. Links to relevant resources were also shared.
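One of the "easy utilities" routes mentioned above can be sketched without any third-party library: query power draw through nvidia-smi's CSV output and parse it. The sample string mimics typical output; on a machine with NVIDIA drivers you would run the subprocess call at the bottom instead.

```python
# Measure GPU power draw (watts) via nvidia-smi's machine-readable output.
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"]

def parse_power_draw(csv_output: str) -> list[float]:
    """Return one wattage reading per GPU from nvidia-smi CSV output."""
    return [float(line) for line in csv_output.strip().splitlines()]

sample = "285.37\n301.12\n"        # two GPUs, in watts
print(parse_power_draw(sample))    # [285.37, 301.12]

# On real hardware:
# watts = parse_power_draw(subprocess.check_output(QUERY, text=True))
```

Note the distinction raised in the discussion: power draw (watts) is not the same as the "GPU utilization" percentage nvidia-smi also reports, which only measures whether any kernel was active during the sampling window.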
Older GPUs and BitBlas Performance
BitBlas demonstrates good performance on older GPUs like the 2080 Ti, although fullgraph compilation does not work on these devices. Users find the lack of support for fullgraph compilation to be a notable limitation despite BitBlas functioning on older GPUs. Another user achieved ~340000 tokens/s with an H100 NVL GPU but utilized only 26GB of memory, prompting suggestions to fully utilize VRAM by adjusting batch size or disabling recomputation. Discussions also revolve around resuming training from checkpoints and exploring low-rank fine-tuning methods. Additionally, there are discussions on compile-time programming in C++/CUDA, Zig's compile-time capabilities, adopting constexpr for cleaner code, Cutlass's compile-time techniques, and defining a half type in C++. Overall, the performance of BitBlas on older GPUs as well as optimizations for GPU usage are the primary focus in this section.
OpenRouter and AI Development
The section discusses various topics related to OpenRouter and AI development. It covers team efforts recognition and highlighting AI collaboration on Twitter. There is a detailed conversation about OpenRouter model pricing, DisTrO distributed training innovation, Cerebras competitive pricing, OpenRouter context caching for DeepSeek, and exciting updates for Gemini models. Additionally, links and discussions about these topics are also included.
OpenAccess AI Collective (axolotl) Discussions
- lm eval metrics depend on benchmark: Members discussed scenarios where answers might be slightly different, highlighting the nuances in evaluating model outputs.
- Questions arise about tokenizer v3: Multiple members expressed confusion about tokenizer v3, with one linking to a previous discussion on the nemo repo.
- Mistral's initial config errors: A member noted that Mistral had issues with their tokenizer_config.json during its initial release, stressing the importance of accurate configuration.
- Demand for masking functionality in Jinja: Conversations about how masking should work within the multi-role context were discussed.
- Understanding multi-role and masking effects: A member clarified that in ShareGPT, masking can be specified for input, allowing certain roles to be ignored during training.
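The masking behavior described above can be sketched in plain Python: tokens from masked roles get the label -100 (the `ignore_index` default of PyTorch's cross-entropy loss) so they contribute nothing to training, while tokens from trained roles keep their ids. The token ids below are toy values, not output from a real tokenizer, and this is an illustration of the idea rather than axolotl's actual code.

```python
# Role-based loss masking for multi-turn training data.
IGNORE_INDEX = -100  # matches torch.nn.CrossEntropyLoss(ignore_index=-100)

def build_labels(turns, train_on_roles=frozenset({"assistant"})):
    """turns: list of (role, token_ids); mask every role not trained on."""
    labels = []
    for role, token_ids in turns:
        if role in train_on_roles:
            labels.extend(token_ids)           # learn to produce these
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # ignored by loss
    return labels

conversation = [("human", [11, 12, 13]), ("assistant", [21, 22])]
print(build_labels(conversation))  # [-100, -100, -100, 21, 22]
```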
Links mentioned:
- Conversation – Axolotl
- axolotl/src/axolotl/prompt_strategies/chat_template.py at 17af1d7081414c32614cbabe324e1197ca9f43a7 · axolotl-ai-cloud/axolotl
DSPy General
Fixing DSPy Output Truncation:
A member reported their outputs getting truncated when using DSPy and suspected token limits were to blame. Another member suggested raising max_tokens during initialization and using your_lm.inspect_history() to view the prompts.
- The original poster confirmed that this advice resolved the issue, highlighting the practical help from the community.
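The suggested fix looks roughly like the following, assuming a recent DSPy API (dspy.LM / dspy.configure); the model name and token limit are examples, so adjust both to your setup.

```python
# Raise max_tokens at LM initialization so long completions are not cut off.
import dspy

lm = dspy.LM("openai/gpt-4o-mini", max_tokens=4000)  # example model/limit
dspy.configure(lm=lm)

# After running a module, inspect the last prompt/completion pair
# to verify nothing was truncated:
# lm.inspect_history(n=1)
```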
Error in DSPy Import:
A member encountered the error message "module is installed, but missing library stubs or py.typed" upon importing DSPy. They inquired whether DSPy supports typing in Python, indicating a need for clearer documentation.
Interest in DSPy for Text Scoring
- A user inquired about using DSPy to score generated texts based on KPIs or industry metrics like BLEU or ROUGE, showcasing a rising interest in evaluating text generation performance metrics within the community.
- However, there were no responses or shared experiences from other members regarding scoring texts with DSPy.
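Although the thread went unanswered, the kind of metric the user mentioned is straightforward to compute independently of DSPy. Below is a minimal unigram-overlap score in the spirit of ROUGE-1; this is an illustrative re-implementation, not the official rouge-score package, and a real evaluation should use an established library.

```python
# ROUGE-1-style F1: unigram overlap between a candidate and a reference.
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f("the cat sat on the mat", "the cat is on the mat"))  # ~0.83
```

A DSPy pipeline could then use such a function as a metric inside its evaluation or optimization loop, scoring each generated text against a reference.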
FAQ
Q: What is the significance of Monkey-patching in Deepseek V2?
A: Monkey-patching in Deepseek V2 involves overriding the forward method of the attention model, allowing members to customize and enhance the model's functionality.
Q: What were the concerns raised about FSDP's RAM resource requirements?
A: Concerns were raised regarding the amount of system RAM needed for effective functioning of Fully Sharded Data Parallel (FSDP), sparking discussions on optimal system resource allocation.
Q: How were the LAION-aesthetic link issues reported in the community?
A: A community member reported a broken link to LAION-aesthetic on the LAION website, prompting a request for an alternative link from Hugging Face to meet the community's need for reliable resources.
Q: Why was there a debate on finetuning data sources among members?
A: A discussion arose regarding the use of data generated by another model in the finetuning process, with considerations about the risks and benefits, especially when derived from a stronger model.
Q: What is the focus of discussions regarding BitBlas performance on older GPUs?
A: Discussions focus on the performance of BitBlas on older GPUs like the 2080 Ti, noting limitations such as the lack of support for fullgraph compilation and considerations on optimizing GPU usage.
Q: What prompted the need for clarity in understanding multi-role and masking effects in ShareGPT?
A: Members discussed how masking should work within the multi-role context, with a specific emphasis on specifying masking for inputs to ignore certain roles during training in ShareGPT.
Q: What issues were encountered by a member using DSPy and how were they resolved?
A: A member reported outputs getting truncated in DSPy, suspected token limits, and sought help to resolve the issue by changing max_tokens during initialization and viewing prompts with your_lm.inspect_history(). This demonstrated the practical support from the community.
Q: What error message did a member encounter when importing DSPy and what further inquiry was made?
A: A member faced the error 'module is installed, but missing library stubs or py.typed' when importing DSPy and asked whether DSPy supports typing in Python, pointing to a need for clearer documentation.
Q: What interest was expressed in using DSPy for text scoring?
A: A user asked about using DSPy to score generated texts against KPIs or industry metrics like BLEU or ROUGE, reflecting the community's focus on evaluating text generation performance, though no other members shared experiences in response.