湾区同学技术沙龙
(Bay Area) Tachyon: an open source memory-centric distributed storage system (Bin Fan / Shaoshan Liu / Haoyuan Li)
19 July 2015
1:30PM ~ 3:40PM, 07/19/2015, Sunday
Registration
- Registration link: tech-meetup-12-13-2015.eventbrite.com/
- Event link: Tachyon: an open source memory-centric distributed storage system
Event Info
- Language: Chinese
- Time: 1:30PM ~ 3:40PM, 07/19/2015, Sunday
- Location: 97 E Brokaw Rd, Ste 210, San Jose, CA 95112
Agenda
- 1:30pm - 1:50pm: Reception and social time
- 1:50pm - 2:30pm: Session 1 by Bin Fan
- 2:30pm - 3:10pm: Session 2 by Shaoshan Liu
- 3:10pm - 3:30pm: Q&A and offline networking with Bin Fan, Shaoshan Liu and Haoyuan Li
Session 1: Tachyon overview
Tachyon is a memory-centric, fault-tolerant distributed storage system that enables reliable file sharing at memory speed. It originated in the UC Berkeley AMPLab, is open source, and is deployed at multiple companies. Tachyon has more than 100 contributors from over 30 institutions, including Baidu, IBM, Intel, and Yahoo. Earlier this year, Tachyon Nexus, the latest spinout from AMPLab, started to commercialize Tachyon. The company is funded by Andreessen Horowitz. It was also recently listed among the "9 Hot Enterprise Storage Companies to Watch" by Network World and Computer World. In this talk, we present an overview of Tachyon, along with recent developments and use cases.
Session 2: Fast big data analytics with Spark on Tachyon in Baidu
In this talk, we focus on how Tachyon helps improve the efficiency of big data analytics (ad-hoc queries) within Baidu, with up to a 30x performance improvement. Baidu currently runs a production Tachyon cluster with 150 nodes and over 2 PB of storage space; this cluster mainly serves as the cache layer for our Big Data Analytics engine. First, we introduce the Big Data Analytics infrastructure within Baidu. Then, we explain why we started using Tachyon several months ago, as well as the problems we encountered when we started. Next, we delve into the details of how Tachyon helps accelerate our Big Data Analytics pipeline in its current state. Finally, we discuss the new features we would like to see and our plan to scale further.
Speaker bios
- Bin Fan is a software engineer at Tachyon Nexus and a top committer of the Tachyon project. Prior to Tachyon Nexus, he worked at Google, where he helped build the core storage infrastructure and won Google's Technical Infrastructure award. Bin received his Ph.D. in computer science from Carnegie Mellon University.
- Shaoshan Liu is currently a Senior Architect at Baidu U.S.A., working on Big Data Infrastructure. Before Baidu, he worked at LinkedIn and Microsoft. Shaoshan has a Ph.D. from UC Irvine.
- Haoyuan Li is the founder and CEO of Tachyon Nexus. He is a Computer Science Ph.D. candidate in the AMPLab at UC Berkeley, where he co-created Tachyon, an open source memory-centric distributed storage system. He is also a founding committer of Apache Spark. Before Berkeley, he worked at Conviva and Google. Haoyuan has an M.S. from Cornell University and a B.S. from Peking University.
Organizer
Co-organizers
- 南京大学硅谷校友会
- 瀚海硅谷科技园
- 硅谷清华联网
- 中国科技大学校友会创业俱乐部
- 浙江大学校友会海纳创新创业俱乐部
- 北京大学北加州校友会
- 武汉大学北加州校友会
- 东南大学硅谷校友会
- 吉林大学硅谷校友会
- 复旦大学北加州校友会
- 华人事业互助会
- 华美信息存储协会
- JayW Salon