Lewes, DE – October 3, 2017 – John Snow Labs, a global data operations company accelerating data science projects & teams, announces the availability of its Natural Language Processing software library for Apache Spark. The library provides simple, high-performance and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment.
The John Snow Labs NLP Library is built on top of Apache Spark ML, providing three advantages:
- Unmatched runtime performance, since processing is done directly on Spark DataFrames without copying data, taking full advantage of Spark’s caching, execution planning and optimized binary data format.
- Frictionless reuse of existing Spark libraries, including distributed topic modelling, word embeddings, n-gram calculation, string distance calculations and more.
- Higher productivity by using a unified API across the Natural Language Understanding, Machine Learning & Deep Learning parts of a data science pipeline.
The NLP library is written in Scala and includes Scala and Python APIs. It has no dependency on any other NLP or ML library. The code has been reviewed by Databricks’ machine learning engineers for fit with Spark ML’s current and future design. The library is released as open source under the Apache 2.0 license.
“With JSL-NLP, we’re delivering on the promise to enable customers to take advantage of the latest open source technology and academic breakthroughs in data science, all within a high-performance, enterprise-grade code base,” said the founding team. In addition, “JSL-NLP encompasses a wide range of highly efficient Natural Language Understanding tools for text mining, question answering, chat bots, fact extraction, topic modelling and search, running at a scale and performance that has not been available to date.”
John Snow Labs will continue sponsoring the development of the NLP library. The company provides commercial support, indemnification and consulting. This provides the library with long-term financial backing, a funded active development team, and a growing stream of real-world projects that drives robustness and roadmap prioritization.
Visit the Spark-NLP GitHub Repository to clone the code base or contribute to the project. GitHub’s issue tracker is used to manage code requests, bugs and features. The team is looking for contributors of all kinds, from general feedback to coding new algorithms.
The NLP Quickstart Guide on the project’s homepage provides full documentation on installing, using and extending NLP pipelines and annotators.
About John Snow Labs
John Snow Labs Inc. is a Data Operations company accelerating data science projects and teams in healthcare and life science. A third of the team hold a PhD or MD degree and 75% have at least a Master’s, coming from multiple disciplines covering data research, data engineering, data science, pharma and medicine. We are a USA Delaware Corporation, run as a global virtual team spread across 17 countries. We believe in being great partners, in making our customers highly successful, and in using data philanthropy to make the world a better place.
For more information on Spark NLP text processing library, please contact:
John Snow Labs
Attn: Ida Lucente
16192 Coastal Highway
Lewes, DE 19958
+1 (302) 786-5227