chaotic_neural: Associative clustering and analysis of papers on the ArXiv
This package provides a model for finding related papers on ArXiv, given another paper or a set of keywords.
It differs from existing resources such as the default ArXiv search, the new ADS, or ArXivsorter in that it uses Doc2Vec, an unsupervised algorithm that trains a shallow neural network to map every document (here, ArXiv abstracts) to a vector in a high-dimensional vector space. Papers similar to one of interest are then found by locating the vectors closest to it in this space. The same representation supports vector arithmetic on keywords (adding and subtracting them) as well as on the vectors of entire documents, which makes it possible to build structured queries.
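The nearest-vector search and keyword arithmetic described above can be illustrated with a small standalone sketch. It does not use the chaotic_neural API: the three-dimensional vectors, the paper and keyword names, and the helper functions `cosine`, `combine`, and `most_similar` are all hypothetical and chosen only for illustration. Cosine similarity is assumed as the closeness measure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical toy vectors; a trained model would supply these instead.
doc_vectors = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
}
keyword_vectors = {
    "quenching": [1.0, 0.0, 0.0],
    "simulation": [0.0, 0.0, 1.0],
}

def combine(positive, negative=()):
    """Add the vectors of `positive` keywords and subtract those of `negative` ones."""
    dim = len(next(iter(keyword_vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, keyword_vectors[word])]
    for word in negative:
        query = [q - x for q, x in zip(query, keyword_vectors[word])]
    return query

def most_similar(query, vectors, topn=2):
    """Return the names of the `topn` vectors closest to `query`."""
    ranked = sorted(vectors, key=lambda name: cosine(query, vectors[name]), reverse=True)
    return ranked[:topn]

# Query: papers about "quenching" but not "simulation".
query = combine(positive=["quenching"], negative=["simulation"])
print(most_similar(query, doc_vectors))
```

Subtracting a keyword pushes the query away from documents that point in that direction, so in this toy setup the simulation-like `paper_c` ranks last.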
Users can either build their own model (by searching ArXiv with specific queries) or use the pre-trained model, which was trained on recent astro-ph.GA papers up to Sunday, May 23, 2021. A live version of the tutorials can be found [here on Google Colab](https://colab.research.google.com/drive/1pHsSm37u7lZKP2TTe1batXXXW_P-dyd9?usp=sharing).
- Basic usage of the chaotic_neural package
- Visualizing a trained model
  - Loading the trained model:
  - Generating vectors corresponding to each document in the corpus:
  - Using UMAP, we can now generate an embedding of the 50-dim vector space in two dimensions:
  - Let’s create a more dynamic version of the plot.
  - Check different areas of the plot by quantities like publishing year, number of authors, and primary category. We expect no large correlations for any of these quantities, and this serves more as a sanity check.
  - And now, we can start searching for specific phrases:
  - Finally, let’s check whether the same phenomenon (in this case, a tight observed correlation between the stellar masses and star formation rates of galaxies) is found in the same part of the UMAP embedding when it is called by different names:
  - Checking different simulations
  - And different telescopes
  - Generate plots corresponding to where recent (mid-2020+) research on a given topic / related to a given paper has appeared on ArXiv.
- Building a custom model
The code is designed to be intuitive to use and consists of three steps to get you started:
- loading a pre-trained model
- performing searches
- training a new model
More detailed descriptions of these modules can be found in the tutorials. If you are interested in going off the beaten track and trying different things, please let me know so that I can help you run the code as you’d like!
Contribute
Issue Tracker: https://github.com/kartheikiyer/chaotic_neural/issues
Source Code: https://github.com/kartheikiyer/chaotic_neural
benchmark - predictions w/ normalizing flows
spectral signal as a function of parameters
Support
If you are having issues, please let me know at: kartheik.iyer@dunlap.utoronto.ca
License & Attribution
Copyright 2019 Kartheik Iyer and contributors.
chaotic_neural is being developed by Kartheik Iyer in a public GitHub repository. The source code is made available under the terms of the MIT license.
If you make use of this code, please cite the repository or the upcoming paper (Iyer et al. in prep.).