![]() I've been doing the above and it's pretty cool. PM> Add-Migration Initial This command scaffold a migration to create the initial set of tables for your model. Run the following command in Package Manager Console. Once you have a model, you can use migrations to create a database. Or pip3 install datasette Datasette and run datasette serve enron.db The optionsBuilder has UseSqlite method it expects a connection string as a parameter.It goes through all the mail, expands the addresses, converts the dates, tidies some text, makes a giant SQLite table filled with everything, then it runs some SQL scripts that turn that into a messages table, an addresses table, and adds full-text search and extrapolates a table of who-sent-to-whom. This might take a long time, half hour or an hour. ![]() be running python 3+ (no custom libs needed).pv ( brew install pv or apt-get install pv) (not a hard requirement, you can edit it out of the sh file, but it's nice to know what's happening with the 3.6 gigs of data you're dumping into SQLite).20 spare gigs or thereabouts (the DB is around 5Gb all in when done).INSERT INTO messages_ft(docid, subject, body) SELECT mid, subject, body FROM message My way Requirements Which I loaded into MySQL, dumped, and imported using mysql2sqliteĬREATE VIRTUAL TABLE messages_ft USING fts4(subject, body) To learn more, there's a really good resource at that gives you everything you need to know about the different forms of the data, how it was used, etc. Even so, the “Enron e-mail corpus,” as the cleaned-up version is now known, remains the largest public domain database of real e-mails in the world-by far. FERC eventually culled the trove to remove the most sensitive and personal data, after receiving complaints (see PDF). In the name of serving the public’s interest during its investigation of Enron, the federal agency made the controversial decision to post online more than 1.6 million e-mails that Enron executives sent and received from 2000 through 2002. I'll explain both below.Įnron was an energy trading company that did a lot of bad things and got in trouble with the Federal Energy Regulatory Commission. To get the list of all instructions, we type the. It has its own set of meta commands including. It evaluates queries interactively and displays the results in multiple formats. Of course after I was done I did my homework and found that what I needed was already out there and eight hours of work could have been 15 minutes of work. The sqlite3 tool is a terminal based frontend to the SQLite library. But mostly this was a learning exercise I wanted to get good at converting a large arbitary email-like thing into SQLite3. There is a complementary utility library called sequelize-cli that helps to automate some of the mundane as well as somewhat non-trivial parts of database programming. I always wanted to poke around this dataset. Sequelize.js is a popular ORM for Node.js version 4 and above that can be used for many different database management systems (DBMS) such as MySQL, Postgres, SQLite, and others.
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |